TensorFlow changelog


Here's a rundown of the changes that have landed recently: new features, improvements, bug fixes, and cleanup. Dive in!

New Features

  • Advanced Profiler Configuration: The profiler gains an advanced configuration option, letting you pass settings of several types (string, boolean, integer) with much more flexibility.
  • GPU Environment via C API: Say goodbye to singleton headaches! The new C API function LiteRtGpuGlobalEnvironmentCreate() gives accelerator libraries shared access to the GPU environment instead of creating duplicate instances.
  • TfrtGpuBuffer Debut: The first version of TfrtGpuBuffer lands, an initial step toward fuller GPU support within the XLA framework.
  • Inlineable Attribute: The inlineable attribute is now a first-class frontend attribute, giving you direct control over which call operations get inlined.
  • CreateErrorBuffer Functionality: CreateErrorBuffer creates a placeholder buffer when a computation fails, so the rest of the pipeline keeps receiving buffers even when errors occur.

Improvements

  • Dynamic & Static GPU Accelerator Support: GPU accelerators can now be registered whether they are linked dynamically or statically, via an overridable weak registration function.
  • More Op Builders: Additional operation builders, notably for the ResizeNearestNeighbor operation, plus ReLU and element-wise builders on the Qualcomm backend.
  • IFRT Arrays Layout Management: Layouts are now managed through IFRT Arrays, rolled forward with a fix after an earlier revert.

Bug Fixes

  • Shared Library Path Fix in QNN: Shared library paths in the QNN module now resolve correctly, with the right directory appended to LD_LIBRARY_PATH at runtime.
  • Layout Creation from Proto: Layout::CreateFromProto() now handles invalid input gracefully, returning an invalid Layout instead of crashing.
  • GPU Model Execution Fixes: Bugs causing GPU model execution failures have been fixed, including a layout mismatch, a missing output tensor in partially delegated graphs, and a memory leak.

Chore

  • Automated Code Cleanup: Unused absl/types/optional.h includes were removed from two tf2xla kernels, keeping the codebase lean.

Keep exploring these updates and enjoy the enhanced TensorFlow experience. Happy coding!

Included Commits

2025-03-14T01:19:41 See commit

The recent commit modifies the Layout::CreateFromProto() function to handle invalid input more gracefully, preventing crashes when an invalid proto is encountered. Instead of terminating the program, the function now returns an invalid Layout object, which can be identified during subsequent validation processes. This change is particularly useful in scenarios such as fuzz testing, where intentionally malformed protos may be used.

Additionally, the commit introduces a new error handling mechanism within the Layout class, allowing for different actions to be taken when encountering errors, such as logging a warning or a fatal error. The set_tail_padding_alignment_in_elements method has been updated to incorporate this mechanism, ensuring that any invalid values for tail padding alignment are appropriately logged without causing the application to crash. Overall, these changes enhance the robustness of the layout handling by allowing for better error management and validation.

Files changed

  • third_party/xla/xla/layout.cc
  • third_party/xla/xla/layout.h
2025-03-14T01:22:49 See commit

The commit introduces the initial version of the TfrtGpuBuffer within the XLA (Accelerated Linear Algebra) project, specifically for GPU support. The changes include modifications to several files, notably the tfrt_gpu_client.cc and tfrt_gpu_client.h, which likely involve updates to the GPU client implementation to accommodate the new buffer functionality.

Additionally, a new test file, tfrt_gpu_buffer_test.cc, has been added to ensure the proper functionality and reliability of the TfrtGpuBuffer. The changes are encapsulated in the BUILD file, indicating adjustments to the project's build configuration to integrate the new components effectively. This commit marks a significant step in enhancing GPU support within the XLA framework.

Files changed

  • third_party/xla/xla/pjrt/gpu/tfrt/BUILD
  • third_party/xla/xla/pjrt/gpu/tfrt/tfrt_gpu_buffer_test.cc
  • third_party/xla/xla/pjrt/gpu/tfrt/tfrt_gpu_client.cc
  • third_party/xla/xla/pjrt/gpu/tfrt/tfrt_gpu_client.h
2025-03-14T04:31:19 See commit

This commit introduces layout management through IFRT Arrays, rolling forward a previously reverted change (identified by its hash) together with a fix. The changes span multiple files within the XLA Python interface, indicating a significant update to the array implementation and related functionality.

The modifications include updates to array implementation tests, sharding logic, and various components of the PJRT interface, reflecting a comprehensive effort to enhance the framework's capabilities. By rolling forward with this fix, the commit aims to improve the overall performance and usability of the XLA library, ensuring better layout handling within the IFRT Arrays.

Files changed

  • third_party/xla/xla/python/ifrt/BUILD
  • third_party/xla/xla/python/ifrt/array_impl_test_lib.cc
  • third_party/xla/xla/python/ifrt/sharding.cc
  • third_party/xla/xla/python/pjrt_ifrt/pjrt_array.cc
  • third_party/xla/xla/python/pjrt_ifrt/pjrt_array.h
  • third_party/xla/xla/python/pjrt_ifrt/pjrt_client.cc
  • third_party/xla/xla/python/pjrt_ifrt/pjrt_executable.cc
  • third_party/xla/xla/python/pjrt_ifrt/pjrt_remap.cc
  • third_party/xla/xla/python/py_array.cc
  • third_party/xla/xla/python/to_ifrt_sharding.h
  • third_party/xla/xla/python/transfer/py_socket_transfer.cc
2025-03-14T17:42:09 See commit

This commit introduces significant changes to the LiteRT framework by allowing access to the GPU environment via a new C API, specifically through the function LiteRtGpuGlobalEnvironmentCreate(). This change addresses the issue of multiple singleton instances being created through direct linking with dynamic accelerator libraries, which could lead to conflicts and inefficiencies. The commit includes renaming of certain components for clarity, such as changing runtime/environment to runtime/gpu_environment and EnvironmentSingleton to GpuEnvironmentSingleton, reflecting the focus on GPU operations.

In addition to the new API function, the commit also modifies several source files to incorporate the new GPU environment structure. This includes updates to header files and test cases to ensure the new singleton pattern is correctly implemented and tested. By organizing the GPU-related functionalities under a dedicated environment, the changes aim to streamline interactions with GPU resources, enhancing the overall efficiency and usability of the LiteRT runtime for GPU-accelerated tasks.

Files changed

  • tensorflow/lite/experimental/litert/c/BUILD
  • tensorflow/lite/experimental/litert/c/litert_environment.cc
  • tensorflow/lite/experimental/litert/c/litert_environment.h
  • tensorflow/lite/experimental/litert/runtime/BUILD
  • tensorflow/lite/experimental/litert/runtime/gpu_environment.cc
  • tensorflow/lite/experimental/litert/runtime/gpu_environment.h
  • tensorflow/lite/experimental/litert/runtime/gpu_environment_test.cc
  • tensorflow/lite/experimental/litert/runtime/open_cl_buffer.cc
2025-03-14T20:55:40 See commit

This commit introduces support for both dynamically and statically linked GPU accelerators within the TensorFlow Lite runtime environment. A weak function named RegisterStaticLinkedAcceleratorGpu is defined, allowing it to be overridden when the GPU accelerator is linked statically. This enhancement ensures that the registration of the GPU accelerator can be handled appropriately based on the linking method used, thus improving flexibility in the deployment of GPU resources.

In addition to the function definition, the commit modifies several files to integrate this new functionality, including updates to the build configuration and test cases. The changes also include the addition of necessary dependencies and logging mechanisms to confirm the successful registration of the statically linked GPU accelerator. Overall, these modifications aim to enhance the efficiency and adaptability of TensorFlow Lite in leveraging GPU acceleration.

Files changed

  • tensorflow/lite/experimental/litert/cc/BUILD
  • tensorflow/lite/experimental/litert/runtime/accelerators/BUILD
  • tensorflow/lite/experimental/litert/runtime/accelerators/auto_registration.cc
2025-03-16T23:50:41 See commit

This commit introduces a new function, CreateErrorBuffer, to the TfrtGpuClient class, which is part of the XLA (Accelerated Linear Algebra) library for GPU operations. The CreateErrorBuffer function is designed to create a dummy buffer that can be used when an error occurs during computation, ensuring that the rest of the processing pipeline can continue to expect a buffer, even in the presence of errors. This function checks the validity of the memory space associated with the error and logs relevant information for debugging purposes. Additionally, the commit modifies the TfrtGpuClient class to include a destructor and methods for looking up devices and getting default device assignments.

Furthermore, the commit includes a comprehensive unit test for the Compile() function to validate the behavior of the newly added CreateErrorBuffer method. The test checks whether the error propagation works as intended by attempting to compile an executable with an input error, ensuring that the error is correctly returned when executing the computation. This addition enhances the robustness and error-handling capabilities of the TfrtGpuClient, making it better suited for real-world applications where errors are likely to occur.

Files changed

  • third_party/xla/xla/pjrt/gpu/tfrt/BUILD
  • third_party/xla/xla/pjrt/gpu/tfrt/tfrt_gpu_client.cc
  • third_party/xla/xla/pjrt/gpu/tfrt/tfrt_gpu_client.h
  • third_party/xla/xla/pjrt/gpu/tfrt/tfrt_gpu_client_test.cc
2025-03-19T05:49:46 See commit

This commit is an automated cleanup touching two files in the TensorFlow compiler codebase: gather_op.cc and unique_op.cc. In both files, the include of the header absl/types/optional.h was removed as unnecessary, deleting one line from each file.

The overall impact of this commit is a slight reduction in code complexity by eliminating an unused dependency, which can lead to improved readability and potentially better compilation times. Such clean-up efforts are essential for maintaining the quality and efficiency of the codebase, ensuring that only relevant dependencies are included in the project.

Files changed

  • tensorflow/compiler/tf2xla/kernels/gather_op.cc
  • tensorflow/compiler/tf2xla/kernels/unique_op.cc
2025-03-19T21:37:43 See commit

This commit addresses multiple issues related to GPU model execution failures in TensorFlow Lite. First, it fixes a failure in the MLD delegate when handling a CL tensor whose layout is set to BHWC with a batch size of 1, which should instead use the HWC layout. Second, it fixes the MLD delegate failing to find the output tensor in the buffer context when the graph is only partially delegated. Last, it corrects a memory leak that occurs during creation of the GPU global environment.

The modifications are confined to litert_environment.cc, updating how the GPU environment singleton is created to prevent the memory leak: the patch replaces a release() call on the owning smart pointer with get(), so ownership is retained and the memory is freed properly. Overall, these changes improve the stability and efficiency of GPU model execution in TensorFlow Lite.

Files changed

  • tensorflow/lite/experimental/litert/c/litert_environment.cc
2025-03-19T23:01:32 See commit

The commit associated with PR #23947 introduces a new first-class attribute called inlineable to enhance the control over inlining call operations in the XLA (Accelerated Linear Algebra) compiler framework. Previously, the InlineStreamAnnotation mechanism was tied to GPU-specific implementations and relied on unnecessary flags. By creating the inlineable attribute, the frontend can now specify directly which call operations should be inlined, streamlining the process and removing dependencies on specific backend configurations.

The implementation includes modifications to the call inliner logic, allowing it to check for the inlineable attribute in each instruction's frontend attributes. Additionally, unit tests were added to ensure that the new attribute functions correctly, particularly verifying that calls marked with inlineable="false" are not inlined. This change not only simplifies the inlining process but also enhances the flexibility of the XLA framework by allowing developers to have more granular control over inlining behavior in their computations.

Files changed

  • third_party/xla/xla/service/call_inliner.cc
  • third_party/xla/xla/service/call_inliner_test.cc
2025-03-20T07:57:13 See commit

This commit introduces an advanced configuration option for the profiler within the TensorFlow library. The changes primarily involve modifications to the profiler_options.proto file, where a new message type, AdvancedConfigValue, is defined. This message can store configuration values of different types, including string, boolean, and integer, allowing for greater flexibility in how configurations are passed to the profiler.

Additionally, a new map, advanced_configuration, is added to the ProfileOptions message, enabling users to specify various configuration settings by associating keys with their corresponding values. This enhancement aims to improve the usability of the profiler by allowing developers to tailor settings based on specific requirements, such as specifying a TPU trace mode. Overall, these changes enhance the configurability of the profiler, making it more adaptable to diverse profiling scenarios.

Files changed

  • third_party/xla/third_party/tsl/tsl/profiler/protobuf/profiler_options.proto
2025-03-20T20:32:48 See commit

This commit addresses issues related to the path configuration for shared libraries in the QNN (Qualcomm Neural Network) module of TensorFlow Lite. It corrects the default behavior where the shared_library_dir is not explicitly set by the user, leading it to default to an incorrect location derived from dispatch_library_dir. The solution involves appending the correct library path to the LD_LIBRARY_PATH environment variable and dynamically loading the shared libraries by their names. The changes include modifications to several source files to implement this functionality, ensuring that the appropriate library paths are utilized during runtime.

Key updates include the introduction of a new function, PutLibOnLdPath, which searches for the specified shared library and appends its parent directory to the LD_LIBRARY_PATH if it's not already included. Additionally, the commit enhances the QNN manager's initialization method to utilize this new function, ensuring that the correct paths are set for loading necessary libraries. The commit also includes various tests to verify the correct behavior of the new path handling logic, ensuring that the system behaves as expected under different scenarios.

Files changed

  • tensorflow/lite/experimental/litert/core/BUILD
  • tensorflow/lite/experimental/litert/core/dynamic_loading.cc
  • tensorflow/lite/experimental/litert/core/dynamic_loading.h
  • tensorflow/lite/experimental/litert/core/dynamic_loading_test.cc
  • tensorflow/lite/experimental/litert/vendors/qualcomm/BUILD
  • tensorflow/lite/experimental/litert/vendors/qualcomm/qnn_manager.cc
2025-03-21T02:41:14 See commit

This commit introduces additional operation builders for TensorFlow Lite's LiteRT framework, specifically enhancing the functionality for the ResizeNearestNeighbor operation. Two new functions, LiteRtGetResizeNearestNeighborAlignCornersOption and LiteRtGetResizeNearestNeighborHalfPixelCenterOption, have been added to retrieve options related to the alignment of corners and the use of half-pixel centers. These functions validate the operation type and extract the relevant options from the operation's internal settings. Additionally, unit tests have been created to ensure these options are correctly implemented and return expected values.

The commit also includes modifications to various files to integrate these new builders into the existing framework. This includes updates to the operation conversion logic, ensuring that the new options are considered when building the ResizeNearestNeighbor operation. Furthermore, the commit adds new test data and updates existing libraries to support the new functionality, including the addition of ReLU operation builders and enhancements to element-wise operations such as minimum and maximum. Overall, this commit significantly expands the capabilities of the LiteRT operation builders, improving the flexibility and performance of TensorFlow Lite's model execution.

Files changed

  • tensorflow/lite/experimental/litert/c/litert_options.cc
  • tensorflow/lite/experimental/litert/c/litert_options.h
  • tensorflow/lite/experimental/litert/c/litert_options_test.cc
  • tensorflow/lite/experimental/litert/test/testdata/simple_relu_op.mlir
  • tensorflow/lite/experimental/litert/test/testdata/simple_resize_nearest_neighbor_op.mlir
  • tensorflow/lite/experimental/litert/tools/dump.cc
  • tensorflow/lite/experimental/litert/vendors/qualcomm/compiler/BUILD
  • tensorflow/lite/experimental/litert/vendors/qualcomm/compiler/qnn_compiler_plugin_test.cc
  • tensorflow/lite/experimental/litert/vendors/qualcomm/compiler/qnn_compose_graph.cc
  • tensorflow/lite/experimental/litert/vendors/qualcomm/core/builders/BUILD
  • tensorflow/lite/experimental/litert/vendors/qualcomm/core/builders/elementwise_op_builder.cc
  • tensorflow/lite/experimental/litert/vendors/qualcomm/core/builders/elementwise_op_builder.h
  • tensorflow/lite/experimental/litert/vendors/qualcomm/core/builders/quantize_op_builder.cc
  • tensorflow/lite/experimental/litert/vendors/qualcomm/core/builders/quantize_op_builder.h
  • tensorflow/lite/experimental/litert/vendors/qualcomm/core/builders/relu_op_builder.cc
  • tensorflow/lite/experimental/litert/vendors/qualcomm/core/builders/relu_op_builder.h
  • tensorflow/lite/experimental/litert/vendors/qualcomm/core/wrappers/tensor_wrapper.h