TensorFlow changelog


Here's a fresh batch of updates for you, packed with new features, improvements, and bug fixes. Let's dive in! 🚀

  • New Feature: LiteRT GPU Accelerator
    The ml_drift_cl_litert feature has been unleashed, enhancing TensorBuffer integration via the DelegateKernelLiteRt. This includes publishing TensorBufferRequirements for the kLiteRtTensorBufferTypeOpenCl buffer type, binding TensorBuffers with BindTensorBuffers(), and a simplified Invoke() implementation. The TensorFlow Lite experimental LiteRT codebase got some love too, with updates ensuring OpenCL is recognized as the buffer type for input and output tensors.

  • New Feature: XLA TopK Operation Semantics
    Added a detailed section in the XLA docs about the TopK operation, explaining how it identifies the largest or smallest elements in a tensor. Whether you're dealing with one-dimensional arrays or multi-dimensional tensors, this update has got your back!

  • Improvement: Unary Functions in XLA
    Enhanced the XLA builder by adding ResultAccuracy support for unary functions like Cbrt, Cos, Erf, and more, letting callers specify the numerical accuracy they require from these operations. The update spans the builder, its tests, and the Python bindings.

  • New Feature: chlo.ragged_dot CAPI and Python API
    Say hello to the new CAPI and Python API for chlo.ragged_dot in the StableHLO framework. This includes a new RaggedDotDimensionNumbers attribute, allowing users to specify dimension configurations for matrix operations. Python bindings and test cases have been updated to ensure everything runs smoothly.

  • Improvement: cuDNN Fusion Compiler
    The cuDNN fusion compiler now processes graphs with assigned workspaces, optimizing High-Level Operations (HLO) for better GPU performance. This update includes test cleanups and improved resource management.

  • New Feature: TfrtGpuBuffer
    Introducing the TfrtGpuBuffer for GPU support in XLA. This initial version includes updates to the GPU client implementation and a new test file to ensure everything's running like a well-oiled machine.

  • New Feature: SmallWhileLoopHoistingPass
    A new optimization pass for the XLA CPU backend, SmallWhileLoopHoistingPass, improves small while loop performance by hoisting them into callable computations. This update includes unit tests and refinements to cost analysis.

  • Improvement: Dynamic Test Case Generation
    Dynamic test case generation for TensorFlow Lite's compiled models is here! This feature creates C++ test cases on-the-fly, adapting to different environments and consolidating testing into a single binary.

  • Bugfix: litert::Expected Assignment Operators
    Fixed a critical bug in the litert::Expected class assignment operators, ensuring proper handling of different value states and preventing data corruption.

  • Bugfix: HloRunner Thread Safety
    Enhanced the thread safety of the HloRunner class by removing race conditions and introducing a mutex for safe resource management.

  • Bugfix: Model Round-Tripping
    Ensured buffers initially appended to the FlatBuffer remain correctly appended during serialization and deserialization in TensorFlow Lite's LiteRT.

  • Chore: NCCL References Removed
    Cleaned up the XLA GPU backend by removing NCCL references from CollectiveBroadcast and CollectivePermute functionalities, streamlining the codebase for better flexibility and performance.

Stay tuned for more updates, and happy coding! 😄✨

Included Commits

2025-03-07T06:10:10 See commit

The commit introduces the ml_drift_cl_litert feature for the LiteRT GPU Accelerator, enhancing TensorBuffer integration through the addition of the DelegateKernelLiteRt. Key changes include the publication of TensorBufferRequirements under kLiteRtTensorBufferTypeOpenCl, the implementation of the BindTensorBuffers() function for binding TensorBuffers, and a streamlined version of the Invoke() method.

Additionally, modifications were made to the TensorFlow Lite experimental LiteRT codebase, including updates to the BUILD file and test cases in litert_compiled_model_gpu_test.cc. These updates ensure that the code correctly identifies and utilizes OpenCL as the buffer type for input and output tensors, thereby improving the overall functionality and testing of the GPU-accelerated models.

Files changed

  • tensorflow/lite/experimental/litert/cc/BUILD
  • tensorflow/lite/experimental/litert/cc/litert_compiled_model_gpu_test.cc
2025-03-07T21:38:01 See commit

The commit associated with PR #23414 introduces a significant enhancement to the cuDNN fusion compiler by enabling it to process graphs with assigned workspaces. This improvement allows for the execution of optimized High-Level Operations (HLO), which can lead to better performance in GPU computations. The changes include modifications to various files within the XLA (Accelerated Linear Algebra) codebase, particularly focusing on the GPU backend, and involve updates to tests that validate the functionality of the new feature.

In addition to the core functionality, the commit also includes cleanup efforts in the testing framework and enhances the handling of workspace allocations within the cuDNN fusion process. It ensures that when a workspace is required for a fused operation, it is appropriately allocated and utilized, thereby optimizing resource management during execution. The overall goal of these changes is to improve the efficiency and capability of the XLA framework in leveraging GPU resources for complex computations.

Files changed

  • third_party/xla/xla/backends/gpu/codegen/BUILD
  • third_party/xla/xla/backends/gpu/codegen/cudnn_test.cc
  • third_party/xla/xla/service/gpu/transforms/cudnn_fusion_compiler.cc
2025-03-07T22:05:16 See commit

This commit introduces a new section in the XLA (Accelerated Linear Algebra) documentation specifically detailing the semantics of the TopK operation. The TopK operation is designed to identify the k largest or smallest elements from the last dimension of a given tensor. The documentation outlines the function's arguments, specifying that operand is the tensor to be analyzed, k indicates the number of elements to extract, and largest is a boolean that determines whether to retrieve the largest or smallest values.

The commit elaborates on how the TopK operation works for tensors of different ranks. For a one-dimensional tensor (array), it returns a tuple containing the k largest or smallest values along with their corresponding indices. For higher-dimensional tensors, the operation computes the top k entries while preserving the structure of the other dimensions. The documentation also specifies that in cases of equal elements, the one with the lower index is prioritized in the output. Overall, this addition enhances the clarity and usability of the XLA library by providing comprehensive details about the TopK operation.
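
The documented semantics can be illustrated with a small NumPy sketch (this is an illustration of the behavior, not the XLA implementation):

```python
import numpy as np

def topk(operand, k, largest=True):
    """Mimic the documented XLA TopK semantics along the last axis.

    Returns (values, indices). A stable sort ensures that among equal
    elements, the one with the lower index is prioritized, and higher
    dimensions are preserved unchanged.
    """
    order = np.argsort(-operand if largest else operand,
                       axis=-1, kind="stable")
    indices = order[..., :k]
    values = np.take_along_axis(operand, indices, axis=-1)
    return values, indices

# Tie between the two 5s: the lower index (1) comes first.
values, indices = topk(np.array([1, 5, 3, 5]), k=2)
```

For a rank-1 operand this yields the tuple of top-k values and their indices; for higher ranks the same selection is applied independently along the last dimension.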

Files changed

  • third_party/xla/docs/operation_semantics.md
2025-03-07T22:19:41 See commit

This commit introduces the definition of the CAPI and Python API for the chlo.ragged_dot operation within the StableHLO framework. The changes include a substantial addition of 352 lines to the relevant files, particularly focusing on creating the RaggedDotDimensionNumbers attribute. This new attribute allows users to specify various dimension configurations such as batching, contracting, and ragged dimensions for both left-hand side (LHS) and right-hand side (RHS) matrices. The commit also adds functions to retrieve these dimensions and checks for attribute types, enhancing the API's functionality.

Additionally, the commit includes updates to the Python bindings, enabling users to create and manipulate RaggedDotDimensionNumbers attributes directly from Python. Test cases have also been added to validate the new functionality, ensuring that the attributes can be correctly instantiated and accessed. Overall, this commit significantly enhances the capabilities of the StableHLO framework, facilitating operations involving ragged tensors in both C and Python environments.
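
To make the dimension-number idea concrete, here is a hedged NumPy sketch of one common ragged-dot mode (LHS ragged in the non-contracting dimension); the actual chlo.ragged_dot supports more configurations via RaggedDotDimensionNumbers, and this helper is purely illustrative:

```python
import numpy as np

def ragged_dot(lhs, rhs, group_sizes):
    """Sketch of a ragged dot with the LHS ragged dimension.

    lhs:         [m, k]    rows partitioned into consecutive groups
    rhs:         [g, k, n] one weight matrix per group
    group_sizes: [g]       row counts per group, summing to m

    Each group of lhs rows is contracted with its own rhs slice.
    """
    out, start = [], 0
    for g, size in enumerate(group_sizes):
        out.append(lhs[start:start + size] @ rhs[g])
        start += size
    return np.concatenate(out, axis=0)
```

The RaggedDotDimensionNumbers attribute generalizes this by naming which dimensions are batching, contracting, and ragged on each side.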

Files changed

  • third_party/stablehlo/temporary.patch
2025-03-07T23:02:18 See commit

This commit enhances the XLA (Accelerated Linear Algebra) builder by adding ResultAccuracy support for the remaining unary functions: Cbrt, Cos, Erf, Expm1, Log, Log1p, Logistic, Rsqrt, Sin, Sqrt, Tan, and Tanh. The modifications span multiple files, allowing callers to specify result-accuracy requirements for these unary operations throughout the XLA framework.

The changes were made across various components, including the unary operations composition file, testing frameworks, and Python bindings. This update is aimed at improving the precision and reliability of mathematical operations in the TensorFlow ecosystem, which relies on XLA for optimized performance. The commit reflects ongoing efforts to enhance the functionality and robustness of the XLA library.

Files changed

  • tensorflow/compiler/tf2xla/kernels/unary_ops_composition.cc
  • third_party/xla/xla/hlo/builder/lib/math_test.cc
  • third_party/xla/xla/hlo/builder/xla_builder.cc
  • third_party/xla/xla/hlo/builder/xla_builder.h
  • third_party/xla/xla/hlo/builder/xla_builder_test.cc
  • third_party/xla/xla/python/ops.cc
  • third_party/xla/xla/python/xla_client_test.py
  • third_party/xla/xla/python/xla_extension/ops.pyi
  • third_party/xla/xla/tests/BUILD
  • third_party/xla/xla/tests/complex_unary_op_test.cc
  • third_party/xla/xla/tests/exhaustive/exhaustive_unary_complex_test.cc
  • third_party/xla/xla/tests/exhaustive/exhaustive_unary_test_ops.inc
  • third_party/xla/xla/tests/half_test.cc
2025-03-11T01:35:17 See commit

The recent commit enhances the thread safety of the HloRunner class within the XLA (Accelerated Linear Algebra) library. This modification addresses issues identified in previously added tests that failed under Thread Sanitizer (tsan) due to race conditions. The update involves removing the entry_computation_layout_ field and instead passing the entry_computation_layout as a parameter to relevant functions. This change ensures that the layout is only updated once per module, preventing concurrent modifications or reads during updates. Additionally, a mutex has been introduced to manage access to shared resources safely.

Other notable adjustments include the removal of unnecessary initialization of the backend_ variable within the backend() method, as it is now guaranteed to be set in the constructor. The changes contribute to a more robust and reliable execution environment for the HloRunner, allowing it to handle concurrent operations without risking data integrity. Overall, these enhancements improve the stability and performance of the XLA library in multi-threaded scenarios.
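
The pattern described — replacing a shared mutable field with an explicit parameter, plus a mutex around the state that must stay shared — can be sketched in Python (a hypothetical analogue with invented names, not the C++ code):

```python
import threading

class Runner:
    """Illustrative analogue of the HloRunner fix.

    The layout is passed as a parameter rather than stored in a mutable
    field, so concurrent calls cannot observe each other's half-written
    updates; a lock guards the one piece of remaining shared state.
    """

    def __init__(self):
        self._mu = threading.Lock()
        self._updated_modules = set()  # shared state, guarded by _mu

    def execute(self, module_id, entry_computation_layout):
        # Record (under the lock) whether this is the first execution of
        # the module, mirroring "update the layout once per module".
        with self._mu:
            first_time = module_id not in self._updated_modules
            self._updated_modules.add(module_id)
        return module_id, entry_computation_layout, first_time
```

The design choice is the same as in the commit: data that only one call needs travels as an argument; data that genuinely must be shared is serialized behind a mutex.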

Files changed

  • third_party/xla/xla/service/BUILD
  • third_party/xla/xla/service/hlo_runner.cc
  • third_party/xla/xla/service/hlo_runner.h
2025-03-11T20:33:18 See commit

This commit addresses a critical bug in the assignment operators of the litert::Expected class, which could lead to undefined behavior when copying or moving an Expected instance that holds different value states (i.e., one holding a value and the other holding an error). The issue arose because the destructor was called prematurely, leading to an attempt to assign to uninitialized memory, violating the precondition that an instance must exist in a valid state before assignment.

The modifications made in this commit ensure that the assignment operators handle the different states correctly by checking whether the source and destination objects hold values or errors. The code now properly constructs or destructs the appropriate members based on their states, thereby maintaining the integrity of the Expected class and preventing potential crashes or data corruption. Overall, this fix enhances the robustness of the litert library's error handling mechanism.
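
The gist of the fix can be shown with a toy Python analogue of a value-or-error type (the real class is C++, where _destroy/_construct correspond to an explicit destructor call and placement-new on union members):

```python
class Expected:
    """Toy analogue of litert::Expected: holds a value or an error.

    The original bug: assignment destroyed the destination and then
    assigned into a member that had never been constructed when the two
    sides held different states. The fix branches on both states.
    """

    def __init__(self, value=None, error=None):
        assert (value is None) != (error is None), "exactly one state"
        self.has_value = error is None
        self.slot = value if self.has_value else error

    def _destroy(self):                # stands in for the destructor call
        self.slot = None

    def _construct(self, has_value, payload):  # stands in for placement-new
        self.has_value = has_value
        self.slot = payload

    def assign(self, other):
        if self.has_value == other.has_value:
            # Same state: member-wise assignment into live storage is safe.
            self.slot = other.slot
        else:
            # Different states: destroy the current member, then construct
            # the other kind -- never assign into dead storage.
            self._destroy()
            self._construct(other.has_value, other.slot)
        return self
```

This mirrors the commit's core rule: the operation performed depends on the (source, destination) state pair, not on the destination's storage alone.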

Files changed

  • tensorflow/lite/experimental/litert/cc/litert_expected.h
2025-03-11T21:06:46 See commit

The recent commit addresses issues related to model round-tripping in TensorFlow Lite's experimental LiteRT by ensuring that buffers initially appended to the FlatBuffer remain correctly appended during serialization and deserialization processes. Modifications were made in several files, including model_file_test.cc, model_load.cc, and model_serialize.cc, to enhance the handling of buffer contexts and ensure that the buffers are appropriately flagged as external when necessary.

Key changes include the introduction of a new structure, TflBufferContext, which encapsulates buffer information alongside a boolean indicating whether the buffer is external. The ReadBuffer function was updated to return this new structure, while serialization routines were adjusted to use this context effectively, ensuring that the integrity of buffer states is maintained throughout the model's lifecycle. These enhancements aim to improve the reliability and consistency of model loading and serialization in TensorFlow Lite.
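
A minimal sketch of the idea (the names follow the commit description, but the real code is C++ and the helper below is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class TflBufferContext:
    """Buffer bytes paired with a flag recording whether the buffer was
    appended to the FlatBuffer externally (sketch of the C++ struct)."""
    data: bytes
    is_external: bool

def read_buffer(raw: bytes, appended: bool) -> TflBufferContext:
    # Hypothetical stand-in for the updated ReadBuffer: it now returns
    # the buffer together with its external/appended status, so that
    # serialization can round-trip the flag instead of losing it.
    return TflBufferContext(data=raw, is_external=appended)
```

Carrying the flag alongside the bytes is what lets serialization keep appended buffers appended across load/save cycles.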

Files changed

  • tensorflow/lite/experimental/litert/core/model/model_file_test.cc
  • tensorflow/lite/experimental/litert/core/model/model_load.cc
  • tensorflow/lite/experimental/litert/core/model/model_serialize.cc
2025-03-12T13:40:27 See commit

This commit introduces a new optimization pass named SmallWhileLoopHoistingPass to the XLA (Accelerated Linear Algebra) CPU backend. The pass is designed to improve the performance of small while loops by hoisting them into callable computations, which can enhance execution efficiency. The implementation includes a depth-first search (DFS) mechanism to determine if a while loop contains any "unavailable" instructions that would prevent it from being hoisted. If the loop is deemed small enough based on a specified buffer access size, it is transformed into a call instruction that references an embedded computation, thereby optimizing its execution.

Additionally, the commit adds corresponding unit tests to verify the functionality of the new pass. These tests check various scenarios, including successful hoisting of small while loops and ensuring that larger or unsupported loops are not incorrectly transformed. Modifications to existing files related to cost analysis are also included, refining how instructions are managed within the analysis framework. Overall, this commit enhances the efficiency of loop handling in the XLA CPU backend by introducing a targeted optimization for small while loops.
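
The decision logic described above — a DFS over the loop body that rejects "unavailable" instructions and enforces a buffer-size budget — can be sketched as follows (a simplified, hypothetical model; node shape and callables are assumptions, not the pass's real API):

```python
def can_hoist(root, max_bytes, is_unavailable, buffer_bytes):
    """Decide whether a while loop is small enough to hoist.

    root: the loop body's root instruction; each node exposes .operands.
    is_unavailable(inst): True for instructions that block hoisting.
    buffer_bytes(inst): bytes of buffer access attributed to inst.
    """
    seen, stack, total = set(), [root], 0
    while stack:                       # iterative depth-first search
        inst = stack.pop()
        if id(inst) in seen:
            continue
        seen.add(id(inst))
        if is_unavailable(inst):       # e.g. side-effecting instructions
            return False
        total += buffer_bytes(inst)
        stack.extend(inst.operands)
    return total <= max_bytes          # only "small" loops are hoisted
```

A loop that passes this check would then be rewritten into a call instruction referencing an embedded computation, as the commit describes.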

Files changed

  • third_party/xla/xla/service/cpu/BUILD
  • third_party/xla/xla/service/cpu/small_while_loop_hoisting_pass.cc
  • third_party/xla/xla/service/cpu/small_while_loop_hoisting_pass_test.cc
  • third_party/xla/xla/service/hlo_cost_analysis.cc
  • third_party/xla/xla/service/hlo_cost_analysis.h
2025-03-13T21:39:15 See commit

The recent commit focuses on removing references to NCCL (NVIDIA Collective Communications Library) from the CollectiveBroadcast and CollectivePermute functionalities within the XLA GPU backend. This change involves renaming several files related to these collective operations, specifically collective_broadcast_thunk.cc, collective_broadcast_thunk.h, collective_permute_thunk.cc, and collective_permute_thunk.h, indicating a structural update in the codebase to eliminate dependencies on NCCL.

Additionally, modifications were made to other related files, such as command_buffer_cmd.cc, and updates were applied to the BUILD files in both the GPU runtime and service directories. This commit reflects an effort to streamline the GPU backend by decoupling it from NCCL, potentially enhancing its flexibility and performance in collective operations.

Files changed

  • third_party/xla/xla/backends/gpu/runtime/BUILD
  • third_party/xla/xla/backends/gpu/runtime/collective_broadcast_thunk.cc
  • third_party/xla/xla/backends/gpu/runtime/collective_broadcast_thunk.h
  • third_party/xla/xla/backends/gpu/runtime/collective_permute_thunk.cc
  • third_party/xla/xla/backends/gpu/runtime/collective_permute_thunk.h
  • third_party/xla/xla/backends/gpu/runtime/command_buffer_cmd.cc
  • third_party/xla/xla/service/gpu/BUILD
  • third_party/xla/xla/service/gpu/ir_emitter_unnested.cc
2025-03-13T23:03:47 See commit

The recent commit introduces a dynamic test case generation feature for integration tests within the TensorFlow Lite framework, specifically for compiled models. This enhancement allows for the creation of C++ test cases on-the-fly based on a specified configuration, which includes one or more TensorFlow Lite models and various accelerator settings (such as NPU or CPU). The implementation is designed to be adaptable to different environments, such as Linux and Android, using GTEST_SKIP to manage environment-specific differences. The goal is to consolidate the testing of all compiled model functionalities—like Just-In-Time (JIT) compilation and invocation on different hardware accelerators—into a single C++ binary.

The commit adds new source files and modifies existing ones to facilitate the generation of test cases that dynamically adapt based on the provided model paths and hardware configurations. It includes a new gen_device_test binary that utilizes Google Test (GTEST) for executing the tests. The code structure allows for easy integration of multiple models and supports logging and environment setup. Overall, this enhancement aims to streamline the testing process for TensorFlow Lite's compiled models, ensuring comprehensive coverage and flexibility in testing across various platforms.
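
The generation idea — one test per (model, accelerator) pair, skipping combinations the current environment cannot run — can be sketched with Python's unittest (the real implementation emits C++/GTest cases and uses GTEST_SKIP; every name below is an illustrative assumption):

```python
import unittest

def make_test_case(model_path, accelerator):
    """Build one test method for a (model, accelerator) combination."""
    def test(self):
        if accelerator not in self.available_accelerators:
            self.skipTest(f"{accelerator} not available")  # ~ GTEST_SKIP
        # Placeholder check standing in for compile-and-invoke logic.
        self.assertTrue(model_path.endswith(".tflite"))
    return test

class CompiledModelTest(unittest.TestCase):
    available_accelerators = {"cpu"}   # environment-specific

# Attach one generated test per configuration entry, so a single
# binary covers every model/accelerator combination.
for model, acc in [("add.tflite", "cpu"), ("add.tflite", "npu")]:
    name = f"test_{acc}_{model.split('.')[0]}"
    setattr(CompiledModelTest, name, make_test_case(model, acc))
```

Consolidating the cases into one test binary while skipping unsupported accelerators at runtime is exactly the portability trade-off the commit describes for Linux versus Android.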

Files changed

  • tensorflow/lite/experimental/litert/integration_test/BUILD
  • tensorflow/lite/experimental/litert/integration_test/gen_device_test.cc
  • tensorflow/lite/experimental/litert/integration_test/gen_device_test_lib.h
  • tensorflow/lite/experimental/litert/test/common.cc
  • tensorflow/lite/experimental/litert/test/common.h
2025-03-14T01:22:49 See commit

The commit introduces the initial version of the TfrtGpuBuffer within the XLA (Accelerated Linear Algebra) project, specifically for GPU support. The changes include modifications to several files, notably the tfrt_gpu_client.cc and tfrt_gpu_client.h, which likely involve updates to the GPU client implementation to accommodate the new buffer functionality.

Additionally, a new test file, tfrt_gpu_buffer_test.cc, has been added to ensure the proper functionality and reliability of the TfrtGpuBuffer. The changes are encapsulated in the BUILD file, indicating adjustments to the project's build configuration to integrate the new components effectively. This commit marks a significant step in enhancing GPU support within the XLA framework.

Files changed

  • third_party/xla/xla/pjrt/gpu/tfrt/BUILD
  • third_party/xla/xla/pjrt/gpu/tfrt/tfrt_gpu_buffer_test.cc
  • third_party/xla/xla/pjrt/gpu/tfrt/tfrt_gpu_client.cc
  • third_party/xla/xla/pjrt/gpu/tfrt/tfrt_gpu_client.h