tensorflow changelog

9 months ago

Here's the latest scoop on the updates and enhancements we've made recently. We've been busy bees, buzzing around to optimize, enhance, and squash those pesky bugs. Let's dive into the details! 🐝

New feature: We’ve rolled out the DynamicSliceCopyFusionCmd in the XLA GPU backend to make memory operations smoother than a fresh jar of Skippy! This command buffer command, derived from DynamicMemcpyThunk, copies slices between buffers and currently supports only static offsets. 🥜
Improvement: The RedzoneBuffers got a turbo boost! Now, you can create these buffers from an Executable, making memory management in GPU computations as easy as pie. 🍰
New feature: Say hello to the xla::PjRtPhaseCompiler! It lets plugins expose compilation as named phases that callers can list and run through the C PJRT layer, adding a sprinkle of phase compilation magic to XLA. ✨
Improvement: We've supercharged the DotLibraryRewriter in the XLA CPU backend to support oneDNN fusions for dot operations with element-wise ops like Add, Mul, and Exp. It's like giving your CPU a shot of espresso! ☕️
New feature: Introducing PjRtFuture::Map! This nifty API lets you transform future results with ease, ensuring error propagation is handled like a pro. 🚀
New feature: The CreateRawAliasOfBuffer() method is now live, allowing for more efficient memory management in the XLA framework. It's all about sharing the love... and the buffers! 💾
New feature: With the TryMap method, you can now map futures with a functor that might fail, handling errors gracefully and keeping things running smoothly. 🎩
Improvement: Mesh deduplication just got a whole lot cooler with support for mapping a single sub-axis to multiple sub-axes. It's like a party for your axes! 🎉
Bugfix: We've squashed a bug in BufferFromHostLiteral to ensure events are fulfilled even when allocation fails. No more unhandled errors raining on your parade! ☔️
Bugfix: Crashes caused by unsupported operand types in XNNPACK have been fixed. Now, we check operands to keep things crash-free and smooth sailing! 🛳️
Chore: A little housekeeping with the forward compatibility horizon updated to June 28, 2025. Just keeping things fresh and up-to-date! 📅
Bugfix: We rolled back a previous change to XLA's reshape mover due to suspected correctness issues. Sometimes you gotta take a step back to move forward! 🔄

That's all for now, folks! Keep those GPUs buzzing and CPUs humming. Until next time! 🎶

Included Commits

2025-07-01T02:41:03 See commit

The commit introduces a new method, CreateRawAliasOfBuffer(), to the CommonPjRtBufferImpl class within the XLA (Accelerated Linear Algebra) library. This method is designed to create a raw alias of a buffer, facilitating the management and sharing of device memory in a more efficient manner. The implementation ensures that the buffer is acquired safely using a scoped lock, which helps maintain thread safety during operations. Additionally, the commit modifies several header and source files to incorporate this new functionality, including updates to the CommonPjRtBuffer class and related files.

In conjunction with CreateRawAliasOfBuffer(), the commit also adds a method called GetBufferWithHold() to acquire a buffer with a specific hold type, enhancing the buffer management capabilities. This is particularly useful for managing external references to the buffer while ensuring that the necessary locks are in place to prevent concurrent access issues. Overall, these changes are aimed at improving the robustness and efficiency of buffer handling in the XLA framework, particularly in scenarios involving raw memory access and device memory management.
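To make the hold-and-alias idea concrete, here is a minimal host-side sketch in C++. TrackedBuffer, HoldType, DeviceMemory and RawBufferAlias are hypothetical names, not the PjRt API: device memory is modeled as a shared byte vector, and holds are only counted (never released) to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical hold types (not the real PjRt enum).
enum class HoldType { kUsage = 0, kExternalReference = 1 };

// Stand-in for device memory: plain host bytes.
struct DeviceMemory {
  std::vector<std::byte> bytes;
};

// A "raw alias" shares ownership of the same memory instead of copying it.
struct RawBufferAlias {
  std::shared_ptr<DeviceMemory> memory;
};

class TrackedBuffer {
 public:
  explicit TrackedBuffer(std::size_t size) : memory_(std::make_shared<DeviceMemory>()) {
    memory_->bytes.resize(size);
  }

  // Acquires the buffer under a scoped lock and records a hold of the given type.
  std::shared_ptr<DeviceMemory> GetBufferWithHold(HoldType type) {
    std::lock_guard<std::mutex> lock(mu_);
    ++hold_counts_[static_cast<int>(type)];
    return memory_;
  }

  // Creates an alias of the same memory via an external-reference hold; no bytes are copied.
  RawBufferAlias CreateRawAliasOfBuffer() {
    return RawBufferAlias{GetBufferWithHold(HoldType::kExternalReference)};
  }

  int HoldCount(HoldType type) {
    std::lock_guard<std::mutex> lock(mu_);
    return hold_counts_[static_cast<int>(type)];
  }

 private:
  std::mutex mu_;
  std::shared_ptr<DeviceMemory> memory_;
  int hold_counts_[2] = {0, 0};  // indexed by HoldType
};
```

Because the alias shares ownership rather than copying, the alias and any later hold point at the same bytes; a fuller version would also release the hold when the alias goes away.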

Files changed

third_party/xla/xla/pjrt/abstract_tracked_device_buffer.cc
third_party/xla/xla/pjrt/abstract_tracked_device_buffer.h
third_party/xla/xla/pjrt/common_pjrt_client.cc
third_party/xla/xla/pjrt/common_pjrt_client.h
third_party/xla/xla/pjrt/cpu/raw_buffer.h
third_party/xla/xla/pjrt/raw_buffer.h

2025-07-01T03:39:46 See commit

The commit introduces a new command buffer command, DynamicSliceCopyFusionCmd, to the XLA GPU backend, which is designed to facilitate the copying of slices from one buffer to another. This command is derived from the existing DynamicMemcpyThunk and currently only supports static offsets for the source and destination slices. The implementation includes the necessary methods for initialization, preparation, and recording of commands, ensuring that memory operations are handled efficiently within the GPU's command buffer.

Additionally, the commit includes updates to various files, such as command_buffer_cmd.cc and command_buffer_cmd.h, to integrate this new command into the command buffer architecture. Tests have also been added to validate the functionality of DynamicSliceCopyFusionCmd, confirming that it correctly performs memory copies as intended. Overall, this addition enhances the capabilities of the XLA GPU backend, allowing for more optimized memory operations in GPU computations.
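The record-then-execute flow for a slice copy with static offsets can be sketched on the host as follows. StaticSliceCopyCmd, CommandBuffer and Execute are hypothetical names, and std::memcpy stands in for the device copy that the real command would record into a GPU command buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical host-side stand-in for a command buffer: recorded commands run later.
using CommandBuffer = std::vector<std::function<void()>>;

// Sketch of a slice-copy command whose offsets are fixed (static) when it is built.
struct StaticSliceCopyCmd {
  const std::byte* src;
  std::byte* dst;
  std::size_t src_offset;
  std::size_t dst_offset;
  std::size_t size;

  // Record phase: append the copy; nothing is copied until the buffer executes.
  void Record(CommandBuffer& cb) const {
    StaticSliceCopyCmd cmd = *this;
    cb.push_back([cmd] {
      std::memcpy(cmd.dst + cmd.dst_offset, cmd.src + cmd.src_offset, cmd.size);
    });
  }
};

// Execute phase: replay every recorded command in order.
void Execute(const CommandBuffer& cb) {
  for (const auto& command : cb) command();
}
```

Because the offsets are baked in at record time, replaying the buffer always copies the same slice; dynamic offsets would need values resolved at execution time instead.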

Files changed

third_party/xla/xla/backends/gpu/runtime/BUILD
third_party/xla/xla/backends/gpu/runtime/command_buffer_cmd.cc
third_party/xla/xla/backends/gpu/runtime/command_buffer_cmd.h
third_party/xla/xla/backends/gpu/runtime/command_buffer_cmd_test.cc
third_party/xla/xla/service/gpu/transforms/command_buffer_scheduling_test.cc

2025-07-02T05:20:14 See commit

This commit addresses a bug in the BufferFromHostLiteral function within the CommonPjRtClient class, specifically related to the handling of allocation failures. Previously, when an allocation failed, the corresponding BufferFromHostLiteral events were not being fulfilled properly, potentially leading to unhandled errors in the system. The updated code introduces a new approach to managing the allocation process, encapsulating the allocation logic within a lambda function that captures the status of the operation.

If an error occurs during allocation or buffer definition, the promise associated with the event is set with the error status, so the event is fulfilled even when the operation fails. Every event is now resolved one way or the other, which makes buffer allocation more robust. The commit adds 19 lines and deletes 10, for 29 changed lines in total (a net increase of 9).
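A minimal sketch of the "always fulfil the event" pattern, assuming a std::promise stands in for the event and a small Status struct stands in for absl::Status. DefineBufferAndSignal and its callback parameters are hypothetical names, not the code in common_pjrt_client.cc.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <string>

// Hypothetical minimal status type (stands in for absl::Status).
struct Status {
  std::string error;  // empty means OK
  bool ok() const { return error.empty(); }
};

// Runs allocation and definition inside a lambda that yields a Status, then
// fulfils the event's promise on both the success and the failure path.
void DefineBufferAndSignal(std::promise<Status>& event,
                           const std::function<Status()>& allocate,
                           const std::function<Status()>& define) {
  Status status = [&]() -> Status {
    Status s = allocate();
    if (!s.ok()) return s;  // allocation failed: skip definition
    return define();
  }();
  event.set_value(status);  // waiters never hang, even on failure
}
```

Funnelling every outcome through a single Status means there is exactly one place where the promise is set, so no early return can leave the event unfulfilled.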

Files changed

third_party/xla/xla/pjrt/common_pjrt_client.cc

2025-06-27T07:03:46 See commit

This commit introduces xla::PjRtPhaseCompiler, defined in the pjrt_compiler.* files, to the XLA (Accelerated Linear Algebra) framework. A corresponding C wrapper, PJRT_PhaseCompiler, makes the compiler usable through the C PJRT layer (see pjrt_c_api_wrapper_impl.h). The commit also shows what plugin developers are expected to provide: a phase-compile extension callback that creates and returns a PJRT_PhaseCompiler object, which is then used for subsequent operations such as RunPhases and GetPhaseNames. Examples are provided in the pjrt_phase_compile_sample_plugin.* files.

The commit also updates the supporting headers and implementation files and adds tests for the new functionality. It notes that utilities to ease interaction between callers and plugin developers will be introduced in the next changelist, laying the groundwork for further development of phase compilation.
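The caller/plugin split can be illustrated with a small, self-contained C++ sketch. PhaseCompiler, RegisterPhase and CreateSamplePhaseCompiler are hypothetical; only the pairing of a phase-name query with a run-phases call mirrors the operations named above, and a string stands in for the program passed between phases.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface; the real xla::PjRtPhaseCompiler API may differ.
class PhaseCompiler {
 public:
  using Phase = std::function<std::string(const std::string&)>;

  void RegisterPhase(const std::string& name, Phase phase) {
    order_.push_back(name);
    phases_[name] = std::move(phase);
  }

  // Lists phases in registration order.
  std::vector<std::string> GetPhaseNames() const { return order_; }

  // Runs the requested phases in the given order, threading the program through.
  std::string RunPhases(std::string program, const std::vector<std::string>& names) const {
    for (const auto& name : names) {
      auto it = phases_.find(name);
      if (it == phases_.end()) throw std::invalid_argument("unknown phase: " + name);
      program = it->second(program);
    }
    return program;
  }

 private:
  std::vector<std::string> order_;
  std::map<std::string, Phase> phases_;
};

// Hypothetical plugin-side callback: creates and returns a configured compiler.
std::unique_ptr<PhaseCompiler> CreateSamplePhaseCompiler() {
  auto compiler = std::make_unique<PhaseCompiler>();
  compiler->RegisterPhase("lower", [](const std::string& p) { return p + "|lowered"; });
  compiler->RegisterPhase("optimize", [](const std::string& p) { return p + "|optimized"; });
  return compiler;
}
```

The caller only sees the returned object and the names it advertises, so a plugin can change how each phase works without changing the caller.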

Files changed

third_party/xla/xla/pjrt/BUILD
third_party/xla/xla/pjrt/c/pjrt_c_api_phase_compile_extension.h
third_party/xla/xla/pjrt/c/pjrt_c_api_phase_compile_internal.cc
third_party/xla/xla/pjrt/c/pjrt_c_api_wrapper_impl.h
third_party/xla/xla/pjrt/pjrt_compiler.cc
third_party/xla/xla/pjrt/pjrt_compiler.h
third_party/xla/xla/pjrt/pjrt_phase_compile_extension_test.cc
third_party/xla/xla/pjrt/pjrt_phase_compile_sample_plugin.cc
third_party/xla/xla/pjrt/pjrt_phase_compile_sample_plugin.h
third_party/xla/xla/pjrt/pjrt_phase_compiler_test.cc
third_party/xla/xla/pjrt/proto/BUILD
third_party/xla/xla/pjrt/proto/pjrt_partial_program.proto

2025-06-27T08:11:38 See commit

This commit introduces enhancements to the RedzoneBuffers functionality within the XLA (Accelerated Linear Algebra) GPU service by creating RedzoneBuffers from an Executable. Key modifications include the addition of a new method, FromComputation, which generates RedzoneBuffers based on the parameters of an HloComputation, leveraging a RedzoneAllocator for memory management. The changes also involve updates to the CreateInputs method, allowing it to accept a vector of HloInstructions instead of a single instruction, thereby improving flexibility in buffer creation.

In addition to the code modifications, the commit includes updates to the test suite, introducing a new test case that verifies the behavior of the FromComputation method. This test ensures that the expected input shapes and buffer sizes are correctly generated when provided with an HLO module. Overall, the changes enhance the functionality and robustness of the RedzoneBuffers system, facilitating better memory management in GPU computations within the XLA framework.
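A redzone is a band of guard bytes with a known pattern placed around an allocation; if a kernel writes out of bounds, the pattern changes and the check fails. The host-side sketch below shows the idea, plus a FromComputation-like helper that creates one guarded input per parameter. RedzoneBuffer, AllocateWithRedzones, FromParameterSizes and the 8-byte redzone size are assumptions; the real RedzoneAllocator works on device memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical redzone size and fill pattern for the sketch.
constexpr std::size_t kRedzoneBytes = 8;
constexpr std::uint8_t kRedzonePattern = 0xAB;

struct RedzoneBuffer {
  std::vector<std::uint8_t> storage;  // [redzone | payload | redzone]
  std::size_t payload_size = 0;

  std::uint8_t* payload() { return storage.data() + kRedzoneBytes; }

  // True if neither guard band was overwritten.
  bool RedzonesIntact() const {
    for (std::size_t i = 0; i < kRedzoneBytes; ++i) {
      if (storage[i] != kRedzonePattern) return false;
      if (storage[kRedzoneBytes + payload_size + i] != kRedzonePattern) return false;
    }
    return true;
  }
};

RedzoneBuffer AllocateWithRedzones(std::size_t payload_size) {
  RedzoneBuffer buf;
  buf.payload_size = payload_size;
  buf.storage.assign(payload_size + 2 * kRedzoneBytes, kRedzonePattern);
  std::fill(buf.payload(), buf.payload() + payload_size, std::uint8_t{0});  // zero the payload
  return buf;
}

// Analogue of building inputs from a computation: one guarded buffer per parameter size.
std::vector<RedzoneBuffer> FromParameterSizes(const std::vector<std::size_t>& sizes) {
  std::vector<RedzoneBuffer> inputs;
  inputs.reserve(sizes.size());
  for (std::size_t size : sizes) inputs.push_back(AllocateWithRedzones(size));
  return inputs;
}
```

A write one byte past the payload lands in the trailing redzone, which is exactly what the check detects.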

Files changed

third_party/xla/xla/service/gpu/autotuning/BUILD
third_party/xla/xla/service/gpu/autotuning/redzone_buffers.cc
third_party/xla/xla/service/gpu/autotuning/redzone_buffers.h
third_party/xla/xla/service/gpu/autotuning/redzone_buffers_test.cc

2025-06-27T18:59:36 See commit

This commit extends the DotLibraryRewriter in the XLA CPU backend so that dot operations can be fused, using oneDNN, with element-wise operations: addition, multiplication, and the exponential function (Add, Mul, and Exp). The changes include modifications to the build configuration and new test cases to validate the functionality. The OneDnnMatcher class has been updated to support these operations, so that fusible operations are identified and processed correctly based on their input and output types.

Additionally, the commit refines the testing framework for the dot operation rewrites to cover more CPU architectures and data type combinations. It introduces a structure describing each test case's specification: the library type, the input and output data types, and the CPU features. This structure makes the tests easier to extend and helps verify that the fusion is applied correctly for each combination; the fusion itself is what improves performance for workloads that use these operations.
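The matching logic can be pictured as a predicate over the element-wise user of a dot, plus a parameter struct for the tests. Everything here is hypothetical: the enums, CanFuseIntoDot, the rule that the element-wise output type must match the dot output type, and the DotRewriteTestSpec fields, which only echo the library, data-type and CPU-feature dimensions mentioned above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical enums; the real matcher inspects HLO instructions.
enum class Op { kDot, kAdd, kMultiply, kExp, kTanh };
enum class DType { kF32, kBF16 };

// Element-wise ops this sketch treats as fusible into a oneDNN dot.
bool IsFusableElementwise(Op op) {
  return op == Op::kAdd || op == Op::kMultiply || op == Op::kExp;
}

// Assumed rule for the sketch: fuse only supported ops that keep the dot's output type.
bool CanFuseIntoDot(Op user, DType dot_output, DType user_output) {
  return IsFusableElementwise(user) && dot_output == user_output;
}

// Test spec covering the library, data types and CPU feature of one case.
struct DotRewriteTestSpec {
  std::string library;
  DType input_type;
  DType output_type;
  std::string cpu_feature;
};

// A few hypothetical combinations a parameterized test could iterate over.
const std::vector<DotRewriteTestSpec> kSpecs = {
    {"onednn", DType::kF32, DType::kF32, "avx512"},
    {"onednn", DType::kBF16, DType::kF32, "avx512_bf16"},
};
```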

Files changed

third_party/xla/xla/backends/cpu/transforms/BUILD
third_party/xla/xla/backends/cpu/transforms/dot_library_rewriter_test.cc
third_party/xla/xla/backends/cpu/transforms/onednn_matcher.h

2025-06-27T23:34:36 See commit

This commit introduces support for mapping a single sub-axis to multiple sub-axes in the mesh deduplication process. The changes are primarily made in the dedup_meshes.cc file, which is part of the XLA (Accelerated Linear Algebra) service, and include modifications to related test files to ensure the new functionality is properly validated.

The updates in the test files, specifically dedup_meshes.mlir and sdy_round_trip_export_pipeline.mlir, reflect the adjustments needed to accommodate the new mapping capabilities. This enhancement is aimed at improving the flexibility and efficiency of the mesh deduplication process within the XLA framework.
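To see what mapping one sub-axis to several means, suppose (hypothetically) one mesh has a single axis "x" of size 8, and a duplicate mesh splits the same devices into "a" = 2 and "b" = 4, major to minor. With Shardy-style sub-axes, where pre_size is the product of the more-major parts of an axis, a sub-axis of "x" can span parts of both "a" and "b". The sketch below is our own illustration of that mapping, assuming all sizes divide evenly; Axis, SubAxis and MapSubAxis are hypothetical, and this is not the algorithm in dedup_meshes.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Axis {
  std::string name;
  std::int64_t size;
};

struct SubAxis {
  std::string axis;
  std::int64_t pre_size;  // product of the more-major parts of the axis
  std::int64_t size;
  bool operator==(const SubAxis& o) const {
    return axis == o.axis && pre_size == o.pre_size && size == o.size;
  }
};

// Maps sub-axis (pre_size, size) of one axis onto target axes whose sizes multiply
// to that axis's size (major to minor). The sub-axis covers the multiplicative range
// [pre_size, pre_size * size); each target covers [major, major * target.size).
std::vector<SubAxis> MapSubAxis(std::int64_t pre_size, std::int64_t size,
                                const std::vector<Axis>& targets) {
  std::vector<SubAxis> result;
  const std::int64_t lo = pre_size, hi = pre_size * size;
  std::int64_t major = 1;  // product of the target sizes before the current one
  for (const Axis& t : targets) {
    const std::int64_t begin = std::max(lo, major);
    const std::int64_t end = std::min(hi, major * t.size);
    if (end > begin) {  // the ranges overlap
      assert(begin % major == 0 && end % begin == 0);  // sketch assumes clean divisibility
      result.push_back({t.name, begin / major, end / begin});
    }
    major *= t.size;
  }
  return result;
}
```

For example, the whole of "x" (pre_size 1, size 8) maps to all of "a" and all of "b", while "x":(2)2 falls entirely inside "b".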

Files changed

third_party/xla/xla/service/spmd/shardy/sdy_round_trip/dedup_meshes.cc
third_party/xla/xla/service/spmd/shardy/test/dedup_meshes.mlir
third_party/xla/xla/service/spmd/shardy/test/sdy_round_trip_export_pipeline.mlir

2025-06-28T01:53:40 See commit

This commit introduces a new API feature called PjRtFuture::Map to the PjRtFuture class within the XLA (Accelerated Linear Algebra) framework. The Map function allows users to transform the result of a PjRtFuture by applying a provided functor to its value, effectively creating a new PjRtFuture that holds the transformed result. The implementation supports both copyable and move-only types, ensuring flexibility in how the values are handled. The new functionality is designed to maintain the error propagation behavior, meaning if the original future completes with an error, the resulting future will also reflect that error.

In addition to the API implementation, the commit includes extensive unit tests to validate the behavior of the Map function for various scenarios, including both copyable and move-only futures. These tests confirm that the transformation behaves as expected, both when the original future is resolved successfully and when it encounters errors. Overall, this enhancement significantly increases the usability and expressiveness of the PjRtFuture class, allowing developers to work more efficiently with asynchronous operations in the XLA framework.
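The semantics of Map can be sketched with standard-library futures. Result (a stand-in for absl::StatusOr), Map and MakeReady are hypothetical and do not reproduce PjRtFuture's API. The sketch only shows the two behaviours described above: the functor transforms a successful value (moving it, so move-only types work), and an error skips the functor and is forwarded.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for absl::StatusOr<T>: a value or an error message.
template <typename T>
struct Result {
  std::optional<T> value;
  std::string error;  // meaningful only when value is empty
  bool ok() const { return value.has_value(); }
};

// Hypothetical helper: a future already resolved with `r`.
template <typename T>
std::future<Result<T>> MakeReady(Result<T> r) {
  std::promise<Result<T>> promise;
  promise.set_value(std::move(r));
  return promise.get_future();
}

// Sketch of Map: on success the new future holds f(value); on error, f is skipped
// and the error is forwarded unchanged.
template <typename T, typename F>
auto Map(std::future<Result<T>> input, F f)
    -> std::future<Result<decltype(f(std::declval<T>()))>> {
  using U = decltype(f(std::declval<T>()));
  return std::async(std::launch::deferred,
                    [input = std::move(input), f = std::move(f)]() mutable -> Result<U> {
                      Result<T> r = input.get();
                      if (!r.ok()) return Result<U>{std::nullopt, r.error};
                      return Result<U>{f(std::move(*r.value)), ""};
                    });
}
```

Using std::launch::deferred keeps the sketch single-threaded: the functor runs when get() is called. A callback-based future such as PjRtFuture would more likely run it when the input becomes ready.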

Files changed

third_party/xla/xla/pjrt/BUILD
third_party/xla/xla/pjrt/pjrt_future.h
third_party/xla/xla/pjrt/pjrt_future_test.cc

2025-06-28T09:03:04 See commit

The commit updates the forward compatibility horizon in the TensorFlow codebase, specifically in the compat.py file. The new date for the forward compatibility horizon is set to June 28, 2025, which reflects a minor adjustment from the previous date of June 27, 2025.

This is a routine update to TensorFlow's compatibility tracking. Developers can use the forward_compatibility_horizon() function or the TF_FORWARD_COMPATIBILITY_DELTA_DAYS environment variable to shift the compatibility date as needed. The commit changes a single line (one addition, one deletion).
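The horizon rule, as we understand it, can be written down as date arithmetic: a feature gated on a date counts as forward compatible once the horizon is past that date, and the environment variable shifts the horizon by a number of days. TensorFlow's implementation is Python (forward_compatible() in compat.py); this C++ re-expression uses hypothetical names (HorizonDays, ForwardCompatible) and Howard Hinnant's days_from_civil algorithm for day counts.

```cpp
#include <cassert>
#include <cstdlib>

// Days since 1970-01-01 in the proleptic Gregorian calendar (days_from_civil).
long DaysFromCivil(long y, unsigned m, unsigned d) {
  y -= m <= 2;
  const long era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
  return era * 146097 + static_cast<long>(doe) - 719468;
}

// Horizon: a fixed date plus an optional day offset from the environment
// (our reading of compat.py, not a port of it).
long HorizonDays() {
  long horizon = DaysFromCivil(2025, 6, 28);
  if (const char* delta = std::getenv("TF_FORWARD_COMPATIBILITY_DELTA_DAYS")) {
    horizon += std::strtol(delta, nullptr, 10);
  }
  return horizon;
}

// A feature gated on (y, m, d) is enabled once the horizon is strictly past that date.
bool ForwardCompatible(long horizon_days, long y, unsigned m, unsigned d) {
  return horizon_days > DaysFromCivil(y, m, d);
}
```

Taking the horizon as a parameter keeps ForwardCompatible independent of the environment, which makes it easy to check with fixed dates.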

Files changed

tensorflow/python/compat/compat.py

2025-06-30T19:33:05 See commit

This commit is a rollback of a previous change (identified by the hash 015e6d1c6ebec6b228d213d5831c8f09735840a3) due to suspected correctness issues. The modifications primarily affect the HLO (High-Level Optimization) test output and the reshape mover functionality within the XLA (Accelerated Linear Algebra) framework. The changes include alterations in the HLO test checks, where certain reshaping operations and their corresponding checks have been adjusted to ensure the correctness of the transformations being applied.

In addition to the test output adjustments, the reshape mover code is reverted: this changes how rearranged operands are handled and changes the return type of certain functions back to better reflect the operations being performed. The aim of the rollback is to remove potential errors introduced by the prior change and keep the handling of HLO transformations reliable and accurate.

Files changed

third_party/xla/xla/hlo/tools/tests/generate_hlo_test_checks_test_output.hlo
third_party/xla/xla/hlo/transforms/simplifiers/BUILD
third_party/xla/xla/hlo/transforms/simplifiers/reshape_mover.cc
third_party/xla/xla/hlo/transforms/simplifiers/reshape_mover.h
third_party/xla/xla/hlo/transforms/simplifiers/reshape_mover_test.cc

2025-06-30T19:41:01 See commit

This commit introduces the TryMap method to the PjRtFuture class within the XLA project, enabling the mapping of futures with a functor that may itself fail. The TryMap function allows users to apply a transformation to the value contained in a PjRtFuture, while also handling potential errors gracefully. If the original future completes with an error, the resulting future will also reflect that error. The implementation includes overloads for both copyable and move-only types, ensuring versatility in usage.

Additionally, the commit includes extensive unit tests to validate the functionality of TryMap, covering scenarios such as successful transformations, error forwarding, and error creation. These tests ensure that the new method behaves as expected under various conditions, reinforcing the robustness of the PjRtFuture class when dealing with asynchronous operations and error management. The commit sets the stage for further enhancements related to error forwarding in a subsequent change.
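TryMap differs from a plain map in that the functor returns a result that may itself be an error. Below is a standard-library sketch with hypothetical Result (standing in for absl::StatusOr), MakeReady and TryMap; it covers the three cases described above: successful transformation, error forwarding, and error creation by the functor.

```cpp
#include <cassert>
#include <future>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for absl::StatusOr<T>: a value or an error message.
template <typename T>
struct Result {
  std::optional<T> value;
  std::string error;  // meaningful only when value is empty
  bool ok() const { return value.has_value(); }
};

// Hypothetical helper: a future already resolved with `r`.
template <typename T>
std::future<Result<T>> MakeReady(Result<T> r) {
  std::promise<Result<T>> promise;
  promise.set_value(std::move(r));
  return promise.get_future();
}

// Sketch of TryMap: f returns a Result<U>, so it can fail on its own.
template <typename T, typename F>
auto TryMap(std::future<Result<T>> input, F f)
    -> std::future<decltype(f(std::declval<T>()))> {
  using R = decltype(f(std::declval<T>()));  // expected to be some Result<U>
  return std::async(std::launch::deferred,
                    [input = std::move(input), f = std::move(f)]() mutable -> R {
                      Result<T> r = input.get();
                      if (!r.ok()) return R{std::nullopt, r.error};  // forward input error
                      return f(std::move(*r.value));  // f may succeed or create an error
                    });
}
```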

Files changed

third_party/xla/xla/pjrt/pjrt_future.h
third_party/xla/xla/pjrt/pjrt_future_test.cc

2025-06-30T20:44:41 See commit

This commit addresses a critical issue related to elementwise operations offloaded to XNNPACK by implementing checks on the operands of these operations. Specifically, it resolves crashes that occurred when attempting to create external tensors of unsupported types. The modifications include the addition of a check to ensure that all operands of the operation have a valid data type supported by XNNPACK.

The changes were made in two files: xnn_graph_fusion_test.cc and xnn_fusion.cc. A new test case, BasicFusionUnsupportedOperandType, was introduced to verify the behavior of the XNNGraphFusion when presented with an unsupported operand type. Additionally, the function IsElementwiseOpSupportedByXnn was updated to include a validation step that confirms each operand's data type is acceptable before proceeding with the operation. These enhancements aim to improve the robustness and stability of the XNNPACK integration within the XLA backend.
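The operand check amounts to rejecting an element-wise op if any operand has a type that cannot become an XNNPACK external tensor. In the sketch below, the ElementType enum, the supported-type list and IsElementwiseOpSupported are illustrative assumptions, not XNNPACK's actual type table or the XLA function.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical element types; the real check inspects HLO operand shapes.
enum class ElementType { kF32, kBF16, kF16, kS32, kPred };

// Illustrative supported set only (not XNNPACK's real list).
bool IsXnnSupportedType(ElementType t) {
  return t == ElementType::kF32 || t == ElementType::kBF16 || t == ElementType::kF16;
}

// Offload only if the result type and every operand type are supported, so an
// unsupported operand rejects the op up front instead of failing at tensor creation.
bool IsElementwiseOpSupported(ElementType result, const std::vector<ElementType>& operands) {
  return IsXnnSupportedType(result) &&
         std::all_of(operands.begin(), operands.end(), IsXnnSupportedType);
}
```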

Files changed

third_party/xla/xla/backends/cpu/transforms/xnn_graph_fusion_test.cc
third_party/xla/xla/backends/cpu/xnn_fusion.cc