TensorFlow changelog
Here's a rundown of the latest changes, packed with exciting new features and crucial bug fixes to keep everything running smoothly. 🚀
- New Feature: Python bindings for the HLO Diff tool are now live! This update makes it easier to compare HLO modules with new options for computing instruction fingerprints, enhancing the tool's flexibility and usability.
- New Feature: Say hello to the `TfLiteQuantizationClone` function! This nifty addition lets you clone `TfLiteQuantization` structs in TensorFlow Lite, making it a breeze to duplicate quantization parameters without altering the originals. Handy, right?
- New Feature: We've rolled out `_XlaShardingV2` for `tf.XlaShardOp`, boosting TensorFlow's sharding capabilities during the XLA lowering process. This means better performance for distributed computing and TPU workloads!
- New Feature: The `SerDesDefaultVersionAccessor::Get()` method is here to make your life easier by managing default SerDes versions in IFRT. It ensures robust version handling, especially useful for IFRT Proxy development.
- New Feature: Two new methods, `ToLiteral` and `LazyToLiteral`, have been added to `CommonPjRtBufferImpl`. These methods provide more flexibility in handling data conversions, making asynchronous operations smoother in the XLA framework.
- Improvement: We've integrated Triton up to version 0a4aa69, updating the LLVM integration patches and improving CUDA and ROCm compatibility. This makes the build process more streamlined and efficient.
- Improvement: The RedzoneBuffers functionality in XLA:GPU has been enhanced. RedzoneBuffers can now be created from an Executable, improving memory management and flexibility in buffer creation.
- Improvement: Upwards tile propagation for `BroadcastOp` in XLA:GPU is now implemented. This enhancement optimizes tensor operations by accurately propagating tile information to inputs of broadcast operations.
- Bugfix: A pesky bug in the alternate memory allocation for XLA:MSA has been squashed! Chunks are now properly reserved and tracked, preventing issues with overlapping memory.
- Bugfix: The integration of hermetic C++ toolchains in TensorFlow has been rolled back. This decision was made to avoid increased wheel sizes and maintain compliance with manylinux standards.
- Bugfix: Due to timeouts on Linux, the `worker_tags_test` has been temporarily disabled for Python 3.13. We're on it and will get this sorted out soon!
- Chore: The `highwayhash` library has been moved to a new location within the TensorFlow repository, tidying up the project's structure and improving maintainability.
We hope these updates make your development experience even better! Keep an eye out for more improvements and features coming your way. 🌟
Included Commits
This commit introduces Python bindings for the HLO (High-Level Optimizer) Diff tool, enhancing its functionality within the XLA (Accelerated Linear Algebra) framework. Key changes include the addition of a new protocol buffer file, diff_options.proto, which defines the OptionsProto message structure. This structure includes nested messages for specifying paths to HLO snapshots and options for computing instruction fingerprints, allowing for more flexible and detailed diff operations between HLO modules.
Additionally, the existing diff_result.proto file has been modified to include a new message, DiffResultOrErrorProto, which encapsulates the results or errors encountered during the diff computation process. This commit effectively expands the capabilities of HLO Diff by providing a more comprehensive set of options and a structured way to handle the outcomes of diff operations, thereby improving the overall usability and functionality of the tool.
Files changed
- third_party/xla/xla/hlo/tools/hlo_diff/proto/BUILD
- third_party/xla/xla/hlo/tools/hlo_diff/proto/diff_options.proto
- third_party/xla/xla/hlo/tools/hlo_diff/proto/diff_result.proto
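The idea of an instruction fingerprint option can be illustrated with a small sketch. The actual fields and fingerprinting scheme are defined in diff_options.proto; the function name, hash choice, and the `ignore_shapes` knob below are purely illustrative assumptions:

```python
import hashlib

def instruction_fingerprint(opcode, operand_shapes, ignore_shapes=False):
    """Compute a stable fingerprint for an HLO-like instruction.

    Hypothetical sketch: the real options live in diff_options.proto;
    here a fingerprint is just a hash of the opcode and, optionally,
    the operand shapes.
    """
    parts = [opcode]
    if not ignore_shapes:
        parts.extend(operand_shapes)
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()[:16]

# Two adds with different shapes only match when shapes are
# excluded from the fingerprint.
a = instruction_fingerprint("add", ["f32[4]", "f32[4]"])
b = instruction_fingerprint("add", ["f32[8]", "f32[8]"])
c = instruction_fingerprint("add", ["f32[8]", "f32[8]"], ignore_shapes=True)
d = instruction_fingerprint("add", ["f32[4]", "f32[4]"], ignore_shapes=True)
```

Configurable fingerprints like this let the diff tool trade precision for robustness: stricter fingerprints match fewer instruction pairs, looser ones tolerate shape changes between the two modules.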
The commit involves relocating the highwayhash library from the tensorflow/third_party directory to the xla/third_party/highwayhash directory within the TensorFlow repository. This change is reflected in modifications to two files: tensorflow/workspace2.bzl, where the import path for highwayhash is updated to its new location, and third_party/xla/third_party/highwayhash/BUILD, which adjusts the package's license configuration.
Overall, this commit streamlines the organization of third-party dependencies by consolidating them under the XLA (Accelerated Linear Algebra) framework, potentially improving maintainability and clarity within the project's structure. The changes involve minor edits, including the addition and removal of specific lines in the respective files to accommodate this restructuring.
Files changed
- tensorflow/workspace2.bzl
- third_party/xla/third_party/highwayhash/BUILD
This commit introduces the TfLiteQuantizationClone function, which allows for the cloning of the TfLiteQuantization struct in TensorFlow Lite. The function creates a new instance of TfLiteQuantization, copying over the type and parameters from the source struct to the destination struct. This addition enhances the functionality of TensorFlow Lite by enabling users to easily duplicate quantization parameters, which is particularly useful in scenarios where modifications to quantization settings are needed without altering the original parameters.
In conjunction with this, the commit also includes modifications to the common.cc and common.h files, where the new cloning function is implemented and declared, respectively. The changes streamline memory management and ensure that the cloning process correctly handles the allocation of necessary resources, thereby improving the overall efficiency and usability of the library. The commit reflects a focus on enhancing the capabilities of TensorFlow Lite's quantization features, making it easier for developers to work with quantized models.
Files changed
- tensorflow/lite/core/c/common.cc
- tensorflow/lite/core/c/common.h
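The value of a clone function is the independence of the copy. A minimal Python sketch of that semantics (the dataclass and field names are illustrative stand-ins for the C structs, not the TensorFlow Lite API):

```python
import copy
from dataclasses import dataclass, field

@dataclass
class AffineQuantization:
    # Per-channel scales and zero points, mirroring the idea of
    # TfLiteAffineQuantization (field names here are illustrative).
    scale: list = field(default_factory=list)
    zero_point: list = field(default_factory=list)

def clone_quantization(src):
    """Return a fully independent copy, so edits to the clone never
    touch the original -- the behavior TfLiteQuantizationClone
    provides for TfLiteQuantization structs in C."""
    return copy.deepcopy(src)

original = AffineQuantization(scale=[0.5, 0.25], zero_point=[0, 128])
clone = clone_quantization(original)
clone.scale[0] = 1.0  # mutate the clone only
```

After the mutation, `original.scale` still reads `[0.5, 0.25]`: exactly the "duplicate without altering the originals" property the commit describes.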
This commit disables the worker_tags_test for Python 3.13 due to recurring timeout issues on Linux systems. The modification is made in the TensorFlow repository, specifically within the build configuration file, where a new tag "no_oss_py313" is added to indicate the exclusion of this test for the specified Python version.
The commit aims to address the problem of timeouts that occur during the execution of the test, as noted in the associated TODO comment referencing a specific bug report. By marking the test with the new tag, it signals that further investigation and fixes are needed to resolve the timeout issues in the future.
Files changed
- tensorflow/python/data/experimental/kernel_tests/service/BUILD
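For context, excluding a test for one Python version via a tag looks roughly like the following BUILD fragment. The macro name and surrounding attributes are assumptions; only the `no_oss_py313` tag comes from the commit, and the bug number stays elided as in the source:

```python
# Hypothetical sketch of the BUILD change; the real target's
# attributes differ.
tf_py_strict_test(
    name = "worker_tags_test",
    srcs = ["worker_tags_test.py"],
    tags = [
        "no_oss_py313",  # TODO(b/...): re-enable once the Linux timeout is fixed.
    ],
)
```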
This commit introduces a new feature to facilitate the propagation of default SerDes (Serialization/Deserialization) versions within the IFRT (Intermediate Representation Framework) by implementing the SerDesDefaultVersionAccessor::Get() method. This method retrieves the default SerDesVersion, typically set to SerDesVersion::current(), whenever a default value is used for SerializeOptions::version or the ToProto() version argument. Additionally, the constructors of SerializeOptions and its subclasses have been modified to allow for the direct specification of a non-default SerDesVersion, streamlining the process of version assignment.
Furthermore, the commit includes a testing mechanism that triggers a crash when an attempt is made to fetch a default version while the IFRT_TESTING_BAD_DEFAULT_SERDES_VERSION macro is defined. This feature is particularly beneficial for the development of the IFRT Proxy, ensuring that any serialization and deserialization operations use a negotiated non-default version between the IFRT Proxy server and client, thus enhancing the robustness of version management in the framework. The changes affect multiple files within the IFRT codebase, reflecting a comprehensive update to the serialization options and associated tests.
Files changed
- third_party/xla/xla/python/ifrt/BUILD
- third_party/xla/xla/python/ifrt/array_spec.h
- third_party/xla/xla/python/ifrt/attribute_map.h
- third_party/xla/xla/python/ifrt/custom_call_program_serdes_test.cc
- third_party/xla/xla/python/ifrt/device_list.h
- third_party/xla/xla/python/ifrt/dtype.h
- third_party/xla/xla/python/ifrt/executable.h
- third_party/xla/xla/python/ifrt/hlo/hlo_program_serdes_test.cc
- third_party/xla/xla/python/ifrt/ir/BUILD
- third_party/xla/xla/python/ifrt/ir/ifrt_ir_program.h
- third_party/xla/xla/python/ifrt/ir/sharding_param.h
- third_party/xla/xla/python/ifrt/layout.cc
- third_party/xla/xla/python/ifrt/layout.h
- third_party/xla/xla/python/ifrt/layout_serdes_test.cc
- third_party/xla/xla/python/ifrt/plugin_program_serdes_test.cc
- third_party/xla/xla/python/ifrt/remap_plan.h
- third_party/xla/xla/python/ifrt/serdes.cc
- third_party/xla/xla/python/ifrt/serdes.h
- third_party/xla/xla/python/ifrt/serdes_default_version_accessor.h
- third_party/xla/xla/python/ifrt/serdes_version.md
- third_party/xla/xla/python/ifrt/shape.h
- third_party/xla/xla/python/ifrt/sharding.cc
- third_party/xla/xla/python/ifrt/sharding.h
- third_party/xla/xla/python/ifrt/sharding_serdes_test.cc
- third_party/xla/xla/python/pjrt_ifrt/pjrt_layout_serdes_test.cc
- third_party/xla/xla/python/pjrt_ifrt/xla_sharding_serdes_test.cc
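The accessor pattern can be sketched in a few lines. This is a loose Python analogue, not the C++ API: the class names echo the commit, but the version number, the use of an environment variable in place of the compile-time `IFRT_TESTING_BAD_DEFAULT_SERDES_VERSION` macro, and the exception are all assumptions:

```python
import os

class SerDesVersion:
    def __init__(self, n):
        self.n = n

    @staticmethod
    def current():
        return SerDesVersion(3)  # illustrative version number

def get_default_serdes_version():
    """Sketch of SerDesDefaultVersionAccessor::Get(): resolve the
    default version in one place, and fail loudly in test builds where
    every caller (e.g. the IFRT Proxy) must pass an explicitly
    negotiated version instead of relying on the default."""
    if os.environ.get("IFRT_TESTING_BAD_DEFAULT_SERDES_VERSION"):
        raise RuntimeError("default SerDes version used where an explicit one is required")
    return SerDesVersion.current()

class SerializeOptions:
    def __init__(self, version=None):
        # A None version falls back to the accessor, like the modified
        # SerializeOptions constructors that now also accept a
        # non-default version directly.
        self.version = version if version is not None else get_default_serdes_version()
```

Centralizing the fallback means there is exactly one place to instrument when checking that no serialization path silently uses an un-negotiated version.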
This commit introduces two new methods, ToLiteral and LazyToLiteral, to the CommonPjRtBufferImpl class within the XLA (Accelerated Linear Algebra) library. The ToLiteral method allows users to convert a buffer directly into a literal, while LazyToLiteral accepts a generator function that will produce the literal when called. Both methods return a PjRtFuture, enabling asynchronous operations and handling of potential errors during the conversion process. The implementation ensures that these operations take into account the current state of the buffer and its dependencies, preventing issues such as deadlocks when invoked from within host callbacks.
In addition to the new methods, the commit also updates various header files and implementation files to include necessary headers and declarations, such as host_callback and pjrt_future. Changes to the CommonPjRtClient class include a new method for creating linked promises that provide debugging information during asynchronous operations. Overall, this commit enhances the buffer's functionality by allowing for more flexible and efficient handling of data conversions in the XLA framework.
Files changed
- third_party/xla/xla/pjrt/BUILD
- third_party/xla/xla/pjrt/common_pjrt_client.cc
- third_party/xla/xla/pjrt/common_pjrt_client.h
- third_party/xla/xla/pjrt/cpu/raw_buffer.cc
- third_party/xla/xla/pjrt/cpu/raw_buffer.h
- third_party/xla/xla/pjrt/device_event.h
- third_party/xla/xla/pjrt/raw_buffer.h
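The eager/lazy split can be sketched with plain futures. This is a Python analogue of the concept only; the real methods return a `PjRtFuture` and the names below are illustrative:

```python
from concurrent.futures import Future

def to_literal(buffer_data):
    """Eager conversion: resolve the future with the literal now."""
    fut = Future()
    fut.set_result(list(buffer_data))
    return fut

def lazy_to_literal(generator):
    """Lazy conversion in the spirit of LazyToLiteral: the generator
    callable is only invoked when the value is actually needed, and
    errors surface through the future rather than being raised at the
    call site."""
    fut = Future()

    def resolve():
        try:
            fut.set_result(generator())
        except Exception as e:  # propagate errors via the future
            fut.set_exception(e)

    return fut, resolve

calls = []
fut, resolve = lazy_to_literal(lambda: calls.append(1) or [1, 2, 3])
# The generator has not run yet; nothing is produced until resolution.
resolve()
```

Deferring the generator is what makes the lazy variant safe to schedule behind the buffer's pending definition events, the kind of dependency handling the commit describes for avoiding deadlocks in host callbacks.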
This commit addresses a bug in the alternate memory allocation process within the XLA (Accelerated Linear Algebra) framework, specifically related to the management of memory chunks during forced prefetching and minimum time allocation. The issue arose because certain chunks were not being reserved and added to the list of pending chunks, which led to problems with overlapping memory chunks when utilizing buffer coloring in alternate memory. The fix involves adding calls to AddToPendingChunks to ensure that these chunks are correctly tracked, thus preventing the overlap issues.
Additionally, the commit includes modifications to the memory space assignment testing framework, introducing a new test case that verifies the correct handling of multiple operands with alternate memory space coloring. This test ensures that the memory allocation is functioning as expected after the bug fix, confirming that the operands of an addition operation are appropriately assigned to the alternate memory space. Overall, these changes enhance the stability and reliability of memory allocation in the XLA framework.
Files changed
- third_party/xla/xla/service/memory_space_assignment/algorithm.cc
- third_party/xla/xla/service/memory_space_assignment/memory_space_assignment.cc
- third_party/xla/xla/service/memory_space_assignment/memory_space_assignment_test.cc
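The failure mode and the fix can be sketched abstractly: a chunk that is reserved but never recorded in the pending list is invisible to later reservations, which can then overlap it. A minimal Python model of that invariant (names echo the commit; the data structures are assumptions):

```python
def overlaps(a, b):
    """Chunks are (offset, size) intervals in alternate memory."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]

class AlternateMemory:
    def __init__(self):
        self.pending_chunks = []

    def add_to_pending_chunks(self, chunk):
        """The essence of the fix: every reserved chunk -- including
        those made for forced prefetches and minimum-time allocations --
        must be recorded here so later reservations can see it."""
        self.pending_chunks.append(chunk)

    def reserve(self, chunk):
        if any(overlaps(chunk, c) for c in self.pending_chunks):
            raise ValueError("overlapping chunk")
        self.add_to_pending_chunks(chunk)

mem = AlternateMemory()
mem.reserve((0, 64))   # chunk at offset 0, size 64
mem.reserve((64, 32))  # adjacent, non-overlapping: allowed
```

A request for `(48, 32)` would now raise, because the first reservation is visible in `pending_chunks`; before the fix, an untracked chunk made such overlaps possible.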
This commit introduces an implementation for upwards tile propagation specifically for the Broadcast operation within the XLA (Accelerated Linear Algebra) GPU framework. The new function, PropagateTileToInputForBroadcastOp, calculates the necessary offsets, sizes, strides, and bounds for the input of a broadcast operation based on the properties of the result tile. The function utilizes the dimensions specified in the broadcast operation to create a new ExperimentalSymbolicTile, which is then returned as part of the TiledOperands structure. This enhancement aims to improve the efficiency of tensor operations by ensuring that the tile information is accurately propagated to inputs of broadcast operations.
In addition to the implementation, the commit also includes a corresponding test case to verify the functionality of the new tile propagation for broadcast operations. The test checks that the tile propagation correctly computes the offsets, sizes, strides, and upper bounds based on a provided HLO (High-Level Operation) module. The successful results of this test ensure that the new functionality integrates well with existing symbolic tile propagation mechanisms, contributing to the overall optimization of GPU operations in XLA.
Files changed
- third_party/xla/xla/service/gpu/model/experimental/symbolic_tile_propagation.cc
- third_party/xla/xla/service/gpu/model/experimental/symbolic_tile_propagation_test.cc
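The core of upwards propagation for broadcast is a dimension selection: input dimension `i` of a broadcast corresponds to result dimension `dimensions[i]`, so the input tile is the result tile restricted to those dimensions. A hedged Python sketch of that idea (the `Tile` fields and function name are illustrative, not the `ExperimentalSymbolicTile` API):

```python
from dataclasses import dataclass

@dataclass
class Tile:
    offsets: tuple
    sizes: tuple
    strides: tuple
    bounds: tuple  # upper bound of each dimension

def propagate_tile_to_broadcast_input(result_tile, broadcast_dims):
    """Sketch of the idea behind PropagateTileToInputForBroadcastOp:
    select, for each input dimension, the tile components of the
    matching result dimension."""
    pick = lambda xs: tuple(xs[d] for d in broadcast_dims)
    return Tile(pick(result_tile.offsets), pick(result_tile.sizes),
                pick(result_tile.strides), pick(result_tile.bounds))

# f32[16] broadcast to f32[8,16] with dimensions={1}: the input tile
# keeps only the components of result dimension 1.
result_tile = Tile(offsets=(0, 4), sizes=(8, 4), strides=(1, 1), bounds=(8, 16))
input_tile = propagate_tile_to_broadcast_input(result_tile, broadcast_dims=(1,))
```

Dimensions absent from `broadcast_dims` are the broadcasted ones; they exist only in the result, so they simply drop out of the input tile.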
The recent commit integrates Triton up to version 0a4aa69, marking a significant update for the project. Key changes include modifications to the LLVM integration patches, where four outdated patches were removed and a new target specification for CUDA was added. Additionally, a temporary pipeline patch for handling F16 types was completely removed, suggesting a shift in how these types are processed within the Triton framework. Several files were updated to reflect the new integration, including updates in the compilation pipeline for both CUDA and ROCm, which involved renaming functions and adjusting parameters to align with the latest Triton functionalities.
Furthermore, the commit enhances the autotuning capabilities by updating the cache version, ensuring that any changes in Triton will prompt a cache invalidation when necessary. This integration not only streamlines the build process by removing obsolete patches but also optimizes the performance and compatibility of Triton with various GPU architectures, reflecting a commitment to maintaining up-to-date and efficient code. Overall, this update positions the project to leverage the latest advancements in GPU programming through Triton.
Files changed
- third_party/triton/llvm_integration/series.bzl
- third_party/triton/temporary/pipeline_f16.patch
- third_party/triton/workspace.bzl
- third_party/xla/xla/backends/gpu/codegen/triton/compilation_pipeline_cuda.cc
- third_party/xla/xla/backends/gpu/codegen/triton/compilation_pipeline_rocm.cc
- third_party/xla/xla/backends/gpu/codegen/triton/fusion_emitter_device_legacy_port_test.cc
- third_party/xla/xla/backends/gpu/codegen/triton/fusion_emitter_device_legacy_test.cc
- third_party/xla/xla/service/gpu/autotuning/autotuner_util.h
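The cache-version mechanism mentioned above can be sketched simply: keys are prefixed with a version, so bumping it makes every previously stored entry unreachable and forces re-autotuning. The class and field names below are illustrative assumptions, not the autotuner's real API:

```python
class AutotuneCache:
    """Version-prefixed cache sketch: bumping the version (as the
    commit does for the Triton update) invalidates all prior entries."""

    def __init__(self, version):
        self.version = version
        self._store = {}

    def _key(self, fingerprint):
        return (self.version, fingerprint)

    def put(self, fingerprint, config):
        self._store[self._key(fingerprint)] = config

    def get(self, fingerprint):
        return self._store.get(self._key(fingerprint))

cache = AutotuneCache(version=1)
cache.put("fusion_1", {"block": 64})
hit = cache.get("fusion_1")
cache.version = 2  # Triton changed: old entries become unreachable
miss = cache.get("fusion_1")
```

This is why stale tunings cannot leak across a Triton update: the old results are still stored, but no lookup under the new version can reach them.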
This commit introduces the _XlaShardingV2 operation to the tf.XlaShardOp and integrates it into the TensorFlow 2 XLA (Accelerated Linear Algebra) lowering process. The changes span multiple files within the TensorFlow compiler and MLIR (Multi-Level Intermediate Representation) directory, indicating a significant update to how sharding is handled in TensorFlow's XLA framework. The modifications include updates to various utility and transformation files, ensuring that the new sharding operation is properly recognized and utilized during the compilation and execution of TensorFlow models.
In addition to the new operation, the commit also includes updates to testing files, ensuring that the new functionality is thoroughly tested. This reflects an ongoing effort to enhance TensorFlow's capabilities in distributed computing and optimization, particularly for TPU (Tensor Processing Unit) workloads. The changes aim to improve the efficiency and performance of sharding in TensorFlow, which is crucial for scaling machine learning models across multiple devices.
Files changed
- tensorflow/compiler/mlir/lite/transforms/prepare_patterns.td
- tensorflow/compiler/mlir/tensorflow/ir/tf_ops.td
- tensorflow/compiler/mlir/tensorflow/tests/BUILD
- tensorflow/compiler/mlir/tensorflow/tests/tpu_sharding_identification.mlir
- tensorflow/compiler/mlir/tensorflow/tests/xla_sharding_util_test.cc
- tensorflow/compiler/mlir/tensorflow/transforms/prepare_tpu_computation_for_tf_export.cc
- tensorflow/compiler/mlir/tensorflow/utils/xla_sharding_util.cc
- tensorflow/compiler/mlir/tensorflow/utils/xla_sharding_util.h
- tensorflow/compiler/mlir/tf2xla/internal/passes/tpu_sharding_identification_pass.cc
- tensorflow/compiler/mlir/tf2xla/tests/legalize-tf.mlir
- tensorflow/compiler/mlir/tf2xla/transforms/legalize_tf.cc
- tensorflow/compiler/tf2xla/BUILD
- tensorflow/compiler/tf2xla/sharding_util.cc
- tensorflow/compiler/tf2xla/sharding_util_test.cc
- tensorflow/dtensor/mlir/dtensor_layout_to_xla_sharding_op.cc
- tensorflow/python/compiler/xla/experimental/xla_sharding.py
This commit reverts the integration of hermetic C++ toolchains in TensorFlow due to concerns that they significantly increase the wheel size and complicate compliance with the manylinux tag. The specific changes involve modifications to the .bazelrc configuration file, where various flags related to the use of hermetic toolchains have been disabled. The rollback aims to streamline the build process and maintain compatibility with existing standards.
In addition to the rollback, the commit includes minor adjustments to some test files, specifically in the RuntimeShapeTest, where a more efficient method for calculating the flat size of shapes is implemented. The changes reflect a broader effort to optimize TensorFlow's build configurations while addressing potential compliance issues.
Files changed
- .bazelrc
- ci/official/envs/linux_x86
- tensorflow/compiler/mlir/lite/kernels/internal/runtime_shape_test.cc
- tensorflow/lite/kernels/internal/runtime_shape_test.cc
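The flat size of a shape is just the product of its dimensions; a one-pass product is the kind of simplification the RuntimeShapeTest change suggests over an explicit accumulation loop. A small sketch (the helper name is illustrative):

```python
import math

def flat_size(dims):
    """Flat (element) count of a shape: the product of its dimensions.
    math.prod computes this in one pass; for an empty shape (a scalar)
    the product is 1."""
    return math.prod(dims)

size = flat_size([2, 3, 4])
```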
This commit introduces enhancements to the RedzoneBuffers functionality within the XLA (Accelerated Linear Algebra) GPU service by creating RedzoneBuffers from an Executable. Key modifications include the addition of a new method, FromComputation, which generates RedzoneBuffers based on the parameters of an HloComputation, leveraging a RedzoneAllocator for memory management. The changes also involve updates to the CreateInputs method, allowing it to accept a vector of HloInstructions instead of a single instruction, thereby improving flexibility in buffer creation.
In addition to the code modifications, the commit includes updates to the test suite, introducing a new test case that verifies the behavior of the FromComputation method. This test ensures that the expected input shapes and buffer sizes are correctly generated when provided with an HLO module. Overall, the changes enhance the functionality and robustness of the RedzoneBuffers system, facilitating better memory management in GPU computations within the XLA framework.
Files changed
- third_party/xla/xla/service/gpu/autotuning/BUILD
- third_party/xla/xla/service/gpu/autotuning/redzone_buffers.cc
- third_party/xla/xla/service/gpu/autotuning/redzone_buffers.h
- third_party/xla/xla/service/gpu/autotuning/redzone_buffers_test.cc
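The redzone idea itself is worth a brief sketch: each buffer is padded with guard regions filled with a known pattern, and any out-of-bounds write is detected by checking the guards afterward. This Python model illustrates the concept only; the guard size, pattern, and function names are assumptions, not the RedzoneAllocator API:

```python
REDZONE = b"\xaa" * 16  # guard pattern; size is illustrative

def allocate_with_redzones(payload_size):
    """Allocate a buffer with guard regions on both sides, the core
    idea behind a RedzoneAllocator: out-of-bounds writes land in the
    redzones and are detected later."""
    buf = bytearray(REDZONE + bytes(payload_size) + REDZONE)
    payload = slice(len(REDZONE), len(REDZONE) + payload_size)
    return buf, payload

def check_redzones(buf, payload):
    """Return True when both guard regions are still intact."""
    return (bytes(buf[:payload.start]) == REDZONE and
            bytes(buf[payload.stop:]) == REDZONE)

buf, payload = allocate_with_redzones(32)
buf[payload] = b"\x01" * 32   # in-bounds write: guards stay intact
ok_before = check_redzones(buf, payload)
buf[payload.stop] = 0x00      # out-of-bounds write corrupts a guard
ok_after = check_redzones(buf, payload)
```

Building such buffers directly from an Executable's parameters, as the commit does, means autotuning candidates can be run against guarded inputs without manually enumerating each instruction's operands.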