TensorFlow changelog
Welcome to the latest and greatest updates! We've been busy bees 🐝 and have some exciting new features and improvements to share with you. Let's dive into the juicy details:
New Features
- Raw Buffers FTW! 🎉: Say hello to use_raw_buffers, a feature that keeps raw buffer references alive and kicking until data transfer is complete. No more premature deletions! Just a heads up, it might sneakily read from donated arrays, but we're working on it!
- XLA Scheduling Gets a Boost: We've added a ScheduleConfig to XLA's HloModuleConfig. Now you can manage instruction execution order like a pro, making your computations smoother than ever.
- Unified Model ID Metric: Track your loaded models with a new gauge metric that records the unified model ID. It's like a fingerprint for your models, ensuring observability is top-notch.
- Weight-Only PTQ for TensorFlow: Introducing tf_weight_only_ptq for StableHLO. This library lets you perform int8 weight-only quantization on dot_general operations, streamlining model optimization without needing calibration.
- Calibration Component in TensorFlow: Meet tf_component, a new addition to the TensorFlow MLIR quantization framework. It manages post-calibration transformations, improving the accuracy of your quantized models.
Improvements
- Low Latency Thread Pool in PjRT: We've optimized the PjRT GPU client with a low latency thread pool for async operations. Your GPU computations are about to get a whole lot zippier! 🚀
- Allocator Magic During Compilation: The GPU client in XLA now uses a configured memory allocator during compilation. This means better memory management and performance across devices.
- Fusion Flexibility: The MultiOutputFusion class now allows more flexibility for derived classes, making the fusion process efficient and tailored to your backend needs.
Bugfixes
- Race Condition No More: We've squashed a race condition bug in flat_map_utils.cc. Now, threads won't step on each other's toes, ensuring smooth and stable dataset handling.
- Crash-Free Layout Printing: Printing an invalid Layout in XLA won't crash your app anymore. Instead, it gracefully handles errors with a friendly "?" placeholder.
- Memory Space Propagation Fix: Fixed an issue with the NVIDIA GPU's CollectiveColorer, ensuring memory spaces are correctly assigned and tests pass with flying colors.
Chore
- Tidying Up: Removed an unnecessary header include from calibration_wrapper.cc, making the codebase leaner and meaner.
That's all for now, folks! Stay tuned for more updates and keep that feedback coming. Happy coding! 😄
Included Commits
This commit introduces a low latency thread pool for asynchronous operations within the PjRT GPU client. The changes primarily involve the thread pool initialization in the TfrtGpuClient class, where the thread pools for compiling, blocking, and non-blocking operations are now created with low latency options. This is achieved by passing tsl::ThreadOptions() and a true flag to the thread pool constructors, which optimizes the thread pools for lower latency.
In addition to the adjustments in thread pool configuration, the commit includes a total of 18 changes across the modified file, with 12 lines added and 6 lines deleted. These enhancements are expected to improve the efficiency of asynchronous tasks in GPU computations, potentially leading to better performance in applications using the PjRT framework. The overall goal of these modifications is to enhance responsiveness and throughput in GPU-based processing tasks.
Files changed
- third_party/xla/xla/pjrt/gpu/tfrt/tfrt_gpu_client.cc
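The low-latency option described above comes down to a flag at the thread pool construction site. Here is a minimal sketch of that pattern in standard C++; ThreadOptions, ThreadPool, and MakeCompileThreadPool are illustrative stand-ins, not the actual tsl::thread types:

```cpp
#include <string>

// Stand-in for tsl::ThreadOptions; the real struct carries stack size,
// NUMA affinity, and similar settings.
struct ThreadOptions {};

class ThreadPool {
 public:
  // Mirrors the constructor shape described above: default options plus an
  // explicit low-latency hint.
  ThreadPool(const ThreadOptions& options, const std::string& name,
             int num_threads, bool low_latency_hint)
      : name_(name),
        num_threads_(num_threads),
        low_latency_hint_(low_latency_hint) {}

  bool low_latency_hint() const { return low_latency_hint_; }

 private:
  std::string name_;
  int num_threads_;
  bool low_latency_hint_;
};

// The call-site change amounts to enabling the hint when the pools are built.
ThreadPool MakeCompileThreadPool() {
  return ThreadPool(ThreadOptions(), "compile", /*num_threads=*/4,
                    /*low_latency_hint=*/true);
}
```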
This commit addresses a race condition in the tensorflow/core/data/flat_map_utils.cc file, specifically within the FlatMapRandomAccessHandler::MakeInputDatasets() function. The race condition arose from concurrent access to the input_datasets container through .size() and operator[], which could lead to unpredictable behavior in a multi-threaded environment. The fix changes the approach so that threads no longer access the container directly; instead, each thread writes through a reference to the last element of input_datasets, ensuring thread-safe operation.
The changes made include the introduction of a reference to the last element of input_datasets, which is used by the threads to store the results of the MakeInputDataset() function. By eliminating direct access to the container, the code enhances its robustness against race conditions, improving overall stability and reliability when handling datasets in parallel. The commit includes minor modifications, with four lines added and three lines removed, resulting in a cleaner and safer implementation.
Files changed
- tensorflow/core/data/flat_map_utils.cc
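The fix follows a general pattern: the main thread reserves each result slot up front and hands every worker a reference to its own slot, so no worker ever touches the shared container. A standalone sketch of that pattern (illustrative names, not the actual TensorFlow code):

```cpp
#include <thread>
#include <vector>

// Returns squares computed by worker threads. Each worker writes only through
// a reference to a slot reserved for it by the main thread, so no thread
// calls size() or operator[] on the shared container while it grows.
std::vector<int> ComputeSquares(int n) {
  std::vector<int> results;
  results.reserve(n);  // guarantees the push_backs below never reallocate,
                       // so the references handed to workers stay valid
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i) {
    results.push_back(0);        // reserve the slot on the main thread
    int& slot = results.back();  // workers get a reference, not the vector
    workers.emplace_back([&slot, i] { slot = i * i; });
  }
  for (std::thread& t : workers) t.join();
  return results;
}
```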
This commit addresses a critical issue in the XLA (Accelerated Linear Algebra) library by preventing a crash that occurs when attempting to print an invalid Layout. The original implementation would terminate the program when encountering an invalid DimLevelType, which is not desirable behavior. Instead, the code has been modified to return a placeholder character ("?") when the DimLevelType is invalid, allowing the program to continue running without crashing.
The changes were made in the layout.cc file, where the logic for handling invalid DimLevelType values was adjusted. Specifically, the error handling was updated from a fatal log message that would crash the application to a safer return of a question mark, indicating the invalid type without disrupting the flow of execution. This improvement enhances the robustness of the XLA library when dealing with layout representations.
Files changed
- third_party/xla/xla/layout.cc
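The fallback described above is a common hardening pattern: return a placeholder for unprintable values instead of aborting the process. A self-contained sketch with illustrative enum values (not XLA's actual layout code):

```cpp
#include <string>

// Illustrative stand-in for XLA's DimLevelType enum.
enum class DimLevelType { kDense, kCompressed, kSingleton };

// Map an enum value to a printable abbreviation. An out-of-range value used
// to be a fatal error; here it degrades gracefully to "?" so callers can
// keep printing the rest of the layout.
std::string DimLevelTypeToString(DimLevelType t) {
  switch (t) {
    case DimLevelType::kDense: return "D";
    case DimLevelType::kCompressed: return "C";
    case DimLevelType::kSingleton: return "S";
    default: return "?";  // invalid value: placeholder instead of crashing
  }
}
```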
This commit introduces enhancements to the MultiOutputFusion class in the XLA (Accelerated Linear Algebra) service, allowing derived classes to have greater flexibility in forming the initial fusion worklist for computations. The changes include a substantial refactor of the logic that determines which instructions can be fused together, encapsulated in a new method called CreateFusionWorkListForCurrentComputation(). This method streamlines the candidate selection process, ensuring that only profitable and legal fusion candidates are considered, thus improving the efficiency of the fusion process.
Additionally, the commit modifies the internal structure of the FusionCandidate class and introduces new methods to facilitate the addition of candidates to the worklist, enhancing the modularity and readability of the code. The changes also include the addition of logging for better traceability during the fusion candidate selection process. Overall, these modifications aim to optimize the performance of multi-output fusion by providing a more flexible and efficient mechanism for managing fusion candidates in various backend implementations.
Files changed
- third_party/xla/xla/service/BUILD
- third_party/xla/xla/service/multi_output_fusion.cc
- third_party/xla/xla/service/multi_output_fusion.h
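The worklist construction described above can be sketched as a base class that filters candidates through legality and profitability hooks that derived backends may override. All names here are illustrative, not XLA's actual interfaces:

```cpp
#include <string>
#include <vector>

// Toy stand-in for an HLO instruction.
struct Instruction {
  std::string name;
  int cost;
  bool fusible;
};

class FusionPass {
 public:
  virtual ~FusionPass() = default;

  // Builds the initial worklist: only candidates that are both legal and
  // profitable are considered, mirroring the filtering described above.
  std::vector<Instruction> CreateFusionWorkList(
      const std::vector<Instruction>& instrs) {
    std::vector<Instruction> worklist;
    for (const Instruction& instr : instrs) {
      if (IsLegal(instr) && IsProfitable(instr)) worklist.push_back(instr);
    }
    return worklist;
  }

 protected:
  // Derived backends override these hooks to tailor candidate selection.
  virtual bool IsLegal(const Instruction& i) { return i.fusible; }
  virtual bool IsProfitable(const Instruction& i) { return i.cost < 10; }
};
```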
This commit introduces a new metric to track the unified model ID of loaded models within TensorFlow's saved model infrastructure. Specifically, it adds a gauge metric that records the unified model ID, which is derived from the model's fingerprint or generated when the fingerprint is absent. The implementation involves modifications across several files, including the addition of the gauge metric in the saved_model.cc file and the creation of a function to emit this unified model ID during the loading process of a saved model.
The changes also include updates to the test suite to validate the correct emission of the unified model ID under various scenarios, such as when the fingerprint file is missing or contains an empty UUID. The tests ensure that the functionality behaves as expected, confirming that the unified model ID is accurately recorded and retrievable, thereby enhancing the observability of model loading processes in TensorFlow.
Files changed
- tensorflow/cc/saved_model/BUILD
- tensorflow/core/tfrt/saved_model/BUILD
- tensorflow/core/tfrt/saved_model/saved_model.cc
- tensorflow/core/tfrt/saved_model/saved_model_testutil.cc
- tensorflow/core/tfrt/saved_model/saved_model_testutil.h
- tensorflow/core/tfrt/saved_model/tests/BUILD
- tensorflow/core/tfrt/saved_model/tests/gen_saved_model.bzl
- tensorflow/core/tfrt/saved_model/tests/saved_model_test.cc
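A gauge metric, as used here, stores the latest value per label so a monitoring system can read it back at any time. A minimal sketch of that pattern (TensorFlow's real API is tsl::monitoring; these names and labels are illustrative):

```cpp
#include <map>
#include <string>

// A gauge keeps only the most recent value for each label cell, unlike a
// counter, which accumulates. Setting a cell twice overwrites it.
class StringGauge {
 public:
  void Set(const std::string& label, const std::string& value) {
    cells_[label] = value;
  }

  // Returns the last value set for the label, or "" if never set.
  std::string Get(const std::string& label) const {
    auto it = cells_.find(label);
    return it == cells_.end() ? std::string() : it->second;
  }

 private:
  std::map<std::string, std::string> cells_;
};
```

During model loading, the loader would Set the cell with the fingerprint-derived ID (or a generated one when the fingerprint is absent), and observability tooling would Get it back.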
The commit addresses an issue in the XLA (Accelerated Linear Algebra) framework related to the NVIDIA GPU's handling of memory space assignments during collective operations. Specifically, it fixes the failure of the test tests/memories_test.py (run under pytest) when the flag XLA_FLAGS="--xla_gpu_enable_nccl_user_buffers=1" is set. The problem stemmed from the CollectiveColorer not properly propagating non-default memory space assignments. The changes implemented in this pull request ensure that the memory space from the layout is correctly assigned, thereby resolving the test failures.
The modifications include updates to the CollectiveColorer logic to recognize and utilize non-default memory spaces when available. Additionally, a new test, CollectiveMemorySpaceSmoke, was added to verify that the correct memory space is applied during execution, specifically checking that the output is directed to the collective memory space. These changes are safeguarded by existing tests to ensure compatibility with NCCL user buffers, thus maintaining the integrity of the system while improving its functionality.
Files changed
- third_party/xla/xla/pjrt/gpu/se_gpu_pjrt_client_test.cc
- third_party/xla/xla/service/gpu/gpu_memory_space_assignment.h
The commit involves a modification to the file calibration_wrapper.cc in the TensorFlow Lite Python optimization module. Specifically, it removes the inclusion of the header file absl/types/optional.h, which suggests that the code no longer relies on the features or types provided by this particular library.
This one-line removal indicates the code no longer depends on that library. Trimming unused includes keeps the dependency list honest and easier to maintain, and can modestly reduce compile times.
Files changed
- tensorflow/lite/python/optimize/calibration_wrapper.cc
This commit introduces a new feature called use_raw_buffers, which allows the implementation to maintain references to raw buffers instead of using PjRtBuffers. This addresses a problem where buffers could be deleted before a data transfer completed, which risked data loss. However, it also raises a new concern: if the buffers are donated, the implementation may inadvertently read from these donated arrays without appropriate checks. The commit notes that once the underlying runtime properly manages usage holds, this new approach will utilize such holds, and the previous PjRtBuffer implementation will be phased out.
The changes primarily involve modifications to several files related to the buffer transfer process, including the addition of new classes and methods to handle raw buffers. These changes include the implementation of RawBufferEntry and PjRtBufferEntry classes, which manage the transfer of data from raw buffers and PjRtBuffers, respectively. The commit also updates the build configuration to include dependencies for the new raw buffer functionality and increments the version number to reflect these changes.
Files changed
- third_party/xla/xla/python/transfer/BUILD
- third_party/xla/xla/python/transfer/streaming_ifrt.cc
- third_party/xla/xla/python/transfer/streaming_ifrt.h
- third_party/xla/xla/python/transfer/streaming_ifrt_test.cc
- third_party/xla/xla/python/version.h
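The keep-alive idea can be illustrated in standard C++: the asynchronous transfer captures a shared_ptr to the raw buffer, so the buffer stays valid until the transfer finishes even if the caller drops its own reference in the meantime. Names here are illustrative, not the actual XLA transfer API:

```cpp
#include <future>
#include <memory>
#include <vector>

// Starts an asynchronous "transfer" that reads the buffer's contents. The
// lambda copies the shared_ptr, so the buffer's lifetime is extended until
// the transfer has run, preventing the premature deletion described above.
std::future<std::vector<char>> StartTransfer(
    std::shared_ptr<std::vector<char>> raw) {
  return std::async(std::launch::async, [raw] {
    return *raw;  // the shared_ptr copy guarantees the buffer is alive here
  });
}
```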
The commit introduces a new component, tf_component, to the TensorFlow MLIR quantization framework, specifically within the stablehlo calibration directory. This addition includes a new source file tf_component.cc and a header file tf_component.h, which collectively implement functionalities for post-calibration graph transformations as part of post-training static-range quantization. The CalibrationComponent class is designed to manage the calibration process, handling tasks such as exporting pre-calibrated models to a SavedModel format, running calibration passes on the models, and importing calibrated models back into the system. The code is structured to utilize various dependencies related to TensorFlow's quantization and calibration processes.
Additionally, the changes made to the BUILD file reflect the integration of the new tf_component library into the build system, ensuring it is compatible with other components and dependencies in the quantization framework. This commit is a significant step towards enhancing the quantization capabilities in TensorFlow by providing a structured approach to calibration, which is crucial for improving the accuracy of quantized models.
Files changed
- tensorflow/compiler/mlir/quantization/stablehlo/cc/calibration/BUILD
- tensorflow/compiler/mlir/quantization/stablehlo/cc/calibration/tf_component.cc
- tensorflow/compiler/mlir/quantization/stablehlo/cc/calibration/tf_component.h
This commit introduces a new library called tf_weight_only_ptq within the TensorFlow MLIR quantization framework, specifically targeting weight-only post-training quantization (PTQ) for StableHLO. The library includes two new files: tf_weight_only_ptq.cc and tf_weight_only_ptq.h, which implement the functionality for performing int8 weight-only quantization on dot_general operations. The changes involve adding a WeightOnlyPtqComponent class that manages the quantization process and a function QuantizeWeightOnlyPtq that handles the quantization of a SavedModel, allowing for the quantized model to be saved and exported.
Additionally, the commit modifies the BUILD file to include the new library and its dependencies, ensuring compatibility with various components of the TensorFlow quantization pipeline. This enhancement is significant as it improves the quantization capabilities within TensorFlow, particularly for scenarios where weight-only quantization is sufficient without requiring calibration, thereby streamlining the model optimization process.
Files changed
- tensorflow/compiler/mlir/quantization/stablehlo/cc/BUILD
- tensorflow/compiler/mlir/quantization/stablehlo/cc/tf_weight_only_ptq.cc
- tensorflow/compiler/mlir/quantization/stablehlo/cc/tf_weight_only_ptq.h
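Int8 weight-only quantization, the general technique this library implements, maps each weight to [-127, 127] using a per-tensor scale, with no calibration data needed because the weights are known ahead of time. A hedged sketch of the math (not the library's actual API):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct QuantizedWeights {
  std::vector<int8_t> values;
  float scale;  // dequantize with: w ≈ value * scale
};

// Symmetric per-tensor int8 quantization: scale is chosen so the largest
// absolute weight maps to ±127, then every weight is rounded to the grid.
QuantizedWeights QuantizeInt8(const std::vector<float>& weights) {
  float max_abs = 0.0f;
  for (float w : weights) max_abs = std::max(max_abs, std::fabs(w));
  float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
  QuantizedWeights q{{}, scale};
  q.values.reserve(weights.size());
  for (float w : weights) {
    q.values.push_back(static_cast<int8_t>(std::lround(w / scale)));
  }
  return q;
}
```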
The commit introduces a new configuration option for scheduling within the HloModuleConfig of XLA (Accelerated Linear Algebra). Specifically, it adds a ScheduleConfig to manage the sequence of instructions to be executed. The changes include modifications to various files, such as adding the ScheduleConfig to the HloModuleConfig class, updating the serialization methods for HloModuleConfigProto, and integrating the new configuration into the existing build system.
In detail, the commit updates the HloModuleConfig class to include methods for accessing and modifying the ScheduleConfig, and it ensures that this configuration is properly serialized and deserialized in the proto definitions. This enhancement is aimed at improving the scheduling capabilities of XLA, allowing for more efficient execution of computations by managing the order of instruction execution.
Files changed
- third_party/xla/xla/service/BUILD
- third_party/xla/xla/service/hlo_module_config.cc
- third_party/xla/xla/service/hlo_module_config.h
- third_party/xla/xla/xla.proto
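The config plumbing described above follows a familiar shape: the module config owns a schedule object exposed through an accessor and a mutator, and the same fields round-trip through the proto serialization. A minimal sketch with illustrative names (not XLA's actual classes):

```cpp
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-in for the new ScheduleConfig: an explicit ordering of
// instructions to execute.
struct ScheduleConfig {
  std::vector<std::string> instruction_order;
};

class HloModuleConfigSketch {
 public:
  // Accessor/mutator pair of the kind the commit adds to HloModuleConfig.
  const ScheduleConfig& schedule_config() const { return schedule_; }
  void set_schedule_config(ScheduleConfig s) { schedule_ = std::move(s); }

 private:
  ScheduleConfig schedule_;
};
```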
This commit modifies the GPU client implementation in the XLA (Accelerated Linear Algebra) library to utilize a configured memory allocator during the compilation process. Specifically, it updates the UpdateCompileOptionsInternal function to set the device allocator based on the allocator configuration rather than using a default memory allocator from the XLA backend. The changes streamline the allocation process by ensuring that the correct allocator is employed, which is critical for optimizing memory usage and performance during GPU computations.
Additionally, the commit enhances the CreateDeviceAllocator function to accept a list of devices, allowing for more granular control over the allocation process. It ensures that each device's stream is used during allocation, thereby improving the efficiency of memory management across multiple devices. This update not only reduces the number of unnecessary allocations but also integrates better with the overall architecture of the XLA framework, ultimately leading to more effective resource utilization during GPU-based computations.
Files changed
- third_party/xla/xla/pjrt/gpu/tfrt/tfrt_gpu_client.cc