TensorFlow Changelog


Welcome to the latest and greatest updates! We've been busy making some awesome improvements and squashing pesky bugs. Here's a rundown of the cool new features, improvements, and fixes we've rolled out:


New Features 🌟

  • PJRT Buffer Magic: Say hello to PJRT_Buffer_CopyRawToHost in the PJRT C API! This nifty feature lets you copy raw data from device to host memory, making your GPU app data handling smoother than ever. It’s a game-changer for high-performance computing and machine learning aficionados.

  • HLO Interfaces: We've introduced HloModuleInterface and HloInstructionInterface to spice up your HLO module and instruction management. These interfaces give TensorFlow's profiling utilities an organized, extensible way to query instruction properties like names, opcodes, FLOPs, and bytes accessed.

  • Dot Product Testing: The XLA GPU framework now includes a test for dot products with batch and contracting dimensions. This ensures robust backend support for your matrix operations, making sure everything runs like a well-oiled machine.

Improvements 🚀

  • LLVM Update: We've synced up with the latest LLVM revision, keeping the project current with upstream features and fixes.

  • GEMM Fusion Flexibility: Our GPU GEMM fusion now supports broadcasts of trivially-sized dimensions, like [1,n] to [1,m,n], thanks to PR #19112. This means more flexibility and efficiency in your matrix operations.

  • TFL Pass Migration: The PushTransposeThroughEwisePass has migrated to the new TFL pass mechanism, streamlining the code and making it easier to maintain. Plus, we've updated the command-line argument for consistency.

Bugfixes 🐛

  • No Signature, No Problem: Fixed an issue in TensorFlow Lite where models without signatures were causing hiccups. Now, we pass a nullptr for models lacking function signatures, keeping everything running smoothly.

  • Algebraic Simplifier Tweaks: We've ensured the AlgebraicSimplifier in XLA respects host offloading copies, preventing any unwanted eliminations and maintaining computation integrity.

  • Developer Guide Tweak: Fixed a formatting blip in developer_guide.md where <USER> was misbehaving. It's now {USER}, and the guide looks fab!

Chore 🧹

  • Code Cleanup: Tidied up gpu_types.h by removing unused type aliases. This decluttering enhances clarity and makes room for future awesomeness.

That's all for now, folks! Keep your eyes peeled for more exciting updates and improvements coming your way. 🎉

Included Commits

2024-11-14T20:46:02 See commit

This commit cleaned up the gpu_types.h file by removing several type aliases that were no longer used. Specifically, it deleted a total of 30 lines, including type aliases related to GPU handles and an empty struct that represented unsupported features across various GPU programming interfaces, such as CUDA, ROCm, and SYCL.

This cleanup effort enhances the clarity and maintainability of the codebase by eliminating redundant definitions, thereby streamlining the code for future development. The removed aliases included various handles for GPU streams, functions, devices, and graphs, which were replaced or rendered obsolete due to updates in the underlying frameworks or changes in the project's requirements.
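
For a sense of what was removed, the deleted definitions were thin platform aliases of roughly this shape (a hedged reconstruction, not the exact diff):

```cpp
// Hypothetical reconstruction of the kind of aliases removed from
// gpu_types.h; the real file selected the right-hand types per platform
// (CUDA, ROCm, SYCL).
#if defined(GOOGLE_CUDA)
#include "third_party/gpus/cuda/include/cuda.h"

namespace stream_executor::gpu {
using GpuStreamHandle = CUstream;      // handle to a GPU stream
using GpuFunctionHandle = CUfunction;  // handle to a loaded kernel
using GpuDeviceHandle = CUdevice;      // handle to a physical device
using GpuGraphHandle = CUgraph;        // handle to a CUDA graph
}  // namespace stream_executor::gpu
#endif  // GOOGLE_CUDA
```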

Files changed

  • third_party/xla/xla/stream_executor/gpu/gpu_types.h
2024-11-14T21:07:05 See commit

This commit addresses a formatting issue in the developer_guide.md file of the XLA documentation. The placeholder <USER> was incorrectly interpreted as an HTML tag, which prevented it from rendering properly. To resolve this, the commit replaces <USER> with {USER}, ensuring that the documentation displays correctly.

In addition to this primary fix, the commit also includes minor formatting changes, such as converting HTML-style code blocks into Markdown-style code blocks for better readability and consistency. Overall, the modifications enhance the clarity and usability of the developer guide for users looking to clone the XLA repository and set up their development environment.

Files changed

  • third_party/xla/docs/developer_guide.md
2024-11-14T21:31:59 See commit

This commit introduces a new test case in the XLA (Accelerated Linear Algebra) GPU framework, specifically targeting dot products with batch and contracting dimensions. The changes add a support matrix that records how different backends (such as Triton and BLAS) handle various matrix dimensions. The test validates each backend's ability to handle batched dot products and generates HLO (High-Level Operations) modules from configurable parameters.

The modifications include the restructuring of existing test functions to accommodate the new testing requirements, along with enhancements in how module names and parameters are generated. The commit also improves logging for debugging purposes, allowing for better tracking of the test outcomes. Overall, this update aims to bolster the testing framework for dot product operations in XLA, ensuring comprehensive support for various dimensional configurations and backend implementations.
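
As a rough sketch of what such a parameterized module can look like (shapes and names here are illustrative assumptions, not lifted from dot_algorithms_test.cc), a dot with one batch and one contracting dimension can be generated like this:

```cpp
#include <cstdint>
#include <string>

#include "absl/strings/substitute.h"

// Builds HLO text for a batched dot: dimension 0 is the batch dimension and
// the shared k dimension is contracted. Illustrative only; the real test
// derives shapes and backend choices from its support matrix.
std::string BatchedDotHlo(int64_t b, int64_t m, int64_t k, int64_t n) {
  return absl::Substitute(R"(
HloModule batched_dot
ENTRY e {
  lhs = f32[$0,$1,$2] parameter(0)
  rhs = f32[$0,$2,$3] parameter(1)
  ROOT dot = f32[$0,$1,$3] dot(lhs, rhs),
      lhs_batch_dims={0}, rhs_batch_dims={0},
      lhs_contracting_dims={2}, rhs_contracting_dims={1}
})",
                          b, m, k, n);
}
```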

Files changed

  • third_party/xla/xla/service/gpu/fusions/triton/dot_algorithms_test.cc
2024-11-14T23:13:43 See commit

This commit focuses on migrating the PushTransposeThroughEwisePass to the new TensorFlow Lite (TFL) pass mechanism, enhancing the organization and maintainability of the code. The transition involved the removal of the .td definition for this pass, which simplifies the structure by consolidating the pass's implementation into C++ files. The commit includes modifications to various files, such as updating the build configurations, renaming the pass to align with the new mechanism, and ensuring that the pass is properly registered within the TFL framework.

Additionally, the commit updates test files to reflect the new command-line argument for the pass, changing it from --push-transpose-through-ewise to --tfl-push-transpose-through-ewise. This change not only improves consistency but also aligns with the overall architecture of the MLIR (Multi-Level Intermediate Representation) framework used in TensorFlow. The modifications made in this commit contribute to a more streamlined and efficient way of handling transpose operations through element-wise operations in TensorFlow Lite.
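
In generic MLIR terms, the renamed flag is the pass's argument string. A hedged sketch using the standard PassWrapper idiom (the actual TFL mechanism supplies its own base class and registration hooks) looks like:

```cpp
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Support/TypeID.h"

namespace mlir::TFL {

// Sketch of the migrated pass; only the command-line argument and
// description are taken from the commit, the rest is scaffolding.
struct PushTransposeThroughEwisePass
    : public PassWrapper<PushTransposeThroughEwisePass,
                         OperationPass<func::FuncOp>> {
  MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(PushTransposeThroughEwisePass)

  llvm::StringRef getArgument() const final {
    // Was: push-transpose-through-ewise.
    return "tfl-push-transpose-through-ewise";
  }
  llvm::StringRef getDescription() const final {
    return "Push transpose ops through element-wise ops";
  }
  void runOnOperation() override {
    // ... apply rewrite patterns that move transposes past ewise ops ...
  }
};

}  // namespace mlir::TFL
```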

Files changed

  • tensorflow/compiler/mlir/lite/BUILD
  • tensorflow/compiler/mlir/lite/tests/push-tpose-through-ewise.mlir
  • tensorflow/compiler/mlir/lite/transforms/passes.h
  • tensorflow/compiler/mlir/lite/transforms/passes.td
  • tensorflow/compiler/mlir/lite/transforms/push_transpose_through_ewise_pass.cc
  • tensorflow/compiler/mlir/lite/transforms/push_transpose_through_ewise_pass.h
2024-11-14T23:34:06 See commit

This commit integrates updates from LLVM's repository, specifically aligning with the changes made in commit 03730cdd3d10. The modifications primarily affect several files related to the LLVM integration within the project, including workspace.bzl files in both the third_party/llvm and third_party/shardy directories, as well as a temporary patch file in the third_party/shardy directory.

The updates are aimed at ensuring compatibility and leveraging the latest features or improvements from LLVM, reflecting an ongoing effort to maintain up-to-date dependencies within the project. The commit is identified by PiperOrigin-RevId: 696672457.

Files changed

  • third_party/llvm/workspace.bzl
  • third_party/shardy/temporary.patch
  • third_party/shardy/workspace.bzl
2024-11-14T23:50:39 See commit

This commit introduces the PJRT_Buffer_CopyRawToHost function to the PJRT C API, enhancing its capabilities for buffer management. The new function allows users to copy raw data from device memory to host memory, facilitating better data handling in GPU applications. The implementation includes a new structure, PJRT_Buffer_CopyRawToHost_Args, which defines the parameters needed for the copy operation, such as the buffer to copy from, the destination address, offset, and transfer size. Additionally, the commit modifies various files to integrate this new functionality, including updates to the API structure, test cases, and relevant documentation.

In the testing framework, new test cases have been added to validate the functionality of PJRT_Buffer_CopyRawToHost, ensuring that it handles both valid and invalid parameters appropriately. The tests check for successful data transfer and proper error handling when incorrect offsets are provided. Overall, this commit represents a significant enhancement to the PJRT C API, contributing to improved data transfer capabilities between device and host, which is crucial for performance in high-performance computing and machine learning applications.
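
A minimal sketch of a call through the C API follows; the field set mirrors the description above (buffer, destination, offset, transfer size), but pjrt_c_api.h remains the authoritative reference for the exact struct layout and completion semantics:

```cpp
#include <cstdint>

#include "xla/pjrt/c/pjrt_c_api.h"

// Copies `transfer_size` bytes starting at byte `offset` of a device buffer
// into pre-allocated host memory at `dst`. Invalid arguments (e.g. an
// out-of-range offset, as the new tests exercise) produce a PJRT_Error.
PJRT_Error* CopyRawToHost(const PJRT_Api* api, PJRT_Buffer* buffer, void* dst,
                          int64_t offset, int64_t transfer_size) {
  PJRT_Buffer_CopyRawToHost_Args args;
  args.struct_size = PJRT_Buffer_CopyRawToHost_Args_STRUCT_SIZE;
  args.extension_start = nullptr;
  args.buffer = buffer;
  args.dst = dst;
  args.offset = offset;
  args.transfer_size = transfer_size;
  // The transfer completes asynchronously; the event populated by the call
  // can be awaited before reading from `dst`.
  return api->PJRT_Buffer_CopyRawToHost(&args);
}
```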

Files changed

  • third_party/xla/xla/pjrt/BUILD
  • third_party/xla/xla/pjrt/c/BUILD
  • third_party/xla/xla/pjrt/c/CHANGELOG.md
  • third_party/xla/xla/pjrt/c/pjrt_c_api.h
  • third_party/xla/xla/pjrt/c/pjrt_c_api_gpu_test.cc
  • third_party/xla/xla/pjrt/c/pjrt_c_api_wrapper_impl.cc
  • third_party/xla/xla/pjrt/c/pjrt_c_api_wrapper_impl.h
  • third_party/xla/xla/pjrt/pjrt_c_api_client.cc
  • third_party/xla/xla/pjrt/pjrt_c_api_client.h
2024-11-15T01:48:02 See commit

This commit addresses an issue in TensorFlow Lite's implementation where models without signatures were incorrectly handled. The change involves modifying the function call in the MLIR representation of a simple model, specifically altering the custom_option field to pass an empty string instead of "simple". This adjustment ensures that when models lack a function signature, a nullptr is passed to the SB API rather than an empty string, which prevents the API from expecting a model with a signature.

Additionally, the commit updates the AssignNodeFunction method in the dispatch API to include logic that checks if the function_name is empty. If it is, the code now passes a nullptr instead of the function name, aligning with the requirement that models without signatures should not provide a function name. This change enhances the robustness of the dispatching mechanism by ensuring it correctly handles models lacking function signatures, as noted in the related bug report (b/378913220).
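
The core of the fix is a small guard, paraphrased here (names are assumptions based on the description, not the verbatim code):

```cpp
#include <string>

// Models without signatures have no function name to dispatch on; return
// nullptr so the underlying API does not expect a model with a signature.
// The returned pointer is only valid while `function_name` is alive.
const char* FunctionNameOrNull(const std::string& function_name) {
  return function_name.empty() ? nullptr : function_name.c_str();
}
```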

Files changed

  • tensorflow/lite/experimental/litert/test/testdata/simple_model_npu.mlir
  • tensorflow/lite/experimental/litert/vendors/google_tensor/dispatch/dispatch_api.cc
2024-11-15T03:40:23 See commit

This commit introduces two new interfaces, HloModuleInterface and HloInstructionInterface, to enhance the structure and interaction with HLO (High-Level Operations) modules and instructions in TensorFlow. The HloInstructionWrapper and HloModuleWrapper classes are implemented to conform to these interfaces, providing a more organized and extensible way to manage HLO data. The interfaces define essential methods for retrieving various properties of HLO instructions and modules, such as their names, opcodes, and performance metrics like FLOPs and bytes accessed.

In addition to the interface implementations, the commit includes significant modifications to the HloModuleWrapper class, enhancing its functionality to gather fusion instructions and manage nested computations more effectively. The changes also improve the organization of the code by introducing helper functions and ensuring that the wrappers cache results for efficient access. Overall, these updates aim to streamline the handling of HLO data structures within TensorFlow's profiling utilities, ultimately contributing to more efficient computation and analysis.
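
An illustrative distillation of the shape of these interfaces, based on the properties listed above (method names are assumptions; see hlo_module_map.h for the real declarations):

```cpp
#include <cstdint>
#include <string>

class HloInstructionInterface {
 public:
  virtual ~HloInstructionInterface() = default;
  virtual std::string Name() const = 0;        // instruction name
  virtual std::string OpcodeName() const = 0;  // e.g. "fusion", "dot"
  virtual int64_t Flops() const = 0;           // estimated floating-point ops
  virtual int64_t BytesAccessed() const = 0;   // estimated memory traffic
};

class HloModuleInterface {
 public:
  virtual ~HloModuleInterface() = default;
  virtual std::string Name() const = 0;  // module name
};

// HloInstructionWrapper and HloModuleWrapper implement these interfaces and
// cache the computed metrics so repeated profiler lookups stay cheap.
```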

Files changed

  • tensorflow/core/profiler/utils/BUILD
  • tensorflow/core/profiler/utils/hlo_module_map.cc
  • tensorflow/core/profiler/utils/hlo_module_map.h
2024-11-15T04:29:37 See commit

The commit associated with PR #19112 enhances the GPU GEMM fusion capabilities within the XLA framework by adding support for broadcasts involving trivially-sized dimensions, specifically those of size one. This improvement allows operations where tensors of shape [1,n] can be broadcast to shapes like [1,m,n], thereby increasing the flexibility and efficiency of matrix operations in GPU computations. The changes include modifications to the Triton GEMM emitter and analysis components, ensuring that the new broadcasting capabilities are correctly integrated into the existing functionality.

In addition to the core enhancements, the commit also addresses feedback from previous iterations and includes new tests to validate the implementation of these broadcasting features. The tests specifically check for both contracting and non-contracting dimensions, ensuring that the system can handle a variety of tensor operations without errors. Overall, this update aims to optimize performance in GPU-based computations by leveraging advanced broadcasting techniques in GEMM fusion.
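
The newly supported pattern looks roughly like the following HLO, embedded here as a C++ string constant (shapes are made up for illustration): a [1,n] operand is broadcast to [1,m,n] and fed into a batched dot.

```cpp
// Illustrative HLO for the newly fusible pattern: [1,5] is broadcast to
// [1,7,5] (the size-1 dimension is preserved) and feeds a batched dot.
constexpr char kBroadcastIntoDotHlo[] = R"(
HloModule gemm_broadcast
ENTRY e {
  p0 = f32[1,5] parameter(0)
  b0 = f32[1,7,5] broadcast(p0), dimensions={0,2}
  p1 = f32[1,5,3] parameter(1)
  ROOT d = f32[1,7,3] dot(b0, p1),
      lhs_batch_dims={0}, rhs_batch_dims={0},
      lhs_contracting_dims={2}, rhs_contracting_dims={1}
})";
```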

Files changed

  • third_party/xla/xla/service/gpu/BUILD
  • third_party/xla/xla/service/gpu/fusions/triton/triton_fusion_emitter_device_legacy_test.cc
  • third_party/xla/xla/service/gpu/fusions/triton/triton_fusion_emitter_legacy_matmul.cc
  • third_party/xla/xla/service/gpu/triton_fusion_analysis_test.cc
  • third_party/xla/xla/service/gpu/triton_tiling_propagation.cc
2024-11-15T05:09:13 See commit

This commit addresses an issue in the AlgebraicSimplifier component of the XLA (Accelerated Linear Algebra) library, ensuring that it does not eliminate copies related to host offloading. The changes include modifications to the algebraic_simplifier.cc file, where new checks were added to prevent the simplifier from processing copies that involve synchronous transfers to or from the host. This is crucial for maintaining the integrity of computations that rely on host memory operations.

In addition to the main code changes, the commit also updates the host_offload_utils.cc file to refine the logic for identifying synchronous copies involving host memory. A new test case was added to verify that the simplifier correctly retains these host offloading copies during its operations. Overall, this commit enhances the functionality of the AlgebraicSimplifier by ensuring that it respects the requirements of host memory operations in the XLA framework.
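
Sketched in simplified form (the predicate name paraphrases the description; the real refinement lives in host_offload_utils.cc), the new guard looks like:

```cpp
#include "absl/status/status.h"
#include "xla/hlo/ir/hlo_instruction.h"

namespace xla {

// Assumed helper mirroring the refined host_offload_utils logic: true when
// `copy` is a synchronous transfer to or from host memory.
bool IsSynchronousCopyFromOrToHost(const HloInstruction* copy);

// Simplified sketch of the simplifier's copy handling with the new check.
absl::Status HandleCopy(HloInstruction* copy) {
  if (IsSynchronousCopyFromOrToHost(copy)) {
    // Host-offloading copies move data between host and device memory
    // spaces; eliminating them would break the offloaded computation.
    return absl::OkStatus();
  }
  // ... existing copy-elimination rewrites run here ...
  return absl::OkStatus();
}

}  // namespace xla
```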

Files changed

  • third_party/xla/xla/hlo/transforms/BUILD
  • third_party/xla/xla/hlo/transforms/simplifiers/algebraic_simplifier.cc
  • third_party/xla/xla/hlo/transforms/simplifiers/algebraic_simplifier_test.cc
  • third_party/xla/xla/service/host_offload_utils.cc