TensorFlow Changelog


Welcome to the latest updates! We've been busy adding some shiny new features and fixing pesky bugs to make your experience smoother and more efficient. Here's a rundown of what's new and improved:

  • New Feature 🚀: Parallel compilation is now live for the XLA CPU backend, thanks to our new ORC TaskDispatcher. This means faster and more efficient JIT compilation, leveraging multi-threading to get things done in a snap!

  • New Feature 🎉: TensorV1Attr support has been added to the flatbuffer_export and flatbuffer_operator, allowing for a more structured and efficient data representation in TensorFlow's MLIR framework. Now you can handle tensor attributes like a pro!

  • New Feature 🌟: Introducing the VifrtToVersionPass for converting VIFRT modules between versions. This nifty addition ensures compatibility and flexibility across different versions, making your development process smoother than ever.

  • New Feature 🐍: Python bindings for VIFRT serialization are here! Now you can serialize and deserialize IFRT IR programs with ease, ensuring compatibility across versions and making advanced serialization features more accessible.

  • New Feature 🔧: Say hello to the experimental C++ graph builder for TensorFlow Lite! This tool empowers developers to construct and manipulate machine learning models programmatically, enhancing TFLite's flexibility and usability.

  • Improvement 🛠️: We've migrated the CpuCompiler from SimpleOrcJit to JitCompiler in the XLA backend for CPU. This upgrade promises better optimization and execution speeds, keeping things running like a well-oiled machine.

  • Improvement ⚙️: To prep for JIT compilation, we've enhanced the CpuCompiler by constructing the JitCompiler within it, setting the stage for more efficient compilation processes.

  • New Feature 💡: A sharding config has been added to XLA's HloModuleConfig, as part of the AutoFDO integration. This gives you better control over operation distribution, optimizing performance like never before.

  • Bugfix 🐛: We've squashed a bug in the MoveUserInstructionsIn function that was causing compilation errors with conditional operations. Now it handles multiple users like a champ!

  • Bugfix 🐞: Fixed an async execution bug in transposed convolution operations for XLA CPU. The intermediate buffer now stays in scope, preventing any memory mishaps.

  • Bugfix 🔧: The tune_ctas logic in GemmFusionAutotunerImpl has been restored, ensuring proper CTA tuning for GPU computations, especially on Hopper architectures.

  • Chore 🔍: Updated internal visibility settings for the registry library, ensuring access is managed effectively for Google-specific clients.

These updates are all about making your experience smoother, faster, and more powerful. Enjoy the new features and improvements, and keep an eye out for more exciting updates coming your way! 🎈

Included Commits

2024-11-22T21:32:01 See commit

This commit introduces support for the TensorV1Attr in the flatbuffer_export and flatbuffer_operator components of TensorFlow's MLIR (Multi-Level Intermediate Representation) framework. The new encoding format for TensorV1Attr is defined, which includes the tensor's shape, type, and data as structured information. The implementation modifies the relevant source files to handle the new tensor attribute type, including functions to build and manipulate these attributes, ensuring that they can be correctly serialized and deserialized as part of the FlatBuffer format.

In addition to the encoding support, the commit also includes updates to handle the conversion of tensor data types, specifically catering to different tensor element types such as INT32, FLOAT32, and INT64. The changes ensure that the TensorV1Attr can seamlessly integrate with existing functionalities, allowing for efficient data representation and manipulation within the MLIR framework. The commit also updates test cases to reflect these changes, ensuring that the new functionality is validated and functioning as expected.

Files changed

  • tensorflow/compiler/mlir/lite/flatbuffer_export.cc
  • tensorflow/compiler/mlir/lite/flatbuffer_operator.cc
  • tensorflow/compiler/mlir/lite/tests/flatbuffer2mlir/composite_op_round_trip.mlir
2024-11-22T21:54:12 See commit

This commit introduces a new pass called VifrtToVersionPass, which facilitates the conversion of VIFRT (the versioned dialect of IFRT IR) modules between different versions. The pass is designed to validate the target version specified and ensure compatibility with the current version, allowing for the transformation of VIFRT modules to a specified target version. The implementation includes validation checks for the target version format and compatibility, ensuring that the conversion process adheres to version constraints.

Additionally, the commit modifies several files to integrate this new pass, including updates to the build configuration, header files, and the core implementation of the conversion logic. The new pass is expected to enhance the flexibility and usability of VIFRT by enabling developers to work with various versions of the VIFRT module, thereby improving the overall development experience within the XLA (Accelerated Linear Algebra) framework.

Files changed

  • third_party/xla/xla/python/ifrt/ir/transforms/BUILD
  • third_party/xla/xla/python/ifrt/ir/transforms/passes.cc
  • third_party/xla/xla/python/ifrt/ir/transforms/passes.h
  • third_party/xla/xla/python/ifrt/ir/transforms/passes.td
  • third_party/xla/xla/python/ifrt/ir/transforms/vifrt_to_version_pass.cc
2024-11-23T03:03:18 See commit

The commit addresses limitations in the MoveUserInstructionsIn function within the XLA (Accelerated Linear Algebra) service, specifically regarding its inability to handle conditional operations that produce array outputs when multiple users are involved. This limitation can lead to compilation errors, as demonstrated by a newly added test case that highlights the issue. The function now includes a check to return false if the conditional operation does not produce a tuple and has more than one user, thereby preventing potential errors during compilation.

Additionally, the commit modifies the conditional_code_motion_test.cc file by adding a test case that illustrates the failure scenario when conditional operations with array outputs are used by multiple users. The test case creates an HLO (High Level Operations) module that triggers the problematic conditions, confirming that the function correctly identifies and handles this scenario by returning false. Overall, these changes enhance the robustness of the conditional code motion optimization process in the XLA service.

Files changed

  • third_party/xla/xla/service/conditional_code_motion.cc
  • third_party/xla/xla/service/conditional_code_motion_test.cc
2024-11-23T03:27:12 See commit

This commit introduces a change to the visibility settings of the registry library within the ifrt_proxy client in the XLA (Accelerated Linear Algebra) module. Specifically, it modifies the BUILD file to include a new visibility rule that combines the default visibility for the ifrt_proxy with an additional condition that allows access to Google-specific clients, as specified in the if_google function.

The update enhances the access control for the registry library, ensuring that only designated Google clients can utilize it, while still maintaining the default visibility settings for other users. This change is part of an effort to manage dependencies and access within the codebase more effectively.

Files changed

  • third_party/xla/xla/python/ifrt_proxy/client/BUILD
2024-11-25T22:57:01 See commit

This commit introduces Python bindings for the serialization of VIFRT (the versioned dialect of IFRT IR) within the XLA (Accelerated Linear Algebra) framework. It modifies several files to integrate these bindings, including the addition of a new source file, ir_py.cc, which defines functions for serializing and deserializing versioned IFRT IR programs. The update also includes adjustments to existing files to accommodate the new serialization functionality, such as incorporating necessary dependencies and modifying class structures to support the new features.

The Python bindings enable users to serialize IFRT IR programs into a stable versioned format and deserialize them back into their original representation. This functionality is critical for ensuring compatibility across different versions of the framework. The commit enhances the usability of the XLA library by allowing developers to interact with IFRT IR programs more easily through Python, thus broadening the accessibility of advanced serialization features for machine learning and computational tasks.

Files changed

  • third_party/xla/xla/python/ifrt/ir/BUILD
  • third_party/xla/xla/python/ifrt/ir/ifrt_ir_program.h
  • third_party/xla/xla/python/ifrt/ir/ifrt_ir_program_serdes.cc
  • third_party/xla/xla/python/ifrt/ir/ir_py.cc
2024-11-26T01:35:34 See commit

This commit introduces a sharding configuration to the XLA's HloModuleConfig as part of the integration of AutoFDO (Automatic Feedback-Directed Optimization). The changes encompass modifications to several files, including the addition of new C++ libraries for handling sharding, specifically hlo_sharding and hlo_op_metadata. The updated HloModuleConfig now includes methods for accessing and modifying the sharding configuration, allowing for better control over how operations are distributed across nodes.

In addition to the new sharding-related code, the commit also updates the serialization format for HloModuleConfig to include the sharding configuration in the associated protocol buffer message. This ensures that the sharding setup can be effectively saved and restored, facilitating the use of sharding in optimizations during the compilation process. Overall, these enhancements aim to improve the efficiency of XLA by leveraging sharding strategies in conjunction with AutoFDO.

Files changed

  • third_party/xla/xla/hlo/ir/BUILD
  • third_party/xla/xla/service/BUILD
  • third_party/xla/xla/service/hlo_module_config.cc
  • third_party/xla/xla/service/hlo_module_config.h
  • third_party/xla/xla/xla.proto
2024-11-26T18:01:58 See commit

This commit addresses a bug in the asynchronous execution of transposed convolution operations within the XLA (Accelerated Linear Algebra) CPU backend. The issue arose because the intermediate buffer used during the computation was going out of scope by the time the callback was executed, leading to attempts to access released memory. To resolve this, the ownership of the intermediate buffer is now transferred to a lambda function, ensuring that its lifetime is extended appropriately and preventing any memory access violations.

The changes made involve modifying the lambda function to capture the intermediate buffer by moving it, which allows the buffer's data pointer to remain valid throughout the execution of the callback. This adjustment is crucial as it ensures that the data referenced by the output matrix remains intact when the computation is finalized. The commit includes updates to the relevant header file, with a total of 17 changes, including 13 additions and 4 deletions, to implement this fix.

Files changed

  • third_party/xla/xla/backends/cpu/runtime/convolution_thunk_internal.h
2024-11-26T19:15:05 See commit

This commit introduces an experimental programmatic C++ graph builder for TensorFlow Lite (TFLite). The changes include modifications to the build configuration file and the interpreter header, as well as the addition of new files that implement the graph building functionality. Specifically, new source files for model building and corresponding tests have been added to enhance the TFLite framework.

The newly added files, model_building.cc, model_building.h, and model_building_test.cc, are designed to facilitate the creation and testing of models programmatically, improving the flexibility and usability of TFLite for developers. This commit reflects ongoing efforts to enhance the capabilities of TensorFlow Lite, making it easier to construct and manipulate machine learning models in a C++ environment.

Files changed

  • tensorflow/lite/core/BUILD
  • tensorflow/lite/core/interpreter.h
  • tensorflow/lite/core/model_building.cc
  • tensorflow/lite/core/model_building.h
  • tensorflow/lite/core/model_building_test.cc
2024-11-27T23:28:31 See commit

This commit addresses a regression in the GemmFusionAutotunerImpl::GetExhaustiveTritonConfigs function, which was inadvertently introduced in a previous pull request. The primary change involves restoring the logic for the tune_ctas variable, ensuring that it correctly determines whether to enable tuning for Cooperative Thread Arrays (CTAs) based on the GPU's compute capability and specific debug options. The updated logic now checks if the system is not using ROCm and if the GPU is at least a Hopper architecture before setting tune_ctas to true.

Additionally, the commit includes modifications to the autotuning tests to verify the functionality of the CTA tuning behavior on Hopper GPUs. New test cases have been added to ensure that the autotuner correctly identifies configurations with more than two CTAs when the appropriate debug options are enabled. Overall, these changes enhance the robustness of the autotuning process for GPU computations in the XLA framework.

Files changed

  • third_party/xla/xla/service/gpu/autotuning/BUILD
  • third_party/xla/xla/service/gpu/autotuning/gemm_fusion_autotuner.cc
  • third_party/xla/xla/service/gpu/autotuning/gemm_fusion_autotuner_test.cc
2024-11-28T18:20:08 See commit

This commit introduces parallel compilation capabilities to the XLA CPU backend by implementing the ORC TaskDispatcher. Key modifications include enhancements to the JitCompiler class, which now accepts a TaskRunner for executing compilation tasks concurrently. The changes involve the addition of new dependencies and the implementation of a TaskDispatcher that manages task execution, allowing for more efficient symbol lookup and module compilation. The new structure ensures that tasks can be dispatched either in the current thread or using a user-defined task runner, which can leverage a thread pool for better performance.

Additionally, the commit updates the testing framework to validate the new parallel compilation feature. Tests have been adjusted to confirm that multiple tasks can be scheduled and executed concurrently, ensuring that the JIT compilation process benefits from the added parallelism. Overall, this enhancement aims to improve the performance of the JIT compilation process in the XLA CPU backend by utilizing multi-threading capabilities.

Files changed

  • third_party/xla/xla/backends/cpu/codegen/BUILD
  • third_party/xla/xla/backends/cpu/codegen/jit_compiler.cc
  • third_party/xla/xla/backends/cpu/codegen/jit_compiler.h
  • third_party/xla/xla/backends/cpu/codegen/jit_compiler_test.cc
2024-11-28T19:07:40 See commit

This commit focuses on enhancing the CpuCompiler within the XLA (Accelerated Linear Algebra) framework by incorporating the JitCompiler, which is designed to facilitate Just-In-Time (JIT) compilation. The modifications primarily involve the creation of a task runner for the JitCompiler, which utilizes a global compilation thread pool to manage tasks effectively. Additionally, the commit introduces options for compiling LLVM IR (Intermediate Representation) into machine code, including settings for optimization levels, size optimization, and fast math flags. The JitCompiler is instantiated with these options, paving the way for more efficient and flexible compilation processes within the CpuCompiler.

Overall, this update is a preparatory step towards integrating JIT compilation capabilities into the XLA CPU compilation workflow, thereby improving performance and enabling more complex optimizations during runtime. The changes involve a significant addition of code, reflecting the complexity and importance of the JIT compilation process in the overall architecture of the XLA framework.

Files changed

  • third_party/xla/xla/service/cpu/cpu_compiler.cc
2024-11-29T03:24:33 See commit

This commit focuses on migrating the CpuCompiler from the existing SimpleOrcJit framework to the more advanced JitCompiler within the XLA (Accelerated Linear Algebra) backend for CPU. The changes involve modifications across multiple files, including the cpu_compiler, cpu_executable, and runtime_symbol_generator components, as well as updates to related build configurations and tests.

The migration aims to enhance the performance and efficiency of the CPU compilation process by leveraging the capabilities of the JitCompiler, which is expected to yield better optimization and execution speeds. The commit reflects a significant refactor of the codebase, ensuring that the CPU backend is better aligned with the latest advancements in JIT compilation technology.

Files changed

  • third_party/xla/xla/backends/cpu/codegen/BUILD
  • third_party/xla/xla/backends/cpu/codegen/jit_compiler.cc
  • third_party/xla/xla/backends/cpu/codegen/jit_compiler.h
  • third_party/xla/xla/service/cpu/BUILD
  • third_party/xla/xla/service/cpu/cpu_compiler.cc
  • third_party/xla/xla/service/cpu/cpu_executable.cc
  • third_party/xla/xla/service/cpu/cpu_executable.h
  • third_party/xla/xla/service/cpu/runtime_symbol_generator.cc
  • third_party/xla/xla/service/cpu/runtime_symbol_generator.h
  • third_party/xla/xla/service/cpu/simple_orc_jit.cc
  • third_party/xla/xla/tests/local_client_execute_test.cc