TensorFlow changelog
Here's a delightful rundown of the latest and greatest changes, improvements, and fixes in our codebase. We've been busy integrating, optimizing, and squashing pesky bugs to make your experience smoother and more efficient. Let's dive into the details!
- New feature: We've integrated the StableHLO framework into TensorFlow's MLIR infrastructure. This major update focuses on transforming and legalizing quantization and HLO operations, enhancing compatibility and performance.
- New feature: Added support for unary element-wise operations in the MHLO to TFL conversion process. Now, operations like absolute value and trigonometric functions are seamlessly transformed, bolstering TensorFlow Lite's capabilities.
- Improvement: Exporting MLIR modules just got clearer! The name of the HLO module now matches the MLIR module name, ditching the default "main" to avoid confusion and conflicts.
- New feature: Memory management in XLA is stepping up! We've laid the groundwork for adding memory spaces to the CompileOnlyClient, paving the way for more sophisticated memory handling.
- Improvement: FP8 windowed einsums with multiple all-gather dots are now supported. This enhancement optimizes FP8 operations within the XLA framework, thanks to a nifty shift in dequantization.
- Improvement: Casting operations between floats and integers in MLIR are now more efficient, thanks to new folding optimizations. Say hello to faster compilation!
- New feature: Introducing GetSparseCoreId to the TensorFlow profiler! This function extracts Sparse Core IDs from plane names, boosting TPU profiling capabilities.
- New feature: We've added a pass to open the sharding of while op free variables. This helps optimize sharding strategies during HLO conversion, enhancing operation efficiency.
- Bugfix: Resolved an issue where "MakeExactCopy" didn't copy "known_graph_outputs_", ensuring all necessary output values are retained in copied graphs.
- Bugfix: Fixed integer overflow issues post-NumPy 2.0 update by refining type casting and array creation operations, maintaining compatibility with NumPy 1.x behavior.
- Chore: Cleaned up pywrap_parallel_device.cc by removing unnecessary TensorFlow C API headers, streamlining the codebase.
- Bugfix: Addressed test failures under NumPy 2.x by directly calling __array__() for objects requiring a copy when converting to TF tensors. Compatibility restored!
These updates are all about making things run smoother, faster, and with fewer hiccups. Keep those updates coming, and happy coding!
Included Commits
This commit introduces a new function, GetSparseCoreId, to the TensorFlow profiler utilities, which extracts the Sparse Core ID from a given plane name if it matches the defined Sparse Core naming pattern. The implementation includes modifications to several files, such as xplane_schema.h, where a new regex constant for Sparse Core plane names is added, and tpu_xplane_utils.cc, where the new function is defined. Additionally, the header file tpu_xplane_utils.h is updated to declare the new function.
The commit also updates the test file tpu_xplane_utils_test.cc to include tests that verify the functionality of the new GetSparseCoreId function. These tests check that the function correctly identifies Sparse Core planes and retrieves their IDs, ensuring the robustness of the new feature within the profiling utilities. Overall, this enhancement aims to improve the profiling capabilities related to TPU Sparse Core planes in TensorFlow.
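The extraction itself boils down to matching the plane name against the new regex constant and pulling out the captured core number. Here is a minimal Python sketch of that logic, assuming a hypothetical plane-naming scheme like "/device:TPU:0 SparseCore 2" (the real pattern and implementation are the C++ ones in xplane_schema.cc and tpu_xplane_utils.cc):

```python
import re
from typing import Optional

# Hypothetical pattern for illustration; the actual regex constant is
# defined in xplane_schema.cc and may differ.
_SPARSE_CORE_PLANE_RE = re.compile(r"SparseCore (\d+)$")


def get_sparse_core_id(plane_name: str) -> Optional[int]:
    """Returns the Sparse Core ID if the plane name matches, else None."""
    match = _SPARSE_CORE_PLANE_RE.search(plane_name)
    return int(match.group(1)) if match else None


print(get_sparse_core_id("/device:TPU:0 SparseCore 2"))  # 2
print(get_sparse_core_id("/device:TPU:0"))               # None
```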
Files changed
- tensorflow/core/profiler/utils/xplane_schema.h
- third_party/xla/third_party/tsl/tsl/profiler/utils/BUILD
- third_party/xla/third_party/tsl/tsl/profiler/utils/tpu_xplane_utils.cc
- third_party/xla/third_party/tsl/tsl/profiler/utils/tpu_xplane_utils.h
- third_party/xla/third_party/tsl/tsl/profiler/utils/tpu_xplane_utils_test.cc
- third_party/xla/third_party/tsl/tsl/profiler/utils/xplane_schema.cc
- third_party/xla/third_party/tsl/tsl/profiler/utils/xplane_schema.h
This commit introduces support for unary element-wise operations in the conversion process from MHLO (the MLIR HLO dialect) to TFL (TensorFlow Lite). The changes include the addition of various unary operations such as absolute value, ceiling, floor, logarithm, and trigonometric functions, among others. The implementation ensures that these operations are correctly transformed into their TFL equivalents, enhancing the functionality and compatibility of the MLIR (Multi-Level Intermediate Representation) framework with TensorFlow Lite.
The modifications involve defining new functions and patterns in the MLIR codebase, which facilitate the legal conversion of unary ops by adding them to the conversion target. The commit also includes tests to validate that the conversions are functioning as expected. Overall, this enhancement expands the range of operations that can be utilized within TensorFlow Lite, potentially improving the performance and efficiency of machine learning models deployed on edge devices.
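Conceptually, each new pattern is a one-to-one rewrite from an MHLO unary op to its TFL counterpart, with the covered ops registered on the conversion target. The real patterns are MLIR rewrites declared in tflite_legalize_hlo_patterns.td; the toy Python mapping below only illustrates the shape of that correspondence, and the op names are indicative rather than exhaustive:

```python
# Illustrative MHLO -> TFL unary-op correspondence (not the real pattern list).
MHLO_TO_TFL_UNARY = {
    "mhlo.abs": "tfl.abs",
    "mhlo.ceil": "tfl.ceil",
    "mhlo.floor": "tfl.floor",
    "mhlo.log": "tfl.log",
    "mhlo.sine": "tfl.sin",
    "mhlo.cosine": "tfl.cos",
}


def legalize_unary(op_name: str, operand: str) -> str:
    """Rewrites one unary op into its TFL equivalent, if a pattern exists."""
    if op_name not in MHLO_TO_TFL_UNARY:
        raise ValueError(f"no TFL legalization registered for {op_name}")
    return f"{MHLO_TO_TFL_UNARY[op_name]}({operand})"


print(legalize_unary("mhlo.abs", "%arg0"))  # tfl.abs(%arg0)
```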
Files changed
- tensorflow/compiler/mlir/lite/stablehlo/tests/tfl_legalize_hlo.mlir
- tensorflow/compiler/mlir/lite/stablehlo/transforms/tflite_legalize_hlo.cc
- tensorflow/compiler/mlir/lite/stablehlo/transforms/tflite_legalize_hlo_patterns.td
The recent commit involves modifications to the pywrap_parallel_device.cc file within TensorFlow's Python distribute package. The changes include the removal of six lines of code, specifically related to various TensorFlow C API headers that were deemed unnecessary. This cleanup likely aims to streamline the codebase by eliminating unused imports, which can enhance maintainability and reduce potential confusion for developers working on this part of the TensorFlow framework.
Overall, the commit reflects an effort to refine the code by removing redundant dependencies, specifically those associated with the TensorFlow C API, while retaining essential includes necessary for the functionality of parallel device operations. This kind of automated code change is part of ongoing maintenance practices to ensure the code remains efficient and relevant to current development needs.
Files changed
- tensorflow/python/distribute/parallel_device/pywrap_parallel_device.cc
This commit introduces a significant enhancement to the export functionality of the MLIR (Multi-Level Intermediate Representation) module by ensuring that the name of the HLO (High-Level Operations) module corresponds to the name of the MLIR module during the export process. Previously, the exported module name was defaulted to "main," which could lead to confusion or conflicts in cases where multiple modules were involved. The change allows for better identification and management of modules by preserving their original names, improving the clarity of the exported code.
The modifications include updates to several files, specifically in the translation layer where MLIR is converted to HLO. The code now retrieves the name of the MLIR module and sets it as the name for the HLO module, thereby enhancing the functionality of the translation tool. Furthermore, the commit includes updates to the test files to validate this new behavior, ensuring that the export process correctly reflects the intended module naming. Overall, this change improves the usability and functionality of the translation tools within the XLA (Accelerated Linear Algebra) framework.
Files changed
- third_party/xla/xla/translate/mhlo_to_hlo/BUILD
- third_party/xla/xla/translate/mhlo_to_hlo/tests/export.mlir
- third_party/xla/xla/translate/mhlo_to_hlo/translate.cc
This commit introduces a foundational implementation for adding memory spaces to the CompileOnlyClient in the XLA (Accelerated Linear Algebra) library. Key changes include the addition of a new class, PjRtMemorySpaceDescription, which encapsulates details about different memory spaces, such as a unique identifier and a descriptive string. The PjRtDeviceDescription class has been updated to include methods for retrieving all memory spaces associated with a device and for fetching the default memory space, although the latter method is currently unimplemented.
Furthermore, enhancements have been made to the CompileOnlyDevice and CompileOnlyMemory classes to handle memory management more effectively. The CompileOnlyDevice can now store and manage multiple memory instances, including setting a default memory space. The CompileOnlyIfRtClient class has also been updated to initialize memory spaces for devices based on their descriptions, ensuring that memory is properly allocated and associated with the respective devices. Overall, this commit lays the groundwork for more sophisticated memory handling in the XLA compilation process.
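As a rough illustration of the shape of these abstractions (the real classes are C++ in pjrt_device_description.h and py_compile_only_client.cc; the names and fields below are simplified stand-ins), a memory-space description pairs an identifier with a descriptive kind string, and a compile-only device tracks several of them plus a default:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class MemorySpaceDescription:
    """Simplified stand-in for PjRtMemorySpaceDescription."""
    id: int
    kind: str  # e.g. "device" or "pinned_host" (illustrative values)


@dataclass
class CompileOnlyDeviceSketch:
    """Simplified stand-in for CompileOnlyDevice's new memory bookkeeping."""
    memory_spaces: List[MemorySpaceDescription] = field(default_factory=list)
    default_memory_space: Optional[MemorySpaceDescription] = None

    def attach_memory(self, mem: MemorySpaceDescription, is_default: bool = False):
        self.memory_spaces.append(mem)
        if is_default:
            self.default_memory_space = mem


device = CompileOnlyDeviceSketch()
device.attach_memory(MemorySpaceDescription(0, "device"), is_default=True)
device.attach_memory(MemorySpaceDescription(1, "pinned_host"))
print(device.default_memory_space.kind)  # device
```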
Files changed
- third_party/xla/xla/pjrt/BUILD
- third_party/xla/xla/pjrt/pjrt_device_description.h
- third_party/xla/xla/python/py_compile_only_client.cc
This commit addresses an issue in the TensorFlow Lite GPU delegate where the "MakeExactCopy" function did not properly copy the "known_graph_outputs_" values from one graph to another. These values are important as they represent outputs that have consumers and should be retained when creating a copy of the graph. Previously, if the "outputs()" function was called on a copied graph, it would result in missing these crucial values.
To resolve this, the commit modifies the "MakeExactCopy" function to ensure that the "known_graph_outputs_" are cleared and appropriately populated in the copied graph. The changes involve checking if each value in the original graph is part of the "known_graph_outputs_" and, if so, adding it to the copied graph's corresponding list. This ensures that the copied graph maintains its integrity and functionality by preserving all necessary output values.
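The gist of the fix is simple to sketch: when cloning the graph, every value that was recorded as a known graph output in the original must be re-registered in the clone. A toy Python sketch of that bookkeeping (the actual change is C++ in model.cc, and the names here are illustrative):

```python
class GraphSketch:
    """Toy stand-in for the GPU delegate graph: values plus known outputs."""

    def __init__(self):
        self.values = []               # all value IDs in the graph
        self.known_graph_outputs = []  # value IDs that still have consumers

    def make_exact_copy(self) -> "GraphSketch":
        copy = GraphSketch()
        copy.values = list(self.values)
        # The fix: repopulate known_graph_outputs in the copy instead of
        # leaving it empty, so outputs() on the copy stays complete.
        copy.known_graph_outputs = [
            v for v in self.values if v in self.known_graph_outputs
        ]
        return copy


g = GraphSketch()
g.values = [1, 2, 3]
g.known_graph_outputs = [3]
print(g.make_exact_copy().known_graph_outputs)  # [3]
```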
Files changed
- tensorflow/lite/delegates/gpu/common/model.cc
This commit introduces a new pass designed to enhance the handling of free variables within a while operation that has a user-defined sharding. By applying a fully open sharding constraint to each free variable, the pass enables further sharding of these variables during their usage in the while operation. This enhancement is particularly significant for the conversion process to HLO (High-Level Operations), as it allows these variables to be treated as passthrough operands and results, thereby improving the overall efficiency and flexibility of the operation.
In terms of implementation, several files have been modified and new files added to support this functionality. Key additions include the implementation of the open sharding pass in open_while_free_vars_sharding.cc and its corresponding header file, along with updates to various build files and test cases to ensure proper integration and validation of the new feature. Overall, this commit represents a step forward in optimizing sharding strategies within the XLA framework.
Files changed
- third_party/xla/xla/service/spmd/shardy/BUILD
- third_party/xla/xla/service/spmd/shardy/round_trip_common/BUILD
- third_party/xla/xla/service/spmd/shardy/round_trip_common/open_while_free_vars_sharding.cc
- third_party/xla/xla/service/spmd/shardy/round_trip_common/open_while_free_vars_sharding.h
- third_party/xla/xla/service/spmd/shardy/round_trip_common/pipeline_passes.cc
- third_party/xla/xla/service/spmd/shardy/sdy_opt_main.cc
- third_party/xla/xla/service/spmd/shardy/shardy_xla_pass_test.cc
- third_party/xla/xla/service/spmd/shardy/test/mhlo_import_pipeline.mlir
- third_party/xla/xla/service/spmd/shardy/test/open_while_free_vars_sharding.mlir
- third_party/xla/xla/service/spmd/shardy/test/round_trip_pipeline.mlir
- third_party/xla/xla/service/spmd/shardy/test/sdy_round_trip_import_pipeline.mlir
This commit addresses test failures encountered when using TensorFlow (TF) with NumPy 2.x, specifically related to objects that implement the __array__ method and require a copy during conversion to a TF tensor. The previous implementation utilized PyArray_FromArrayAttr(), which in NumPy 2.x raises an error if a copy is necessary. To maintain the intended functionality, the code now checks for the presence of the __array__ method on the object and directly calls it if available, ensuring the conversion to a NumPy array proceeds without error.
The changes involve modifying the handling of objects that expose the __array__ method, replacing the previous approach with a more direct method call. This adjustment not only resolves the compatibility issues with NumPy 2.x but also preserves the existing behavior of the tensor conversion process. Additional error handling is included to ensure that the result of the __array__ method is indeed a NumPy array, thereby safeguarding against potential misuses of the interface.
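In plain Python terms, the new behavior amounts to preferring a direct __array__() call over an entry point that refuses to copy. A small sketch of the idea, using a made-up wrapper object whose data can only be handed back as a copy:

```python
import numpy as np


class CopyOnlyContainer:
    """Made-up object whose __array__ must return a fresh copy of its data."""

    def __init__(self, data):
        self._data = list(data)

    def __array__(self, dtype=None, copy=None):
        return np.array(self._data, dtype=dtype)  # always a copy


def to_numpy_for_tensor(obj):
    """Mimics the fix: call __array__ directly and validate the result."""
    if hasattr(obj, "__array__"):
        result = obj.__array__()
        if not isinstance(result, np.ndarray):
            raise TypeError("__array__ did not return a NumPy array")
        return result
    return np.asarray(obj)


print(to_numpy_for_tensor(CopyOnlyContainer([1, 2, 3])))  # [1 2 3]
```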
Files changed
- tensorflow/python/lib/core/py_seq_tensor.cc
The commit associated with PR #13808 introduces support for FP8 windowed einsums that utilize all-gather operations with multiple dot products. This enhancement involves shifting the dequantization of FP8 operands to the output of the while loop, optimizing the processing of these operations in the XLA (Accelerated Linear Algebra) framework.
The changes were made by Philipp Hack and include modifications to several files, such as the windowed_einsum_handler.cc and its corresponding test files, ensuring that the new functionality is properly integrated and tested. The successful merging of this commit resolves the associated pull request, marking a significant improvement in handling FP8 operations within the GPU transformations of the XLA service.
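The reasoning behind deferring dequantization is that, for per-tensor scales, scaling commutes with the dot products accumulated by the windowed einsum, so the multiply can be applied once to the loop's output instead of inside every iteration. A quick NumPy check of that identity (shapes and scales are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
a_q = rng.integers(-8, 8, size=(4, 8)).astype(np.float32)  # stand-in for FP8 values
b_q = rng.integers(-8, 8, size=(8, 4)).astype(np.float32)
scale_a, scale_b = 0.5, 0.25  # per-tensor dequantization scales

# Dequantize before each dot (inside the loop) ...
inside = (a_q * scale_a) @ (b_q * scale_b)
# ... or dequantize the accumulated result once, after the loop.
outside = (a_q @ b_q) * (scale_a * scale_b)

print(np.allclose(inside, outside))  # True
```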
Files changed
- third_party/xla/xla/service/gpu/transforms/BUILD
- third_party/xla/xla/service/gpu/transforms/windowed_einsum_handler.cc
- third_party/xla/xla/service/gpu/transforms/windowed_einsum_handler_test.cc
- third_party/xla/xla/tests/BUILD
- third_party/xla/xla/tests/collective_ops_e2e_test.cc
This commit integrates the StableHLO framework into the TensorFlow MLIR (Multi-Level Intermediate Representation) infrastructure, specifically focusing on various transformation passes and legalizations concerning quantization and HLO (High-Level Operations). It modifies several files across different directories, including those responsible for composing uniform quantized types, optimizing quantization, and legalizing TensorFlow operations to HLO and vice versa. The changes also involve updates to transformation patterns and the introduction of new functionalities to streamline the integration of StableHLO.
The commit includes modifications to both the core TensorFlow MLIR components and third-party XLA (Accelerated Linear Algebra) libraries, ensuring compatibility and enhancing the functionality of the HLO legalizations. Notably, it addresses specific operations like dot and einsum, adapting them to align with the StableHLO specifications. This integration aims to improve the efficiency and performance of MLIR-based models by leveraging the advantages offered by StableHLO in terms of quantization and optimization.
Files changed
- tensorflow/compiler/mlir/lite/stablehlo/transforms/compose_uniform_quantized_type_pass.cc
- tensorflow/compiler/mlir/lite/stablehlo/transforms/legalize_hlo_patterns.td
- tensorflow/compiler/mlir/lite/stablehlo/transforms/optimize.cc
- tensorflow/compiler/mlir/quantization/stablehlo/passes/lift_quantizable_spots_as_functions_fusion.td
- tensorflow/compiler/mlir/quantization/stablehlo/passes/lift_quantizable_spots_as_functions_simple.td
- tensorflow/compiler/mlir/tf2xla/transforms/legalize_tf.cc
- tensorflow/compiler/mlir/tf2xla/transforms/legalize_tf_patterns.td
- third_party/stablehlo/temporary.patch
- third_party/stablehlo/workspace.bzl
- third_party/xla/xla/mlir_hlo/mhlo/IR/hlo_ops.cc
- third_party/xla/xla/mlir_hlo/mhlo/IR/hlo_ops.td
- third_party/xla/xla/mlir_hlo/mhlo/IR/hlo_ops_attrs.td
- third_party/xla/xla/mlir_hlo/mhlo/IR/hlo_ops_enums.td
- third_party/xla/xla/mlir_hlo/mhlo/transforms/hlo_legalize_to_stablehlo/hlo_legalize_to_stablehlo.cc
- third_party/xla/xla/mlir_hlo/mhlo/transforms/legalize_dot_to_dot_general/legalize_dot_to_dot_general.cc
- third_party/xla/xla/mlir_hlo/mhlo/transforms/legalize_einsum_to_dot_general/legalize_einsum_to_dot_general.cc
- third_party/xla/xla/mlir_hlo/mhlo/transforms/stablehlo_legalize_to_hlo/stablehlo_legalize_to_hlo.cc
- third_party/xla/xla/mlir_hlo/tests/Dialect/mhlo/hlo-legalize-to-stablehlo.mlir
- third_party/xla/xla/mlir_hlo/tests/Dialect/mhlo/stablehlo-legalize-to-hlo.mlir
- third_party/xla/xla/translate/mhlo_to_hlo/mlir_hlo_to_hlo.cc
This commit addresses integer overflow issues that emerged following the update to NumPy 2.0, ensuring compatibility with the overflow behavior of NumPy 1.x. The primary modifications involve changing the way data types are handled during type casting and array creation. Specifically, the use of dtype(...) and np.array(..., dtype=...) has been replaced with np.array(...).astype(...) in several parts of the TensorFlow codebase to mitigate unintended overflow scenarios.
Key changes were made in files related to TensorFlow's client session and operations, including updates to functions that convert objects to NumPy types and the internal handling of ragged tensors. These adjustments enhance the robustness of the TensorFlow library when dealing with various data types, ensuring that users experience consistent behavior similar to that of the earlier NumPy version. Overall, this commit improves the reliability of TensorFlow's integration with NumPy, particularly in scenarios involving numerical data processing.
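The behavioral difference is easy to see in isolation: NumPy 2.x rejects out-of-range Python integers at array construction time, while routing the value through .astype() reproduces the wrap-around that NumPy 1.x allowed. A quick illustration:

```python
import numpy as np

big = 2**31  # one past the int32 maximum

# NumPy 2.x raises OverflowError here; NumPy 1.x wrapped instead
# (with a deprecation warning in its later releases).
try:
    np.array(big, dtype=np.int32)
except OverflowError as e:
    print("np.array(..., dtype=int32) failed:", e)

# The replacement pattern keeps the old wrap-around behavior on both versions.
print(np.array(big).astype(np.int32))  # -2147483648
```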
Files changed
- tensorflow/python/client/session.py
- tensorflow/python/ops/bitwise_ops_test.py
- tensorflow/python/ops/ragged/ragged_factory_ops.py
This commit introduces optimizations for casting operations in TensorFlow's MLIR (Multi-Level Intermediate Representation) by implementing folding for float-to-int and int-to-float conversions. Specifically, it enhances the CastOp class to include several new methods that facilitate the transformation of dense tensor attributes between integer and floating-point types. The changes ensure that when a cast operation is performed, it can be simplified to a more efficient representation if the input and output types are compatible, thereby improving performance during compilation.
The commit modifies the tfl_ops.cc file with 104 new lines of code and 31 deletions, resulting in a total of 135 changes. Additionally, it updates the test cases in const-fold.mlir to verify the correct functionality of the new casting operations, ensuring that the transformations yield the expected tensor outputs. This enhancement not only optimizes the compilation process but also expands the capabilities of the MLIR framework in handling numerical data conversions more efficiently.
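Folding here means evaluating the cast at compile time whenever its input is a constant, so the compiled model carries the already-converted tensor instead of a runtime cast. The NumPy snippet below only mimics the arithmetic involved; it assumes C++-style truncation toward zero for float-to-int, with the exact rounding rule being whatever the folder in tfl_ops.cc implements:

```python
import numpy as np

# A constant float tensor feeding a float->int cast ...
const_input = np.array([1.9, -2.7, 3.0], dtype=np.float32)

# ... can be replaced at compile time by the cast's result (assumed to
# truncate toward zero, matching NumPy's astype).
print(const_input.astype(np.int32))  # [ 1 -2  3]

# Int->float folding works the same way in the other direction.
print(np.array([4, -5], dtype=np.int32).astype(np.float32))  # [ 4. -5.]
```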
Files changed
- tensorflow/compiler/mlir/lite/ir/tfl_ops.cc
- tensorflow/compiler/mlir/lite/tests/const-fold.mlir