TensorFlow Changelog
Welcome to the latest updates! We've packed in some awesome new features, crucial bug fixes, and a few handy improvements. Let's dive into what's new!
New Features

- Integrate StableHLO at openxla/stablehlo@531816f0: We've integrated the StableHLO project from the OpenXLA repository. This update enhances the functionality and compatibility of the XLA framework with the StableHLO standard, improving the transformation of StableHLO to HLO operations and validating the conversion from CHLO to MHLO.
- Graph Dumping in `.pb` Format: You can now dump TensorFlow graphs in both text and binary formats using the `TF_DUMP_GRAPH_FMT` environment variable. This feature adds flexibility and better integration options for users.
- Command-Line Flags for MLIR Lite Tools: Introduced a new command-line flags library for TensorFlow MLIR Lite tools. This simplified and dependency-free module is perfect for benchmarks and easier command-line argument handling.
- Shardy Partitioner in ExecutableOptions: Added a new boolean field `use_shardy_partitioner` in `ExecutableOptions`. This allows developers to opt for the Shardy partitioning strategy, enhancing flexibility in the XLA library.
- UnfoldSplatConstantPass: Added the `UnfoldSplatConstantPass` to the MLIR framework before the HLO-to-TFLite legalization process. This pass prevents folding splat constants with broadcasts, which can cause bloated model sizes.
Bug Fixes

- Reverted UniqueChannelIdEnforcer: Reverted a previous change that introduced the `UniqueChannelIdEnforcer`. This reflects a shift in strategy for managing unique channel IDs within the XLA framework.
- Fix acos Decomposition: Corrected the decomposition of the `acos` function for non-complex arguments. The previous implementation incorrectly handled the case where `x == -1`, which should return π (pi).
- AllReduceBlueConnect Crash Fix: Addressed a crash in AllReduceBlueConnect when multiple partitions are used. The pass now runs only with specific values of `CollectiveOpGroupMode`, improving robustness.
Improvements

- Runtime Pointer Sizes for Sorting: Enhanced the XLA CPU backend to support runtime pointer sizes for sorting elements. This update improves flexibility and efficiency in sorting operations.
- LLVM Integration: Updated the TensorFlow MLIR framework to align with the latest LLVM changes. This integration enhances performance and reliability in quantization and type conversion functionalities.
- Automated Code Changes: Made extensive modifications to the TensorFlow DTensor MLIR framework, improving distributed processing capabilities and optimizing performance.
Chores

- Remove Unused cuda_stream.h: Cleaned up the codebase by removing the unused `cuda_stream.h` header file and its associated functions. This helps streamline the framework and improve maintainability.
That's all for now! Stay tuned for more updates and happy coding!
Included Commits
This commit reverts a previous change identified by the hash 5c92d9f35258c44bdaa17184604c1a3af450fb5e, effectively removing the `UniqueChannelIdEnforcer` class and its associated test cases from the XLA (Accelerated Linear Algebra) service codebase. The changes include the deletion of the implementation file `unique_channel_id_enforcer.cc`, the header file `unique_channel_id_enforcer.h`, and the test file `unique_channel_id_enforcer_test.cc`, along with the removal of references to this enforcer in various build files.
The `UniqueChannelIdEnforcer` was designed to ensure that every collective operation within an HLO (High-Level Optimizer) module has a unique channel ID, which is critical for avoiding conflicts in parallel processing scenarios. Its removal suggests a reevaluation of how the XLA service manages unique channel IDs.
Files changed
- third_party/xla/xla/service/BUILD
- third_party/xla/xla/service/gpu/BUILD
- third_party/xla/xla/service/gpu/gpu_compiler.cc
- third_party/xla/xla/service/unique_channel_id_enforcer.cc
- third_party/xla/xla/service/unique_channel_id_enforcer.h
- third_party/xla/xla/service/unique_channel_id_enforcer_test.cc
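For intuition about what the now-removed pass did, here is a minimal Python sketch of a unique-channel-ID enforcer. This is purely illustrative: the real `UniqueChannelIdEnforcer` was a C++ pass over HLO modules, and the `ops`-as-dicts representation here is a stand-in invented for this example.

```python
from itertools import count

def enforce_unique_channel_ids(ops):
    """Assign a fresh, unique channel ID to every collective op.

    Illustrative sketch only: `ops` is a list of dicts with a
    "channel_id" key, standing in for collective HLO instructions.
    """
    next_id = count(start=1)
    for op in ops:
        # Overwrite any (possibly duplicated) existing ID with a unique one.
        op["channel_id"] = next(next_id)
    return ops
```

After the pass, no two collectives share a channel ID, which is the invariant the original enforcer checked for.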
This commit integrates the StableHLO project from the OpenXLA repository, specifically referencing the commit identified by the hash 531816f0. The integration involves modifications to several files within the third-party directory, including a temporary patch and changes to the workspace configuration.
Additionally, the commit updates a header file related to the transformation of StableHLO to HLO operations, as well as a test file that likely validates the legal conversion of certain constructs from the CHLO dialect to MHLO. These changes suggest an effort to enhance the functionality and compatibility of the XLA (Accelerated Linear Algebra) framework with the StableHLO standard.
Files changed
- third_party/stablehlo/temporary.patch
- third_party/stablehlo/workspace.bzl
- third_party/xla/xla/mlir_hlo/mhlo/transforms/map_stablehlo_to_hlo_op.h
- third_party/xla/xla/mlir_hlo/tests/Dialect/chlo/chlo_legalize_to_mhlo.mlir
This commit addresses an issue in the decomposition of the arc cosine (acos) function for non-complex arguments in TensorFlow's MLIR (Multi-Level Intermediate Representation) code. The previous implementation incorrectly handled the case where the input value is -1, which should return π (pi). The updated code introduces a conditional structure that checks if the input is not equal to -1; if so, it computes acos using the identity `acos(x) = 2 * atan2(sqrt(1 - x^2), 1 + x)`. If the input is -1, it directly returns π.
In addition to the code changes, new tests have been added to validate the behavior of the acos function for both real and complex inputs, ensuring that the outputs are as expected. The modifications also include updates to the transformation patterns used in the StableHLO dialect to ensure accurate calculations and prevent errors related to floating-point precision. Overall, this commit enhances the mathematical accuracy of the acos function in TensorFlow's MLIR framework.
Files changed
- tensorflow/compiler/mlir/tf2xla/tests/legalize-tf.mlir
- third_party/stablehlo/temporary.patch
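The fixed decomposition can be checked numerically. The sketch below is a standalone Python rendering of the identity described above, not TensorFlow's MLIR code; the function name is invented for illustration.

```python
import math

def acos_decomposed(x: float) -> float:
    """Decompose acos for real inputs in [-1, 1].

    Uses acos(x) = 2 * atan2(sqrt(1 - x^2), 1 + x) for x != -1,
    and returns pi exactly when x == -1, where both atan2 arguments
    would be zero and the general formula is ill-defined.
    """
    if x == -1.0:
        return math.pi
    return 2.0 * math.atan2(math.sqrt(1.0 - x * x), 1.0 + x)
```

At x = -1 the general formula degenerates to `atan2(0, 0)`, which is exactly the edge case the commit special-cases.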
The recent commit introduces the `UnfoldSplatConstantPass` to the MLIR (Multi-Level Intermediate Representation) framework, specifically before the legalization process that converts High-Level Operations (HLO) to TensorFlow Lite (TFLite). This new pass is crucial as it prevents the folding of splat constants with broadcasts, which can lead to an increase in model size and inefficiencies.
The changes include modifications to the file `tf_tfl_passes.cc`, where the `UnfoldSplatConstantPass` is added to the pass manager just before the legalization step. This adjustment is necessary because the default behavior of the pattern rewriter driver is to apply folding, which could negatively impact the model size. The commit also includes a comment indicating a future intention to rewrite this pass as a pattern within the `PrepareHloPass`, highlighting ongoing efforts to optimize the conversion process.
Files changed
- tensorflow/compiler/mlir/lite/tf_tfl_passes.cc
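To see why folding a splat constant through a broadcast bloats a model, compare the storage costs. This back-of-the-envelope sketch (invented helper names; the 8-bytes-per-dimension shape cost is an assumption for illustration) shows a splat stored as one scalar plus a shape versus the fully materialized tensor a fold would embed in the model:

```python
def splat_size_bytes(shape, elem_bytes=4):
    """Storage for a splat constant: one scalar value plus the target
    shape. Cost is independent of how many elements the tensor has."""
    return elem_bytes + 8 * len(shape)

def folded_size_bytes(shape, elem_bytes=4):
    """Storage after folding the splat through a broadcast: every
    element of the result is materialized as a dense constant."""
    n = 1
    for dim in shape:
        n *= dim
    return n * elem_bytes
```

For a float32 splat broadcast to shape (1024, 1024), the splat costs tens of bytes while the folded dense constant costs 4 MiB, which is the bloat the pass avoids.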
This commit addresses a crash issue in the AllReduceBlueConnect functionality when multiple partitions are utilized. The crash occurred when the all-reduce operation was executed with modes other than kCrossReplica; now, the pass is restricted to run only with specific values for the CollectiveOpGroupMode: kCrossReplica and kFlattenedID. The author notes uncertainty about the usage of the other modes (kCrossPartition and kCrossReplicaAndPartition) in JAX programs and has opted not to support them at this time.
In addition to fixing the crash, the commit includes modifications to the codebase, enhancing the handling of device IDs based on replica IDs and the collective operation group mode. The changes involve adding new functions to manage the conversion of replica IDs to device IDs and updating the logic to ensure that the AllReduceBlueConnect pass only processes supported configurations. The commit also introduces tests to validate the functionality, particularly ensuring that the pass does not alter reduce-scatter operations and behaves correctly with multiple partitions. Overall, these enhancements improve the robustness and reliability of the AllReduceBlueConnect implementation.
Files changed
- third_party/xla/xla/service/gpu/BUILD
- third_party/xla/xla/service/gpu/all_reduce_blueconnect.cc
- third_party/xla/xla/service/gpu/all_reduce_blueconnect_test.cc
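The shape of the fix is a guard on the group mode: skip the pass, rather than crash, for modes the rewrite does not support. A minimal Python sketch of that guard, assuming the four `CollectiveOpGroupMode` values named above (the real check lives in C++ inside `all_reduce_blueconnect.cc`):

```python
from enum import Enum, auto

class CollectiveOpGroupMode(Enum):
    kCrossReplica = auto()
    kCrossPartition = auto()
    kCrossReplicaAndPartition = auto()
    kFlattenedID = auto()

# Per the commit, only these two modes are supported by the pass.
SUPPORTED_MODES = frozenset(
    {CollectiveOpGroupMode.kCrossReplica, CollectiveOpGroupMode.kFlattenedID}
)

def should_run_blueconnect(mode: CollectiveOpGroupMode) -> bool:
    """Return True only for group modes the BlueConnect rewrite supports."""
    return mode in SUPPORTED_MODES
```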
This commit introduces support for runtime pointer sizes in the sorting functionality of the XLA CPU backend. It modifies the `SortThunk` implementation to allow sorting of multiple input buffers of varying element sizes by utilizing template metaprogramming techniques. The changes include the addition of new structures, such as `Value` and `Ref`, which are designed to handle arrays of pointers and their corresponding sizes, enabling the sorting algorithm to manage elements of different sizes dynamically.
The commit also revises the `SortInplace` function to accommodate sorting operations for a variable number of input buffers, enhancing the flexibility and efficiency of the sorting process. Instead of relying on statically defined types, the new implementation leverages byte arrays and runtime size information, allowing for a more generalized sorting mechanism that can handle a broader range of data types. This change not only reduces code duplication but also improves the performance of sorting operations across different input configurations.
Files changed
- third_party/xla/xla/service/cpu/runtime/sort_thunk.cc
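The core idea, sorting elements whose size is known only at runtime by treating them as byte arrays plus a stride, can be sketched in a few lines of Python. This is an analogy to the `SortThunk` change, not its actual C++ implementation:

```python
def sort_flat_buffer(buf: bytes, elem_size: int, key=None) -> bytes:
    """Sort fixed-stride elements in a flat byte buffer.

    The element size is a runtime value rather than a static type, so the
    same routine handles any element width: slice the buffer into
    elem_size-byte chunks, sort the chunks, and re-join them.
    """
    assert len(buf) % elem_size == 0, "buffer must hold whole elements"
    elems = [buf[i:i + elem_size] for i in range(0, len(buf), elem_size)]
    elems.sort(key=key)  # default: lexicographic byte order
    return b"".join(elems)
```

The trade-off mirrors the one in the commit: a statically typed sort can be faster per element, while the byte-array form generalizes over element sizes without duplicating code per type.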
This commit focuses on the removal of the unused `cuda_stream.h` header file and its associated functions from the TensorFlow codebase. The changes include the deletion of the `cuda_stream.h` file itself, along with updates to various build configuration files to eliminate references to this now-removed component. The commit also modifies several source files, ensuring that any dependencies or functions related to `cuda_stream` are appropriately cleaned up.
By removing these unused elements, the commit aims to streamline the codebase, potentially improving maintainability and reducing clutter. The changes reflect a broader effort to enhance the efficiency of the TensorFlow framework, particularly in relation to CUDA integration, by ensuring that only relevant and necessary components are retained.
Files changed
- tensorflow/core/kernels/BUILD
- tensorflow/core/kernels/bias_op.cc
- third_party/xla/xla/service/gpu/BUILD
- third_party/xla/xla/stream_executor/cuda/BUILD
- third_party/xla/xla/stream_executor/cuda/cuda_stream.h
- third_party/xla/xla/stream_executor/gpu/gpu_stream.h
- third_party/xla/xla/xla.bzl
This commit integrates updates from LLVM at the specified commit hash (acc159aea1e6) into the TensorFlow MLIR (Multi-Level Intermediate Representation) framework, specifically focusing on quantization and type conversion functionalities. Key modifications include changes to the `TFQuantTypePattern` and `BFloat16TypePattern` classes, where the handling of regions and type conversions has been improved by using `std::unique_ptr` for new regions. This shift enhances memory management and clarifies the ownership of region objects during the conversion process. Additionally, the commit updates various test files to accommodate new flags and ensure compatibility with the LLVM integration.
The commit also includes modifications to several third-party components, such as LLVM and Triton, ensuring that the repository aligns with the latest changes from the upstream LLVM project. Notably, the changes involve updates to type conversion logic in multiple files, reinforcing the framework's ability to handle tensor operations and conversions more effectively. Overall, this integration aims to enhance the performance and reliability of MLIR's quantization and tensor manipulation capabilities while maintaining adherence to the latest LLVM standards.
Files changed
- tensorflow/compiler/mlir/quantization/stablehlo/passes/bridge/convert_tf_quant_types.cc
- tensorflow/compiler/mlir/quantization/stablehlo/passes/convert_func_to_bfloat16.cc
- tensorflow/compiler/mlir/tools/kernel_gen/tests/tf_framework_legalize_to_llvm.mlir
- tensorflow/compiler/mlir/tools/kernel_gen/tests/tf_kernel_gpu_launch_to_llvm.mlir
- third_party/llvm/generated.patch
- third_party/llvm/workspace.bzl
- third_party/shardy/workspace.bzl
- third_party/stablehlo/temporary.patch
- third_party/triton/llvm_integration/series.bzl
- third_party/xla/xla/mlir_hlo/mhlo/utils/type_conversion.cc
This commit introduces a new feature to TensorFlow that allows users to dump graphs in either text or binary format, based on the environment variable `TF_DUMP_GRAPH_FMT`. The default format is text (".pbtxt"), but users can specify "BIN" to output in binary format (".pb"). The code modifications include the addition of functions to read the environment variable and determine the appropriate file suffix based on the specified format. The commit also updates existing functions that handle the dumping of graphs to utilize this new functionality, ensuring that the correct format is applied when writing graph definitions to files.
Additionally, the commit includes updates to the test suite to validate the new dumping capabilities. Tests were added to check for successful dumping in both text and binary formats and to handle cases where an unknown format is specified. Overall, this feature enhances the flexibility of graph dumping in TensorFlow, allowing for better integration and usability based on user needs.
Files changed
- tensorflow/core/util/dump_graph.cc
- tensorflow/core/util/dump_graph.h
- tensorflow/core/util/dump_graph_test.cc
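The suffix-selection logic described above can be sketched as follows. The helper name is hypothetical and this is not TensorFlow's API; falling back to text for unrecognized values is an assumption made for this sketch:

```python
import os

def dump_graph_suffix() -> str:
    """Pick the dump-file suffix from TF_DUMP_GRAPH_FMT.

    "BIN" selects binary protos (.pb); the default, and (by assumption
    here) any other value, selects the text format (.pbtxt).
    """
    fmt = os.environ.get("TF_DUMP_GRAPH_FMT", "TXT").upper()
    return ".pb" if fmt == "BIN" else ".pbtxt"
```

In practice the variable would be set in the environment (e.g. `TF_DUMP_GRAPH_FMT=BIN`) before running the TensorFlow program that dumps graphs.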
This commit involves the addition of a command-line flags library to the TensorFlow MLIR Lite tools directory. Specifically, it introduces two new files, `command_line_flags.cc` and `command_line_flags.h`, which implement a simplified command-line argument parsing module. This new implementation is designed to be dependency-free and is intended for use in benchmarks, avoiding the complexities of the existing TensorFlow core utilities. The library allows users to define flags with various types (e.g., integers, booleans, strings) and provides functionality for parsing command-line arguments based on these definitions.
The commit also updates the BUILD file to include the new `command_line_flags` library, establishing its visibility and dependencies. The added code includes detailed functionality for flag parsing, including handling positional and required flags, as well as error logging for parsing failures. Overall, this change enhances the toolset available for working with TensorFlow Lite by providing a more streamlined approach to command-line argument handling.
Files changed
- tensorflow/compiler/mlir/lite/tools/BUILD
- tensorflow/compiler/mlir/lite/tools/command_line_flags.cc
- tensorflow/compiler/mlir/lite/tools/command_line_flags.h
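The flavor of a dependency-free flag parser, typed flags with defaults, parsed from `--name=value` arguments, can be sketched in Python. This mirrors only the general idea; the actual library is C++ and its interface may differ:

```python
def parse_flags(argv, defs):
    """Parse --name=value args against flag definitions.

    `defs` maps a flag name to a (type, default) pair. Returns the
    resolved flag values plus any unconsumed (positional) arguments.
    """
    values = {name: default for name, (_, default) in defs.items()}
    remaining = []
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            name, raw = arg[2:].split("=", 1)
            if name in defs:
                typ, _ = defs[name]
                # Booleans need special handling: bool("false") is True.
                values[name] = (raw.lower() in ("true", "1")
                                if typ is bool else typ(raw))
                continue
        remaining.append(arg)  # positional or unknown argument
    return values, remaining
```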
This commit involves extensive modifications to the TensorFlow DTensor MLIR (Multi-Level Intermediate Representation) framework, specifically focusing on the expansion files for various operations. The changes affect multiple source files, including both `.cc` and `.h` files for operations such as random, range, reduce, replicated, resource, and softmax operations, among others. Each of these expansions plays a critical role in the distributed processing capabilities of TensorFlow, indicating a comprehensive update aimed at enhancing the framework's functionality.
The modifications suggest improvements or updates to the logic and implementation of the specified operations within the DTensor framework. By refining these expansion files, the commit likely aims to optimize performance, enhance compatibility with distributed systems, or introduce new features. The thorough nature of the changes indicates a significant effort to ensure that a wide array of operations can be efficiently processed in a distributed context, which is essential for scaling machine learning workloads.
Files changed
- tensorflow/dtensor/mlir/BUILD
- tensorflow/dtensor/mlir/expansions/random_op_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/random_op_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/range_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/range_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/reduce_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/reduce_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/replicated_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/replicated_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/resource_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/resource_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/save_restore_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/save_restore_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/scatter_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/scatter_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/segmentation_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/segmentation_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/slice_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/slice_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/softmax_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/softmax_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/sparse_to_dense_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/sparse_to_dense_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/split_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/split_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/squeeze_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/squeeze_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/strided_slice_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/strided_slice_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/tensorlist_getitem_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/tensorlist_getitem_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/tensorlist_reserve_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/tensorlist_reserve_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/tensorlist_setitem_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/tensorlist_setitem_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/top_k_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/top_k_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/trivial_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/trivial_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/unsupported_op_spmd_expander.cc
- tensorflow/dtensor/mlir/expansions/unsupported_op_spmd_expander.h
- tensorflow/dtensor/mlir/expansions/where_spmd_expander.cc
This commit introduces a new boolean field, `use_shardy_partitioner`, to the `ExecutableOptions` class within the XLA (Accelerated Linear Algebra) library. This new field indicates whether the Shardy partitioner should be utilized, replacing the existing ShardingPropagation and SpmdPartitioner methods. The changes include updates to various files, such as `executable_build_options.cc`, `executable_build_options.h`, and `hlo_module_config.cc`, ensuring that the new field is integrated into the serialization and deserialization processes of executable build options and module configurations.
Additionally, the commit adds a test case to validate the functionality of the `use_shardy_partitioner` field. The test checks that when the partitioner is set to true, the compiled module correctly reflects this setting, ensuring the new functionality works as intended. Overall, this update enhances the flexibility of the XLA library by allowing developers to opt for the Shardy partitioning strategy, which is part of ongoing improvements in the XLA codebase.
Files changed
- third_party/xla/xla/client/executable_build_options.cc
- third_party/xla/xla/client/executable_build_options.h
- third_party/xla/xla/hlo/ir/hlo_module.cc
- third_party/xla/xla/pjrt/compile_options.proto
- third_party/xla/xla/service/hlo_module_config.cc
- third_party/xla/xla/service/hlo_module_config.h
- third_party/xla/xla/service/hlo_module_util.cc
- third_party/xla/xla/tests/local_client_execute_test.cc
- third_party/xla/xla/xla.proto