tensorflow changelog

1 year ago

Welcome to our latest update! We've been busy adding some awesome new features, squashing pesky bugs, and making improvements to keep everything running smoothly. Here's the lowdown on what's new and improved:

### New Features
- **Asynchronous Launch for HostKernel** 🚀: HostKernel now supports asynchronous launch and uses an Eigen device to parallelize kernel execution, which improves resource utilization and speeds up computation on the CPU platform.
- **StableHLO Integration**: Integrated StableHLO at openxla/stablehlo@dd48ec58, adding new operations, verifiers, and tests for uniform quantization and all-to-all.
- **Int4 Support in Dequantize Op**: Added support for int4 in the dequantize operation, including per-channel dequantization. This enhances the flexibility and functionality of TensorFlow Lite.
- **'decompose_optionals' Pass**: Introduced a new pass to decompose optional operations into simpler identity operations, improving code readability and maintainability.
- **Aliasing Semantics for Nested Fusions**: Added aliasing semantics for nested fusions, enhancing the accuracy and functionality of fusion analysis in the XLA service.

### Improvements
- **Recursive Work Splitting for Host Tasks**: Implemented recursive work splitting to submit host tasks, significantly improving wall time for task submission into a thread pool.
- **JAX Builds Centralization**: Moved JAX builds to build.py, streamlining the build process and improving test environments for JAX_CPU and JAX_GPU.
- **Stream Dependency Management**: Eliminated StreamExecutor::CreateStreamDependency by consolidating its code into Stream and its derived classes, optimizing stream dependency management.

### Bugfixes
- **Revert Changelist 641306427**: Reverted a previous change, updating tensor types in the CastOperationParser test to ensure correct operation.
- **Float Conversion Fixes**: Fixed float conversions for fp8 and u64 by adding the missing fp8 lowerings and correcting the u64 upper bounds, which resolves the problem with unary_ops_test_gpu.
- **Revert c2e7e9f6c3f4d4937d8145f988ea74818e000ecc**: Reverted changes that removed references to Google's Abseil library, restoring functionality related to remote tensor handles.

### Chores
- **LLVM Integration**: Updated LLVM usage to match the latest commit [7476c20c481c](https://github.com/llvm/llvm-project/commit/7476c20c481c), ensuring we are using the most up-to-date version for development.

Stay tuned for more updates and happy coding! 🎉

Included Commits

2024-06-07T04:13:07 See commit

This commit integrates LLVM at commit 7476c20c481c. It updates the generated patch in the third_party/llvm directory (22 additions, 13 deletions), which touches files in the LLVM project such as libc/test/src/math/smoke/BUILD.bazel and lldb/test/API/functionalities/stats_api/TestStatisticsAPI.py.

The commit also modifies third_party/llvm/workspace.bzl (2 additions, 2 deletions), updating the LLVM commit and SHA256 values to point at 7476c20c481c.

Files changed

third_party/llvm/generated.patch
third_party/llvm/workspace.bzl

2024-06-07T12:20:58 See commit

This commit fixes float conversions for fp8 and u64. The fp8 types were missing lowerings for the cmpf and fptoi operations, and the u64 conversion used incorrect upper bounds. The fix adds the missing lowerings and corrects the u64 upper bounds, which resolves the problem with unary_ops_test_gpu.

The changes touch elemental_hlo_to_mlir.cc and elemental_hlo_to_mlir_test.cc, as well as expand_float_ops.cc, passes.td, and expand_float_ops.mlir. They add new functions and rewrite patterns, and update the tests to cover the corrected fp8 and u64 conversions.
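
One way an upper bound for u64 can go wrong is easy to show outside XLA: 2^64 - 1 has no exact floating-point representation, so a bound computed from it rounds up to 2^64. The sketch below is plain Python, not the XLA lowering, and the function name `float_to_u64_saturating` and its saturating behavior are assumptions made for illustration; the commit's actual bounds are not reproduced here.

```python
U64_MAX = 2**64 - 1

# Converting the largest u64 to a float rounds up to 2**64, which no longer fits in u64.
assert float(U64_MAX) == 2.0**64


def float_to_u64_saturating(x: float) -> int:
    """Hypothetical saturating float -> u64 conversion (illustration only)."""
    if x != x:           # NaN maps to 0 in this sketch
        return 0
    if x <= 0.0:         # negative values and zero clamp to 0
        return 0
    if x >= 2.0**64:     # compare against the exactly representable bound
        return U64_MAX
    return int(x)        # truncates toward zero; guaranteed to fit at this point
```

Comparing against `2.0**64` with `>=` rather than against `float(U64_MAX)` with `>` keeps the out-of-range value 2^64 from slipping through to the integer conversion.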

Files changed

third_party/xla/xla/service/gpu/fusions/mlir/elemental_hlo_to_mlir.cc
third_party/xla/xla/service/gpu/fusions/mlir/elemental_hlo_to_mlir_test.cc
third_party/xla/xla/service/gpu/fusions/mlir/expand_float_ops.cc
third_party/xla/xla/service/gpu/fusions/mlir/passes.td
third_party/xla/xla/service/gpu/fusions/mlir/tests/expand_float_ops.mlir

2024-06-10T02:36:09 See commit

This commit introduces asynchronous launch to HostKernel and uses an Eigen device to parallelize kernel execution. The changes modify cpu_client.cc to incorporate task handling, add task conversion functions to task.h and remove them from thunk_executor.h, and update kernel_thunk.cc to handle asynchronous execution. host_executor.cc and host_kernel.cc were also modified to support asynchronous task execution and completion reporting, and the BUILD files gained the dependencies and headers the new functionality needs.

Overall, the commit adds asynchronous task handling and Eigen-based parallelization to CPU kernel execution in XLA, allowing better resource utilization and faster computation on the CPU.
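
The general shape of an asynchronous launch with completion reporting can be sketched as follows. This is a minimal Python illustration using the standard library thread pool, not the HostKernel code; `launch_async` and the counter-plus-event scheme are assumptions chosen for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def launch_async(pool, kernel, num_tasks):
    """Submit num_tasks kernel calls and return an Event that is set when all finish."""
    done = threading.Event()
    if num_tasks == 0:
        done.set()
        return done

    remaining = [num_tasks]   # shared countdown of unfinished tasks
    lock = threading.Lock()

    def run(task_index):
        try:
            kernel(task_index)
        finally:
            with lock:
                remaining[0] -= 1
                last = remaining[0] == 0
            if last:
                done.set()    # the final task reports completion

    for i in range(num_tasks):
        pool.submit(run, i)
    return done               # the caller is not blocked by the launch itself


out = [0] * 8
with ThreadPoolExecutor(max_workers=4) as pool:
    event = launch_async(pool, lambda i: out.__setitem__(i, i * i), 8)
    event.wait()              # block only when the results are needed
```

The launch returns immediately with a completion handle, so the caller can overlap other work with kernel execution and wait only at the point where the output is consumed.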

Files changed

third_party/xla/xla/pjrt/cpu/BUILD
third_party/xla/xla/pjrt/cpu/cpu_client.cc
third_party/xla/xla/service/cpu/runtime/BUILD
third_party/xla/xla/service/cpu/runtime/kernel_thunk.cc
third_party/xla/xla/service/cpu/runtime/task.h
third_party/xla/xla/service/cpu/runtime/thunk.h
third_party/xla/xla/service/cpu/runtime/thunk_executor.h
third_party/xla/xla/stream_executor/host/BUILD
third_party/xla/xla/stream_executor/host/host_executor.cc
third_party/xla/xla/stream_executor/host/host_kernel.cc
third_party/xla/xla/stream_executor/host/host_kernel.h

2024-06-10T04:42:24 See commit

This commit uses recursive work splitting to submit host tasks, which significantly improves the wall time for submitting tasks into a thread pool. Benchmarks show notable reductions in process time across different numbers of launched tasks. The changes are in host_kernel.cc, with a supporting update to the BUILD file.

Specifically, the commit implements a new approach to asynchronous kernel execution on the host: it tracks the state of in-flight executions and orchestrates the tasks more efficiently. This involves converting buffers to kernel arguments, calling tasks synchronously and asynchronously, and using a control structure to manage the execution flow.
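
Recursive work splitting in general can be sketched like this: instead of the caller submitting every task one by one, each worker hands half of its range to the pool and keeps splitting the rest, so submission itself is spread across threads. The code is a Python illustration of the technique under that assumption, not a port of host_kernel.cc; `submit_range` and `run_all` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def submit_range(pool, task, start, end, on_done):
    """Run task(i) for i in [start, end), splitting the range recursively."""
    while end - start > 1:
        mid = (start + end) // 2
        # Hand the upper half to another worker and keep splitting the lower half.
        pool.submit(submit_range, pool, task, mid, end, on_done)
        end = mid
    try:
        task(start)
    finally:
        on_done()


def run_all(task, num_tasks, max_workers=4):
    """Run num_tasks tasks and return once every one of them has finished."""
    if num_tasks == 0:
        return
    done = threading.Event()
    lock = threading.Lock()
    remaining = [num_tasks]

    def on_done():
        with lock:
            remaining[0] -= 1
            if remaining[0] == 0:
                done.set()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The caller submits a single task; the workers fan the rest out.
        pool.submit(submit_range, pool, task, 0, num_tasks, on_done)
        done.wait()


results = [0] * 10
run_all(lambda i: results.__setitem__(i, i + 1), 10)
```

With this scheme the caller performs one submission, and the longest chain of submissions before any given task starts grows roughly with the logarithm of the task count rather than linearly.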

Files changed

third_party/xla/xla/stream_executor/host/BUILD
third_party/xla/xla/stream_executor/host/host_kernel.cc

2024-06-10T20:45:37 See commit

This commit introduces aliasing semantics for nested fusions in the XLA service. It adds the capability to search for operands/outputs that alias each other at the outermost level of the fusion by looking through nested fusions with output_to_operand_aliasing. Additionally, the verifier has been updated to accommodate these changes, and nested fusion cost handling for dynamic update slice has been included.

The modifications include changes to the HloCostAnalysis class, the HloCostAnalysis test, the HloDataflowAnalysis class, the HloDataflowAnalysis test, and the HloVerifier class. Specifically, the code now handles nested fusions with aliasing semantics, allowing for more accurate analysis of fusion operations. The verifier has been adjusted to consider nested fusions and their potential aliasing, relaxing constraints for certain conditions like DynamicUpdateSlice within fusions. Overall, these updates enhance the functionality and accuracy of fusion analysis in the XLA service.

Files changed

third_party/xla/xla/service/hlo_cost_analysis.cc
third_party/xla/xla/service/hlo_cost_analysis.h
third_party/xla/xla/service/hlo_cost_analysis_test.cc
third_party/xla/xla/service/hlo_dataflow_analysis.cc
third_party/xla/xla/service/hlo_dataflow_analysis_test.cc
third_party/xla/xla/service/hlo_verifier.cc

2024-06-10T23:13:22 See commit

This commit adds support for int4 in the dequantize operation, including per-channel dequantization. The changes include modifications to the tf.lite Dequantize op to support TensorType_INT4 and per-channel dequantization. It also includes updates to various files such as register.cc, dequantize.cc, dequantize.h, dequantize_test.cc, BUILD, per_channel_dequantize_test.cc, portable_tensor_utils.cc, and runtime_version.cc to implement and test the int4 support in the dequantize operation.

Overall, this commit enhances the dequantize operation in TensorFlow Lite by adding int4 support and enabling per-channel dequantization, improving the flexibility and functionality of the operation for different data types and scenarios.
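
Per-channel dequantization of int4 data can be sketched in a few lines of Python. The formula `real = scale * (q - zero_point)` is the usual affine quantization scheme; the packing layout (two values per byte, low nibble first) and the helper names are assumptions for this example and are not taken from dequantize.cc.

```python
def unpack_int4(packed, count):
    """Unpack signed 4-bit values, two per byte, low nibble first (assumed layout)."""
    values = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):
            # Sign-extend the 4-bit value into the range [-8, 7].
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]


def dequantize_per_channel(rows, scales, zero_points):
    """Dequantize one row per channel, each with its own scale and zero point."""
    return [
        [scale * (q - zp) for q in row]
        for row, scale, zp in zip(rows, scales, zero_points)
    ]


flat = unpack_int4(bytes([0x21, 0xF8]), 4)   # four int4 values from two bytes
rows = [flat[0:2], flat[2:4]]                # two channels, two values each
real = dequantize_per_channel(rows, scales=[0.5, 0.25], zero_points=[0, 0])
```

Per-tensor dequantization is the special case in which every channel shares the same scale and zero point.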

Files changed

RELEASE.md
tensorflow/lite/core/kernels/register.cc
tensorflow/lite/kernels/dequantize.cc
tensorflow/lite/kernels/dequantize.h
tensorflow/lite/kernels/dequantize_test.cc
tensorflow/lite/kernels/internal/BUILD
tensorflow/lite/kernels/internal/per_channel_dequantize_test.cc
tensorflow/lite/kernels/internal/portable_tensor_utils.cc
tensorflow/lite/tools/versioning/runtime_version.cc

2024-06-11T23:09:09 See commit

The commit moves JAX builds to the build.py file, adding new build configurations for JAX_CPU and JAX_GPU to the Build class. It also defines test environments and options for these builds, such as test environment variables and test output preferences. Additionally, the commit updates the main function to handle the new JAX_CPU and JAX_GPU builds, and adds corresponding entries to the _KOKORO_JOB_NAME_TO_BUILD_MAP.

In the .kokoro/jax/build.sh file, the commit modifies the script to call the build.py file for building and testing JAX on RBE CPU and GPU environments. It sets up the necessary environment variables and configurations for these builds and runs the tests accordingly. The script also generates a templated results file to make the output accessible to everyone. Overall, the commit centralizes the JAX builds in the build.py file and updates the build script to reflect these changes.

Files changed

third_party/xla/.kokoro/jax/build.sh
third_party/xla/build_tools/build.py

2024-06-12T00:53:27 See commit

This commit reverts changelist 641306427 and changes the model_builder_test.cc file in the tensorflow/lite/delegates/gpu/common directory. Specifically, it changes the type of one tensor from kTfLiteInt64 to kTfLiteInt8, and changes the type of another tensor to kTfLiteFloat32. CastOperationParserTest is updated to reflect these changes, with one addition and one deletion in the code.

Overall, this commit undoes a previous change and updates the tensor types in the test for the CastOperationParser, making sure that the test reflects the correct tensor types for the operations being tested.

Files changed

tensorflow/lite/delegates/gpu/common/model_builder_test.cc

2024-06-12T03:53:34 See commit

This commit reverts a previous change that removed references to Google's Abseil library in the TensorFlow codebase. The reverted changes include modifications to the BUILD file, remote_mgr.cc, remote_mgr.h, and remote_mgr_test.cc. The modifications involve restoring references to the Abseil log and time modules, as well as updating functions like ValidateRemoteTensorHandle and GetRemoteTensorHandle in the remote_mgr class. Additionally, changes related to device management and serialization of tensor handles are also reverted in the remote_mgr_test class.

Overall, this commit undoes the removal of certain Abseil library references and restores functionality related to handling remote tensor handles in the distributed runtime eager module of TensorFlow. The changes aim to ensure proper validation and serialization of remote tensor handles, as well as address potential deadlocks in the serialization process.

Files changed

tensorflow/core/distributed_runtime/eager/BUILD
tensorflow/core/distributed_runtime/eager/remote_mgr.cc
tensorflow/core/distributed_runtime/eager/remote_mgr.h
tensorflow/core/distributed_runtime/eager/remote_mgr_test.cc

2024-06-12T20:02:01 See commit

The commit adds a new pass called 'decompose_optionals' to the TensorFlow MLIR compiler. This pass decomposes operations related to optional types, such as OptionalFromValue, OptionalGetValue, OptionalNone, and OptionalHasValue, by replacing them with identity operations. For example, if there is an operation like OptionalFromValue followed by OptionalGetValue, the pass will transform them into simple identity operations, simplifying the code structure. The pass includes specific rewrite patterns for handling these optional operations and propagating types across the program to ensure consistency.

In addition to adding the 'decompose_optionals' pass, the commit also includes changes to specific MLIR files related to testing and transformations, as well as updates to the TensorFlow MLIR transforms BUILD file to incorporate the new pass. The pass aims to streamline the handling of optional types in TensorFlow MLIR code and improve the overall readability and maintainability of the codebase.
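
The OptionalFromValue / OptionalGetValue rewrite described above can be sketched on a toy op list. This Python example is not MLIR and does not model the pass's type propagation or its handling of OptionalNone and OptionalHasValue; the tuple-based representation is hypothetical.

```python
def decompose_optionals(ops):
    """Rewrite OptionalFromValue / OptionalGetValue into Identity on a toy op list.

    Each op is a (result_name, op_name, operand_names) tuple.
    """
    rewritten = []
    for result, name, operands in ops:
        if name in ("OptionalFromValue", "OptionalGetValue"):
            # Wrapping and unwrapping become plain pass-through ops.
            rewritten.append((result, "Identity", operands))
        else:
            rewritten.append((result, name, operands))
    return rewritten


program = [
    ("opt", "OptionalFromValue", ("x",)),
    ("y", "OptionalGetValue", ("opt",)),
    ("z", "Add", ("y", "y")),
]
lowered = decompose_optionals(program)
```

After the rewrite, `y` is reached from `x` through two identity ops, which later cleanup passes in a real compiler would typically fold away.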

Files changed

tensorflow/compiler/mlir/tensorflow/tests/decompose_optionals.mlir
tensorflow/compiler/mlir/tensorflow/transforms/BUILD
tensorflow/compiler/mlir/tensorflow/transforms/decompose_optionals.cc
tensorflow/compiler/mlir/tensorflow/transforms/passes.h
tensorflow/compiler/mlir/tensorflow/transforms/tf_passes.td

2024-06-12T21:57:17 See commit

The commit eliminates the StreamExecutor::CreateStreamDependency function by directly incorporating its code into the Stream and its derived classes. This change involves modifying multiple files, including stream_executor.cc, stream_executor_internal.h, executor.cc, executor.h, cuda_executor.cc, gpu_executor.h, gpu_stream.cc, host_executor.cc, host_stream.cc, host_stream.h, and others. The modifications involve removing the CreateStreamDependency function from the respective classes and replacing it with the necessary code within the WaitFor function in the Stream and its derived classes. Additionally, changes were made to handle dependencies between streams more efficiently and effectively.

By consolidating stream-dependency code into Stream and its derived classes, the commit simplifies how dependencies between streams are handled and improves the organization of the codebase. The change aims to optimize stream dependency management in the StreamExecutor code used by TensorFlow.

Files changed

tensorflow/c/experimental/stream_executor/stream_executor.cc
tensorflow/c/experimental/stream_executor/stream_executor_internal.h
third_party/xla/xla/backends/interpreter/executor.cc
third_party/xla/xla/backends/interpreter/executor.h
third_party/xla/xla/stream_executor/cuda/cuda_executor.cc
third_party/xla/xla/stream_executor/gpu/BUILD
third_party/xla/xla/stream_executor/gpu/gpu_executor.h
third_party/xla/xla/stream_executor/gpu/gpu_stream.cc
third_party/xla/xla/stream_executor/gpu/gpu_stream.h
third_party/xla/xla/stream_executor/host/BUILD
third_party/xla/xla/stream_executor/host/host_executor.cc
third_party/xla/xla/stream_executor/host/host_executor.h
third_party/xla/xla/stream_executor/host/host_stream.cc
third_party/xla/xla/stream_executor/host/host_stream.h
third_party/xla/xla/stream_executor/mock_stream_executor.h
third_party/xla/xla/stream_executor/rocm/rocm_executor.cc
third_party/xla/xla/stream_executor/stream_common.cc
third_party/xla/xla/stream_executor/stream_common.h
third_party/xla/xla/stream_executor/stream_executor.h
third_party/xla/xla/stream_executor/tpu/tpu_executor.cc
third_party/xla/xla/stream_executor/tpu/tpu_executor.h
third_party/xla/xla/stream_executor/tpu/tpu_stream.h

2024-06-14T01:51:24 See commit

This commit integrates StableHLO at openxla/stablehlo@dd48ec58. The changes include modifications to third_party/stablehlo/workspace.bzl, with the commit and SHA256 values updated. Additionally, modifications were made to third_party/xla/xla/mlir_hlo/mhlo/IR/hlo_ops.cc, where new operations like UniformDequantizeOp and UniformQuantizeOp were added, along with their respective functions for inference and verification. Changes were also made to third_party/xla/xla/mlir_hlo/mhlo/IR/hlo_ops.td to include a verifier for UniformQuantizeOp and third_party/xla/xla/mlir_hlo/tests/Dialect/mhlo/mhlo_quantized.mlir to add new functions for uniform quantization.

Furthermore, modifications were made to third_party/xla/xla/mlir_hlo/tests/Dialect/mhlo/ops.mlir to include new functions like all_to_all_same_split_concat_dim, demonstrating the usage of mhlo.all_to_all operation with specific parameters. Overall, this commit introduces new operations, functions, and verifiers related to uniform quantization and all-to-all operations in StableHLO.
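
For readers unfamiliar with all-to-all, the data movement can be simulated in a single process. The sketch below covers only the simplest case, 1-D data where the split and concatenation dimension coincide, and models each participant as a Python list; `all_to_all_1d` is a hypothetical helper, not the mhlo.all_to_all implementation.

```python
def all_to_all_1d(per_device, split_count):
    """Simulate all-to-all on 1-D lists, one list per participating device."""
    if len(per_device) != split_count:
        raise ValueError("this sketch needs one device per split")
    chunked = []
    for data in per_device:
        if len(data) % split_count != 0:
            raise ValueError("data length must be divisible by split_count")
        size = len(data) // split_count
        # Each device cuts its data into split_count equal chunks.
        chunked.append([data[i * size:(i + 1) * size] for i in range(split_count)])
    # Device dst receives chunk number dst from every source, in source order.
    return [
        [x for src in range(split_count) for x in chunked[src][dst]]
        for dst in range(split_count)
    ]


result = all_to_all_1d([[1, 2, 3, 4], [5, 6, 7, 8]], split_count=2)
```

In effect the operation transposes the grid of chunks: chunk `j` of device `i` ends up as chunk `i` of device `j`.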

Files changed

third_party/stablehlo/workspace.bzl
third_party/xla/xla/mlir_hlo/mhlo/IR/hlo_ops.cc
third_party/xla/xla/mlir_hlo/mhlo/IR/hlo_ops.td
third_party/xla/xla/mlir_hlo/tests/Dialect/mhlo/mhlo_quantized.mlir
third_party/xla/xla/mlir_hlo/tests/Dialect/mhlo/ops.mlir