TensorFlow changelog


Here's a rundown of the latest changes and improvements:

New Features

  • [xla:ffi] API to Update CallFrame with Runtime Values: 🚀 Added an API to update CallFrame with new runtime values (buffer pointers), enhancing the flexibility of XLA's foreign function interface.
  • [XLA:GPU] Deterministic Flash Attention Backward Implementation: 🧩 Introduced a deterministic flash attention backward implementation in XLA:GPU, providing more control and consistency.
  • [XLA:CPU][oneDNN] F16 Convolutions on Supported CPUs: 🎉 Enabled F16 convolutions on supported Intel CPUs, boosting performance and efficiency.
  • [XLA:CPU][oneDNN] Matmul-Bias-Add Fusion: 🔥 Enabled fusion of matmul followed by bias-add and binary-add operations in XLA:CPU, optimizing performance.
  • Testing Utility for v2 API Test Data Path: 🧪 Added a utility for managing test data paths for the v2 API in TensorFlow, laying the groundwork for future testing needs.
  • Support for uint8_t Dot Operation Tests: 🤖 Added uint8_t dot operation tests and corresponding HLO evaluator support, expanding the library's capabilities.

Improvements

  • HLO Deduplication and Execution Threads Test: 🛠️ Added a comprehensive test for HLO deduplication and execution threads in XLA, ensuring robust functionality.
  • Recursive Work Splitting for Thunk Executor Tasks: 🏎️ Introduced recursive work splitting when launching thunk executor tasks, improving performance and avoiding bottlenecks.

Bugfixes

  • [XLA:FFI] Catch Exceptions in User FFI Calls: 🐛 Added a defensive try/catch mechanism to handle exceptions in user FFI calls, enhancing reliability.
  • Fix for Execution Stream Assignment Test: 🔧 Fixed the constructor initialization error in execution_stream_assignment_test, ensuring the test runs successfully.
  • Removal of mlir2exec Test: 🧹 Removed the mlir-tflite-runner binary and related test utilities, a cleanup and restructuring of the MLIR Lite module.

Chores

  • Split Definitions from reduced_precision_support.h: 📂 Split definitions into a new file, reduced_precision_metadata.h, for better organization and maintainability.

These updates bring a mix of new features, improvements, bug fixes, and organizational changes aimed at enhancing the performance, reliability, and maintainability of the XLA and TensorFlow projects. 🚀

Included Commits

2024-07-01T17:35:52 See commit

This commit in the XLA:FFI code adds a defensive try/catch mechanism when calling user code through the FFI interface. This change aims to catch C++ exceptions and convert them into an absl status, allowing for a more graceful handling of failures. The modifications include adding try/catch blocks in the Call functions in ffi_api.cc to handle exceptions thrown by user code, as well as updating the test cases in ffi_test.cc to verify the proper handling of exceptions.

Overall, this update makes the FFI interface more robust: exceptions thrown by user code are caught and converted into meaningful error messages instead of propagating across the FFI boundary, improving the reliability and error handling of the XLA:FFI system.

Files changed

  • third_party/xla/xla/ffi/BUILD
  • third_party/xla/xla/ffi/ffi_api.cc
  • third_party/xla/xla/ffi/ffi_test.cc
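The pattern described above can be sketched as follows. This is a simplified, self-contained illustration of catching C++ exceptions at an FFI boundary, with a plain `Status` struct standing in for `absl::Status`; the function name and signature are illustrative, not XLA's actual `ffi_api.cc` code.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Simplified stand-in for absl::Status, to keep the sketch self-contained.
struct Status {
  bool ok;
  std::string message;
};

// Hypothetical wrapper illustrating the pattern: invoke a user-provided FFI
// handler and convert any C++ exception into an error status instead of
// letting it escape across the FFI boundary.
Status CallUserHandler(const std::function<void()>& handler) {
  try {
    handler();
    return {true, ""};
  } catch (const std::exception& e) {
    return {false, std::string("XLA FFI call failed: ") + e.what()};
  } catch (...) {
    return {false, "XLA FFI call failed with an unknown exception"};
  }
}
```

The catch-all clause matters here: user code may throw types that do not derive from `std::exception`, and those must not unwind through the FFI machinery either.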
2024-07-01T18:11:26 See commit

This commit removes the mlir-tflite-runner binary from the TensorFlow compiler's MLIR Lite module. The mlir-tflite-runner binary was responsible for running TFLite computations from MLIR input using the TFLite interpreter. The commit also removes the mlir2exec test utilities and a test file for the TFLite while operation, which was used for verifying translation and export functionality with runtime. The removal of these components indicates a cleanup or restructuring of the MLIR Lite module within the TensorFlow compiler.

This cleanup may signify a shift in focus or a reorganization of the MLIR Lite module's functionality and testing processes.

Files changed

  • tensorflow/compiler/mlir/lite/BUILD
  • tensorflow/compiler/mlir/lite/mlir_tflite_runner.cc
  • tensorflow/compiler/mlir/lite/tests/mlir2exec/BUILD
  • tensorflow/compiler/mlir/lite/tests/mlir2exec/tfl_while_op.mlir
2024-07-01T20:51:14 See commit

This commit adds support for uint8_t dot operation tests and corresponding support in the HLO evaluator. Specifically, it includes modifications to the HLO evaluator implementation to handle uint8_t data types in matrix multiplication operations. This involves adding a new method MatmulArray2D that works with arrays of uint8_t values and updating relevant files in the third-party XLA library to support this new functionality.

The changes add new uint8_t matrix multiplication methods to the HLO evaluator, extend the CPU runtime files to register the corresponding single-threaded kernel, and update the test files with uint8_t dot operation coverage, expanding the capabilities of the library.

Files changed

  • third_party/xla/xla/hlo/evaluator/hlo_evaluator.cc
  • third_party/xla/xla/hlo/evaluator/hlo_evaluator.h
  • third_party/xla/xla/service/cpu/BUILD
  • third_party/xla/xla/service/cpu/cpu_runtime.cc
  • third_party/xla/xla/service/cpu/cpu_runtime.h
  • third_party/xla/xla/service/cpu/runtime_single_threaded_matmul.h
  • third_party/xla/xla/service/cpu/runtime_single_threaded_matmul_u8.cc
  • third_party/xla/xla/service/cpu/simple_orc_jit.cc
  • third_party/xla/xla/tests/dot_operation_test.cc
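A reference implementation of a uint8_t 2D matmul in the spirit of the new `MatmulArray2D` overload might look like the sketch below. The name, row-major layout, and flat-vector signature here are illustrative assumptions, not the actual XLA interface; accumulation happens in a wider type before truncating back to uint8_t.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative single-threaded uint8_t matmul: lhs is m x k, rhs is k x n,
// both row-major. Products are accumulated in uint32_t, then the result is
// truncated to the uint8_t element type (wrapping mod 256).
std::vector<uint8_t> MatmulU8(const std::vector<uint8_t>& lhs,
                              const std::vector<uint8_t>& rhs,
                              int m, int k, int n) {
  std::vector<uint8_t> out(m * n, 0);
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      uint32_t acc = 0;
      for (int p = 0; p < k; ++p) acc += lhs[i * k + p] * rhs[p * n + j];
      out[i * n + j] = static_cast<uint8_t>(acc);
    }
  }
  return out;
}
```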
2024-07-02T07:57:07 See commit

This commit fixes the test //xla/service/gpu:execution_stream_assignment_test in the open-source project XLA. The test was failing due to a constructor initialization error in the execution_stream_assignment_test.cc file. The fix includes modifying the constructor initialization for AsyncExecutionStreamIds in the test, resolving the error and ensuring the test runs successfully.

The commit was imported from GitHub PR #14356 and merged to close the issue. The patch is limited to the execution_stream_assignment_test.cc file, where the constructor initialization was corrected so the test passes without errors.

Files changed

  • third_party/xla/xla/service/gpu/execution_stream_assignment_test.cc
2024-07-02T21:18:03 See commit

This commit adds a testing utility for getting the test data path for the v2 API in TensorFlow. It sets up a base directory for future utilities and testing needs, and the existing test data will be migrated to this directory. The commit includes changes to the BUILD file, the utils.cc and utils.h files, as well as the utils_test.cc file. The utils.cc file defines a function that returns the path to the test data directory, while the utils_test.cc file contains a test case to verify the correctness of the TestDataPath function.

Overall, this commit strengthens the testing infrastructure for the v2 API in TensorFlow by providing a utility for managing test data paths, and lays the foundation for future testing needs and utilities within the v2 API testing framework.

Files changed

  • tensorflow/compiler/mlir/tf2xla/api/v2/testing/BUILD
  • tensorflow/compiler/mlir/tf2xla/api/v2/testing/utils.cc
  • tensorflow/compiler/mlir/tf2xla/api/v2/testing/utils.h
  • tensorflow/compiler/mlir/tf2xla/api/v2/testing/utils_test.cc
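A `TestDataPath`-style helper commonly resolves against the test runner's source root. The sketch below shows one plausible shape of such a utility; the `TEST_SRCDIR` environment variable is Bazel's convention, but the exact directory returned here is an assumption, not the commit's actual implementation.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical sketch of a TestDataPath-style helper: joins the test runner's
// source root (TEST_SRCDIR, set by Bazel) with an assumed test data
// directory. The real utility lives under tf2xla/api/v2/testing.
std::string TestDataPath() {
  const char* srcdir = std::getenv("TEST_SRCDIR");
  std::string root = srcdir ? srcdir : ".";
  return root + "/tensorflow/compiler/mlir/tf2xla/api/v2/testdata/";
}
```

Centralizing the path in one function means tests no longer hard-code their own relative paths, which is what makes the planned migration of existing test data into one directory tractable.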
2024-07-03T22:14:40 See commit

This commit involves splitting definitions from the file reduced_precision_support.h into a new file named reduced_precision_metadata.h. The changes include modifications to various files within the TensorFlow compiler, specifically in the mlir/lite and lite/tools/optimize directories. The reduced_precision_metadata.h file now contains definitions related to reduced precision support, including constants, enums, and functions for handling different types of reduced precision support, such as float16 inference, bfloat16 inference, and accumulation support.

Additionally, the commit also includes updates to various BUILD files to reflect the changes in dependencies and file structure. The reduced_precision_support.h file has been modified to remove the definitions that have been moved to reduced_precision_metadata.h. Overall, this commit reorganizes and separates the code related to reduced precision support into a new header file for better organization and maintainability.

Files changed

  • tensorflow/compiler/mlir/lite/BUILD
  • tensorflow/compiler/mlir/lite/tf_to_tfl_flatbuffer.cc
  • tensorflow/compiler/mlir/lite/tools/optimize/BUILD
  • tensorflow/compiler/mlir/lite/tools/optimize/reduced_precision_metadata.h
  • tensorflow/lite/tools/optimize/BUILD
  • tensorflow/lite/tools/optimize/reduced_precision_support.h
2024-07-04T09:19:41 See commit

This commit enables F16 convolutions on supported Intel CPUs by adding the necessary code to the XLA project. Specifically, it includes changes to the BUILD file, the change_op_data_type.cc file, and the onednn_convolution_test.cc file. The addition of the onednn_convolution_rewriter allows for F16 convolutions to be supported on the specified CPU platforms. Additionally, a test case for Simple2DTestF16 is added to ensure that the functionality works correctly.

Overall, this change enhances XLA's CPU backend by enabling F16 convolutions on supported Intel platforms, which can improve performance and efficiency for convolution-heavy workloads.

Files changed

  • third_party/xla/xla/service/BUILD
  • third_party/xla/xla/service/change_op_data_type.cc
  • third_party/xla/xla/tests/onednn_convolution_test.cc
2024-07-04T10:22:21 See commit

This commit enables the fusion of matmul followed by bias-add and binary-add operations in XLA:CPU. It includes changes to the onednn_matmul_rewriter.cc file to adjust the shapes of the binary operands for fusion, as well as modifications to the tests to ensure the fusion is working correctly. The commit removes a check that was blocking this fusion, allowing for the fusion of these operations in the XLA:CPU backend.

Specifically, the commit includes adjustments to the AdjustBinaryOperandShape function to enable fusion in oneDNN, as well as changes to the OneDnnMatMulRewriteVisitor class to validate and adjust the shapes of the operands for fusion. Additionally, new tests were added to test the fusion of matmul with bias-add and binary-add operations, ensuring that the fusion is working as expected.

Files changed

  • third_party/xla/xla/service/cpu/onednn_matmul_rewriter.cc
  • third_party/xla/xla/tests/onednn_matmul_test.cc
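The idea behind adjusting a binary operand's shape for fusion can be illustrated with a toy check: an elementwise operand is fusable with the matmul when its dimensions collapse to match the trailing dimensions of the matmul output. The function name and rules below are illustrative assumptions, not the actual `AdjustBinaryOperandShape` logic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy fusability check for a matmul + binary-add operand: drop leading
// size-1 dimensions (e.g. [1, 1, n] -> [n]), then require the remaining
// dimensions to match the trailing dimensions of the matmul output shape.
bool CanFuseBinaryOperand(const std::vector<int>& matmul_shape,
                          std::vector<int> operand_shape) {
  while (!operand_shape.empty() && operand_shape.front() == 1)
    operand_shape.erase(operand_shape.begin());
  if (operand_shape.size() > matmul_shape.size()) return false;
  for (size_t i = 0; i < operand_shape.size(); ++i) {
    size_t j = matmul_shape.size() - operand_shape.size() + i;
    if (operand_shape[i] != matmul_shape[j]) return false;
  }
  return true;
}
```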
2024-07-04T22:12:47 See commit

This commit in the xla:cpu runtime introduces recursive work splitting to launch thunk executor tasks. It also switches lambdas to explicit capture lists to avoid accidental captures, and fixes an ASan error that had gone unreported until the explicit captures were added. Benchmarks show improvements in process time and time per operation across scenarios, with significant reductions in some cases.

The commit also includes changes in the ThunkExecutor class to implement recursive work splitting for a more uniform work distribution across task runner threads and to avoid bottlenecks when processing a long tail of work with a single thread. Additionally, the TransitiveReduction function was modified to erase edges between nodes more efficiently. Benchmark tests for SequentialThunkExecutor, SyncThunkExecutor, and AsyncThunkExecutor were updated to measure process CPU time for different argument values.

Files changed

  • third_party/xla/xla/service/cpu/runtime/thunk_executor.cc
  • third_party/xla/xla/service/cpu/runtime/thunk_executor_test.cc
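The recursive splitting scheme can be sketched in a few lines: rather than one thread walking a long range of tasks, each call splits the range in half, hands the upper half to the task runner, and keeps splitting the lower half until it is small enough to run directly. Names and the `grain` threshold are illustrative, not ThunkExecutor's actual interface; a real runner would dispatch to a thread pool rather than run inline.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Task = std::function<void()>;
using TaskRunner = std::function<void(Task)>;

// Recursive work splitting: offload the upper half of the range to the
// runner, keep halving the lower half, and run the final small chunk
// directly. This spreads work uniformly instead of leaving a long tail
// for a single thread.
void ExecuteRange(std::vector<Task>& tasks, size_t begin, size_t end,
                  const TaskRunner& runner, size_t grain = 4) {
  while (end - begin > grain) {
    size_t mid = begin + (end - begin) / 2;
    // Explicit capture list avoids accidental captures, the bug class this
    // commit also addressed.
    runner([&tasks, mid, end, &runner, grain] {
      ExecuteRange(tasks, mid, end, runner, grain);
    });
    end = mid;
  }
  for (size_t i = begin; i < end; ++i) tasks[i]();
}
```

With an asynchronous runner, the submitting thread finishes its own half while other threads consume the offloaded halves, so no single thread is left processing a long tail alone.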
2024-06-28T02:25:05 See commit

This commit adds an API to update CallFrame with new runtime values, specifically buffer pointers. The changes touch several files in the third_party/xla/xla/ffi directory, including api.h, BUILD, call_frame.cc, call_frame.h, call_frame_test.cc, and ffi_test.cc.

Allowing CallFrame to be updated with fresh buffer pointers during runtime enhances the functionality and flexibility of the XLA library's foreign function interface, enabling more dynamic and efficient usage of the library at execution time.

Files changed

  • third_party/xla/xla/ffi/BUILD
  • third_party/xla/xla/ffi/api/api.h
  • third_party/xla/xla/ffi/call_frame.cc
  • third_party/xla/xla/ffi/call_frame.h
  • third_party/xla/xla/ffi/call_frame_test.cc
  • third_party/xla/xla/ffi/ffi_test.cc
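The motivating idea can be sketched as follows: a call frame holds argument metadata that is expensive to build, while the buffer pointers are the only values that change between executions, so an update API rebinds just the pointers. The struct layout and method names below are assumptions for illustration, not XLA's actual CallFrame.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative model: static metadata (dims) is built once per argument,
// and only the runtime buffer pointer is swapped before each call.
struct BufferArg {
  void* data;             // runtime value, updated between executions
  std::vector<int> dims;  // static metadata, built once
};

class CallFrame {
 public:
  explicit CallFrame(std::vector<BufferArg> args) : args_(std::move(args)) {}

  // Rebinds buffer pointers in place without rebuilding the metadata.
  void UpdateWithBuffers(const std::vector<void*>& buffers) {
    for (size_t i = 0; i < buffers.size(); ++i) args_[i].data = buffers[i];
  }

  const std::vector<BufferArg>& args() const { return args_; }

 private:
  std::vector<BufferArg> args_;
};
```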
2024-06-28T05:52:10 See commit

This commit adds support for enforcing deterministic flash attention backward implementation in XLA:GPU. The cuDNN frontend 1.5 introduced an option to enforce deterministic behavior, which is now guarded by xla_gpu_deterministic_ops and xla_gpu_exclude_nondeterministic_ops. This change includes modifications to the backend configs, the workspace rewriter, the GPU fused MHA runner, the IR emitter, and tests to ensure the deterministic behavior is correctly implemented.

The addition of the force_deterministic attribute in the backend config allows for controlling the deterministic behavior of flash attention backward implementation. The changes made in various files ensure that the flash attention backward operation in XLA:GPU can now be enforced to be deterministic, providing more control and consistency in the implementation. Tests have been added to verify the deterministic behavior under different conditions, enhancing the reliability of flash attention backward operations in XLA:GPU.

Files changed

  • third_party/xla/xla/service/gpu/backend_configs.proto
  • third_party/xla/xla/service/gpu/cudnn_workspace_rewriter.cc
  • third_party/xla/xla/service/gpu/gpu_fused_mha_runner.cc
  • third_party/xla/xla/service/gpu/gpu_fused_mha_runner.h
  • third_party/xla/xla/service/gpu/ir_emitter_unnested.cc
  • third_party/xla/xla/service/gpu/tests/gpu_fused_mha_test.cc
  • third_party/xla/xla/stream_executor/cuda/cuda_dnn.cc
  • third_party/xla/xla/stream_executor/cuda/cuda_dnn.h
  • third_party/xla/xla/stream_executor/dnn.cc
  • third_party/xla/xla/stream_executor/dnn.h
  • third_party/xla/xla/stream_executor/lazy_op_runner.h
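The gating described above might look like the following sketch: the backward pass requests cuDNN's deterministic kernels only when determinism is asked for. The flag names come from the commit description, but the struct and function are illustrative stand-ins, not XLA's actual debug-options plumbing.

```cpp
#include <cassert>

// Stand-in for the relevant debug options; the real flags live in XLA's
// DebugOptions proto.
struct DebugOptions {
  bool xla_gpu_deterministic_ops = false;
  bool xla_gpu_exclude_nondeterministic_ops = false;
};

// Illustrative gate: force the deterministic flash attention backward path
// when either determinism flag is set.
bool ForceDeterministicFlashAttentionBwd(const DebugOptions& opts) {
  return opts.xla_gpu_deterministic_ops ||
         opts.xla_gpu_exclude_nondeterministic_ops;
}
```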
2024-06-28T22:40:04 See commit

This commit adds a test for HLO (High Level Operations) deduplication and execution threads in the XLA (Accelerated Linear Algebra) library. The test is added to the hlo_computation_deduplicator_test.cc file. It includes various scenarios such as removing regions with the same subcomputation, not removing regions with different subcomputations, considering commutativity, and not removing regions with different execution threads. The test cases cover different aspects of computation deduplication within a HloModule.

Additionally, modifications are made to the hlo_computation_deduplicator.cc and hlo_computation_deduplicator.h files to support the test and ensure proper functionality of the HLO computation deduplication process. Dependencies are added in the BUILD file, and necessary includes and functions are updated in the source files to handle computation deduplication based on various criteria. Overall, this commit enhances the testing and functionality related to HLO deduplication and execution threads in the XLA library.

Files changed

  • third_party/xla/xla/service/BUILD
  • third_party/xla/xla/service/hlo_computation_deduplicator.cc
  • third_party/xla/xla/service/hlo_computation_deduplicator.h
  • third_party/xla/xla/service/hlo_computation_deduplicator_test.cc
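The behavior the new test exercises can be modeled in miniature: computations with identical canonical fingerprints are merged into one, but only when they also belong to the same execution thread. In the toy sketch below, strings stand in for HloComputations and fingerprints; the real pass operates on the module's computation graph.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Key for deduplication: a computation's canonical fingerprint plus its
// execution thread. Two computations merge only if both components match.
using Key = std::pair<std::string /*fingerprint*/, std::string /*thread*/>;

// Maps each computation name to the name of the surviving computation it
// should be replaced with (itself, if it is the first with its key).
std::map<std::string, std::string> Deduplicate(
    const std::map<std::string, Key>& computations) {
  std::map<Key, std::string> canonical;
  std::map<std::string, std::string> replacement;
  for (const auto& [name, key] : computations) {
    auto it = canonical.emplace(key, name).first;
    replacement[name] = it->second;  // first computation with this key wins
  }
  return replacement;
}
```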