TensorFlow changelog
Hey there, code wranglers! We've got a bunch of updates to share with you. From new features to bug fixes, here's the latest scoop on what's been happening under the hood. 🚀
New feature
- Containers with CUDA 12.3 and CUDNN 8.9: Added new containers with CUDA 12.3 and CUDNN 8.9. This update makes sure you can build manylinux 2014 compliant cross-compilers targeting compatible glibc and system libstdc++. 🚀
- Weight-only quantization: Introduced weight-only quantization for convolution and dot_general operations. This adds support for the `weight_only_ptq` method, making your deep learning models leaner and meaner. 🏋️‍♂️
- CalibrationStatisticsSaver op: Added a new op definition to replace the `CalibrationSingleton`, aggregating and saving statistics to files. This op is stateful and designed to run on the CPU, making it easy to lift to outer functions. 📊
- Async dynamic slicing: Implemented async dynamic slicing for host memory offloading on GPU. Dynamic slicing instructions are wrapped in a fusion node, allowing for asynchronous execution. 🌀
- StableHLO integration: Integrated StableHLO at openxla/stablehlo@714d9aca, updating various functions and constants. 🛠️
Improvement
- Variable dtype and shape storage: Enhanced `IfrtRestoreTensorRegistry` to store variable dtype and shape, improving tensor restoration and lookup during execution. 🧠
- Global shuffling for memory cache dataset: Added support for global shuffling in the memory cache dataset, improving data processing capabilities. 🔄
- Memory Term Reducer: Augmented the Memory Term Reducer to merge both primitives and groups, enhancing memory management and optimization. 🧩
Bugfix
- Convert-memory-placement-to-internal-annotations: Removed a check requiring an operand to have a single user, allowing the pass to process operands with multiple users. 🔧
- LLVM integration: Updated the LLVM dependency to commit 694c444b5bbb, ensuring compatibility and stability. 🛡️
- Duplicate dependency in TSL: Removed a duplicate 'clog' dependency, streamlining the code and optimizing dependency management. 🗑️
Chore
- Remove unused workflow: Cleaned up the codebase by removing an outdated "A/B Diff Performance Benchmarking" workflow. ✂️
That's all for now! Keep on coding and stay tuned for more updates. Happy coding! 😄
Included Commits
This commit integrates StableHLO at openxla/stablehlo@714d9aca. It updates the legalize-tf.mlir test file (8 additions, 8 deletions), adjusting the constant_like values used in functions such as ones_like, zeros_like, elu, leaky_relu, leaky_relu_grad, and softsign. It also rewrites third_party/stablehlo/temporary.patch (565 additions, 1349 deletions) and bumps workspace.bzl (2 additions, 2 deletions) to point at the new StableHLO commit.
Files changed
- tensorflow/compiler/mlir/tf2xla/tests/legalize-tf.mlir
- third_party/stablehlo/temporary.patch
- third_party/stablehlo/workspace.bzl
This commit augments the Memory Term Reducer to merge groups as well as primitives. It introduces functions in auto_sharding_memory.cc for merging primitives into groups (MergeIntoGroup), calculating the number of terms used by a primitive or group (CalcNumTerms), updating primitives after merging (UpdatePrimitive), and sweeping through live points to merge large overlaps (SweepAndMerge). Tests added in auto_sharding_memory_test.cc validate that the number of terms is reduced when primitives or groups are merged across a variety of scenarios.
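To see why merging reduces terms, consider two primitives whose live ranges largely overlap: replacing their per-time-point terms over the overlap with a single group term shrinks the total term count. Here is a plain-Python sketch of that idea (conceptual only; the interval representation and names are illustrative, not the XLA data structures):

```python
# A "term" is one (time point, primitive-or-group) pair in the memory
# constraint, so the total is the sum of live-range lengths.
def num_terms(intervals):
    return sum(hi - lo + 1 for lo, hi in intervals.values())

# Two primitives live over mostly overlapping (inclusive) time intervals.
primitives = {"p0": (0, 9), "p1": (2, 9)}
print(num_terms(primitives))  # 10 + 8 = 18 terms

def merge(intervals, a, b, group):
    """Replace the overlap of a and b with a single shared group term.

    Assumes each primitive's leftover live range is a single prefix or
    suffix, which holds for the example intervals above.
    """
    (alo, ahi), (blo, bhi) = intervals[a], intervals[b]
    lo, hi = max(alo, blo), min(ahi, bhi)  # overlap interval
    if hi < lo:
        return intervals                   # no overlap: nothing to merge
    out = dict(intervals)
    out[group] = (lo, hi)                  # one term set for the group
    for name, (plo, phi) in ((a, (alo, ahi)), (b, (blo, bhi))):
        if plo < lo:
            out[name] = (plo, lo - 1)      # keep the prefix outside the group
        elif phi > hi:
            out[name] = (hi + 1, phi)      # keep the suffix outside the group
        else:
            del out[name]                  # fully absorbed by the group
    return out

merged = merge(primitives, "p0", "p1", "g01")
print(merged)             # {'p0': (0, 1), 'g01': (2, 9)}
print(num_terms(merged))  # 2 + 8 = 10 terms (down from 18)
```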
Files changed
- third_party/xla/xla/hlo/experimental/auto_sharding/auto_sharding_memory.cc
- third_party/xla/xla/hlo/experimental/auto_sharding/auto_sharding_memory_test.cc
This commit removes a check in the convert_memory_placement_to_internal_annotations pass that required an operand to have a single user: previously, an operand with more than one user was skipped. With the check gone (4 lines deleted from convert_memory_placement_to_internal_annotations.cc), the pass now processes operands with multiple users, covering a wider range of memory-placement scenarios.
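Schematically, the change amounts to deleting an early-exit guard from the operand-processing loop. A toy Python illustration (the operand/user structure and names here are hypothetical stand-ins, not the pass's real logic):

```python
def process_operands(operands, skip_multi_user):
    """Return the names of operands the pass would convert."""
    converted = []
    for op in operands:
        if skip_multi_user and len(op["users"]) > 1:
            continue  # old behavior: skip operands with multiple users
        converted.append(op["name"])
    return converted

operands = [
    {"name": "a", "users": ["u0"]},
    {"name": "b", "users": ["u0", "u1"]},  # multiple users
]
print(process_operands(operands, skip_multi_user=True))   # ['a']
print(process_operands(operands, skip_multi_user=False))  # ['a', 'b']
```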
Files changed
- third_party/xla/xla/service/convert_memory_placement_to_internal_annotations.cc
This commit adds a new op definition called `CalibrationStatisticsSaver` to replace the `CalibrationSingleton` by aggregating and saving statistics to files. The op is stateful, designed to run on the CPU, and has no output, making it easy to lift to an outer function when needed. The commit includes changes to various files, such as adding the op definition in calibration_statistics_saver_op.cc, modifying the test cases in calibration_statistics_saver_op_test.cc, and adjusting dependencies in the BUILD files to include the new op.
Additionally, the commit removes the custom_aggregator_op.py file and makes corresponding adjustments in the BUILD file, indicating a shift in functionality related to custom aggregation operations. Overall, the focus of this commit is on introducing the `CalibrationStatisticsSaver` op for aggregating and saving statistics efficiently.
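Conceptually, such an op folds per-batch calibration statistics into a running aggregate and flushes the result to disk when finalized. A minimal Python sketch of that aggregate-then-save pattern (illustrative only; the class name, fields, and JSON output are assumptions, not the op's actual proto schema):

```python
import json

class CalibrationStatisticsAggregator:
    """Running min/max aggregate per tensor, saved to a file at the end.

    Illustrative stand-in for the aggregate-and-save behavior described
    above; the real op serializes statistics protos, not JSON.
    """

    def __init__(self, output_path):
        self.output_path = output_path
        self.stats = {}  # tensor name -> {"min": float, "max": float}

    def update(self, name, values):
        lo, hi = min(values), max(values)
        if name in self.stats:
            s = self.stats[name]
            s["min"], s["max"] = min(s["min"], lo), max(s["max"], hi)
        else:
            self.stats[name] = {"min": lo, "max": hi}

    def save(self):
        with open(self.output_path, "w") as f:
            json.dump(self.stats, f, indent=2)

agg = CalibrationStatisticsAggregator("/tmp/calibration_stats.json")
agg.update("conv_0/output", [-0.5, 1.2, 3.4])
agg.update("conv_0/output", [-2.0, 0.9])  # aggregated across batches
agg.save()  # {"conv_0/output": {"min": -2.0, "max": 3.4}}
```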
Files changed
- tensorflow/compiler/mlir/quantization/tensorflow/calibrator/BUILD
- tensorflow/compiler/mlir/quantization/tensorflow/calibrator/calibration_statistics.proto
- tensorflow/compiler/mlir/quantization/tensorflow/calibrator/calibration_statistics_collector_histogram.h
- tensorflow/compiler/mlir/quantization/tensorflow/calibrator/calibration_statistics_collector_test.cc
- tensorflow/compiler/mlir/quantization/tensorflow/calibrator/calibration_statistics_saver_op.cc
- tensorflow/compiler/mlir/quantization/tensorflow/calibrator/calibration_statistics_saver_op_test.cc
- tensorflow/compiler/mlir/quantization/tensorflow/calibrator/calibrator_singleton.cc
- tensorflow/compiler/mlir/quantization/tensorflow/calibrator/custom_aggregator_op.py
- tensorflow/compiler/mlir/quantization/tensorflow/python/BUILD
This commit introduces support for global shuffling in the memory cache dataset for TensorFlow data. It includes changes to the cache dataset operations file, adding a new class for caching dataset elements when global shuffling is enabled. The commit also modifies the memory dataset base class to incorporate the new functionality, including a new class for random access caching and handling global shuffling.
In addition, the commit updates the Python kernel tests for cache datasets with a new test class for global shuffling, covering parameters such as dataset range, repetitions, seed, and reshuffle settings. Overall, these changes add global-shuffling support to the memory cache dataset in TensorFlow data.
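The core idea is that a fully cached dataset supports random access, so a global shuffle can be served as iteration over a random permutation of cached indices. A plain-Python sketch of that mechanism (conceptual model only, not the tf.data API surface):

```python
import random

class GloballyShuffledCache:
    """Cache elements on the first pass, then serve them in a fresh
    global permutation each epoch via random access into the cache."""

    def __init__(self, source, seed=None):
        self.cache = list(source)  # materialize once: enables random access
        self.rng = random.Random(seed)

    def epoch(self, reshuffle=True):
        order = list(range(len(self.cache)))
        if reshuffle:
            self.rng.shuffle(order)  # new global permutation per epoch
        for i in order:
            yield self.cache[i]

ds = GloballyShuffledCache(range(5), seed=42)
print(list(ds.epoch()))  # e.g. [3, 1, 2, 4, 0] -- every element exactly once
print(list(ds.epoch()))  # a different permutation, still a full pass
```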
Files changed
- tensorflow/core/kernels/data/BUILD
- tensorflow/core/kernels/data/cache_dataset_ops.cc
- tensorflow/python/data/kernel_tests/BUILD
- tensorflow/python/data/kernel_tests/cache_test.py
This commit implements async dynamic slicing for host memory offloading on GPU. Because the emitter does not understand dynamic slicing instructions inside async computations, each such instruction is wrapped in a fusion node and marked for execution on a different stream, allowing the offloading slices to run asynchronously. The changes add a new function that wraps dynamic slicing instructions into fusions and annotates them with the correct operation queue ID.
The commit also includes modifications to the GPU service code, specifically in the stream attribute annotator, to handle dynamic slice and dynamic update slice instructions. Tests were added to ensure that dynamic slice and dynamic update slice instructions are correctly wrapped in fusion and annotated with the appropriate operation queue ID, allowing for asynchronous execution on the GPU. Overall, this commit enhances the GPU offloading capabilities of the project by enabling async dynamic slicing.
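The scheduling idea, independent of XLA's implementation, is that slice copies placed on their own queue can overlap with main-stream compute, with a later synchronization point. A toy Python model using a worker thread as the "offload stream" (all names and the sleep-based copy are hypothetical illustrations):

```python
import queue
import threading
import time

offload_queue = queue.Queue()  # stands in for a separate GPU stream

def offload_worker():
    while True:
        item = offload_queue.get()
        if item is None:
            break                # sentinel: no more offloads
        name, work = item
        work()                   # perform the slice copy
        print(f"offloaded {name}")

worker = threading.Thread(target=offload_worker)
worker.start()

def copy_slice(data, start, size):
    def work():
        time.sleep(0.01)         # stand-in for a device-to-host copy
        return data[start:start + size]
    return work

data = list(range(1000))
# Enqueue slice copies on the offload "stream"...
offload_queue.put(("slice_0", copy_slice(data, 0, 100)))
offload_queue.put(("slice_1", copy_slice(data, 100, 100)))
# ...while the main "stream" keeps computing without waiting.
main_result = sum(x * x for x in data)
print("main compute done:", main_result)

offload_queue.put(None)
worker.join()  # synchronization point: all offloads have completed
```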
Files changed
- third_party/xla/xla/service/gpu/BUILD
- third_party/xla/xla/service/gpu/stream_attribute_annotator.cc
- third_party/xla/xla/service/gpu/stream_attribute_annotator_test.cc
This commit removes the unused "A/B Diff Performance Benchmarking" workflow from the third_party/xla/.github/workflows/benchmark.yml file, deleting the 56 lines that defined it. The workflow was outdated and no longer relevant; dropping it keeps the CI configuration streamlined and easier to maintain.
Files changed
- third_party/xla/.github/workflows/benchmark.yml
This commit integrates LLVM at commit 694c444b5bbb, modifying third_party/llvm/workspace.bzl so that the LLVM_COMMIT and LLVM_SHA256 values point at the new revision. This keeps the LLVM integration in sync with the upstream LLVM repository.
Files changed
- third_party/llvm/workspace.bzl
This commit stores a variable's dtype and shape in `IfrtRestoreTensorRegistry`, and stores the `IfrtRestoreTensorRegistry` in `IfrtServingExecutable` for looking up the dtype and shape. The changes touch files including ifrt_backend_compiler.cc, ifrt_restore_tensor_registry.cc, ifrt_restore_tensor_registry.h, ifrt_loaded_variable_utils.cc, ifrt_loaded_variable_utils.h, ifrt_loaded_variable_utils_test.cc, ifrt_serving_executable.cc, ifrt_serving_executable.h, ifrt_serving_executable_test.cc, ifrt_ops_kernel.cc, and ifrt_ops_kernel_test.cc.
The modifications add functions to register and retrieve restored tensors in `IfrtRestoreTensorRegistry`, update `IfrtServingExecutable` to hold an `IfrtRestoreTensorRegistry`, and adjust the kernel operations that handle the loading and restoration of variables. Tests have been added to verify the new implementations.
These changes improve the handling of variable dtype and shape information within the TensorFlow Runtime (TFRT), enabling more efficient and accurate tensor restoration and lookup during execution.
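A minimal sketch of the registry pattern described here (plain Python; the real registry is C++ and tracks futures of restored tensors): recording a variable's dtype and shape at registration time so callers can look both up before the tensor itself is materialized.

```python
from dataclasses import dataclass

@dataclass
class RestoredTensorInfo:
    dtype: str              # e.g. "float32"
    shape: tuple            # e.g. (128, 256)
    tensor: object = None   # filled in once restoration completes

class RestoreTensorRegistry:
    """Toy model of a registry that stores dtype/shape up front so
    lookups succeed before the tensor value is available."""

    def __init__(self):
        self._entries = {}

    def register(self, name, dtype, shape):
        self._entries[name] = RestoredTensorInfo(dtype, shape)

    def dtype_and_shape(self, name):
        info = self._entries[name]
        return info.dtype, info.shape

    def set_tensor(self, name, tensor):
        self._entries[name].tensor = tensor

registry = RestoreTensorRegistry()
registry.register("dense/kernel", "float32", (128, 256))
# dtype/shape are available for compilation before the restore finishes:
print(registry.dtype_and_shape("dense/kernel"))  # ('float32', (128, 256))
```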
Files changed
- tensorflow/compiler/mlir/tfrt/transforms/ifrt/ifrt_backend_compiler.cc
- tensorflow/core/tfrt/ifrt/BUILD
- tensorflow/core/tfrt/ifrt/ifrt_executable_registry_test.cc
- tensorflow/core/tfrt/ifrt/ifrt_loaded_variable_registry.h
- tensorflow/core/tfrt/ifrt/ifrt_loaded_variable_utils.cc
- tensorflow/core/tfrt/ifrt/ifrt_loaded_variable_utils.h
- tensorflow/core/tfrt/ifrt/ifrt_loaded_variable_utils_test.cc
- tensorflow/core/tfrt/ifrt/ifrt_restore_tensor_registry.cc
- tensorflow/core/tfrt/ifrt/ifrt_restore_tensor_registry.h
- tensorflow/core/tfrt/ifrt/ifrt_serving_executable.cc
- tensorflow/core/tfrt/ifrt/ifrt_serving_executable.h
- tensorflow/core/tfrt/ifrt/ifrt_serving_executable_test.cc
- tensorflow/core/tfrt/mlrt/kernel/BUILD
- tensorflow/core/tfrt/mlrt/kernel/ifrt_ops_kernel.cc
- tensorflow/core/tfrt/mlrt/kernel/ifrt_ops_kernel_test.cc
This commit introduces changes related to weight-only quantization for convolution and dot_general operations. It adds support for the `weight_only_ptq` method and modifies various files to handle weight-only quantization. The changes include updates to attributes, patterns, passes, and tests to enable weight-only quantization for specific operations based on the `Method` attribute attached to the XlaCallModuleOp. It also adds patterns for inserting quantization parameters for weights, ensuring that weight-only quantization is applied when the method is specified as `weight_only_ptq`. The tests verify the correct application of weight-only quantization for convolution and dot_general operations.
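For intuition, weight-only post-training quantization stores the weights as int8 with a scale and dequantizes them at compute time, while activations stay in float. A NumPy sketch of that scheme for a dot operation (illustrative arithmetic only, not the StableHLO passes themselves):

```python
import numpy as np

def quantize_weights(w):
    """Symmetric per-tensor int8 quantization: w ~= scale * w_int8."""
    scale = np.max(np.abs(w)) / 127.0
    w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return w_q, scale

def weight_only_dot(x, w_q, scale):
    """Activations stay float; weights are dequantized on the fly."""
    return x @ (w_q.astype(np.float32) * scale)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)
x = rng.standard_normal((8, 64)).astype(np.float32)

w_q, scale = quantize_weights(w)
err = np.max(np.abs(x @ w - weight_only_dot(x, w_q, scale)))
print(f"max abs error vs float dot: {err:.4f}")  # typically small
```

The model gets roughly half the weight storage of float16 (and a quarter of float32) at the cost of a small, bounded rounding error per weight.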
Files changed
- tensorflow/compiler/mlir/quantization/common/BUILD
- tensorflow/compiler/mlir/quantization/common/attrs_and_constraints.cc
- tensorflow/compiler/mlir/quantization/common/attrs_and_constraints.h
- tensorflow/compiler/mlir/quantization/common/attrs_and_constraints_test.cc
- tensorflow/compiler/mlir/quantization/common/lift_as_function_call.h
- tensorflow/compiler/mlir/quantization/stablehlo/cc/pass_pipeline.cc
- tensorflow/compiler/mlir/quantization/stablehlo/passes/insert_weight_param.cc
- tensorflow/compiler/mlir/quantization/stablehlo/passes/passes.td
- tensorflow/compiler/mlir/quantization/stablehlo/passes/quantization_patterns.cc
- tensorflow/compiler/mlir/quantization/stablehlo/passes/quantization_patterns.h
- tensorflow/compiler/mlir/quantization/stablehlo/passes/quantize.cc
- tensorflow/compiler/mlir/quantization/stablehlo/passes/quantize_composite_functions.cc
- tensorflow/compiler/mlir/quantization/stablehlo/tests/passes/insert_weight_param.mlir
- tensorflow/compiler/mlir/quantization/stablehlo/tests/passes/quantize/quantize.mlir
- tensorflow/compiler/mlir/quantization/stablehlo/tests/passes/quantize/quantize_weight_only.mlir
- tensorflow/compiler/mlir/quantization/stablehlo/tests/passes/quantize_composite_functions_weight_only.mlir
This commit removes a duplicate 'clog' dependency from the _tf_repositories function in TSL's workspace2.bzl. Eliminating the redundant entry streamlines dependency management and keeps the workspace definition easier to maintain.
Files changed
- third_party/xla/third_party/tsl/workspace2.bzl
This commit adds containers with CUDA 12.3 and CUDNN 8.9 to the codebase. The new Dockerfile builds a manylinux 2014 compliant cross-compiler targeting compatible glibc and system libstdc++ versions, adding the necessary dependencies and installing the required packages for multiple Python versions. The remote configuration files are updated with the new Ubuntu 22.04 / CUDA 12.3 / CUDNN 8.9 container configurations.
Together, these changes make the updated CUDA and CUDNN toolchain available for manylinux 2014 compliant builds across the supported Python versions, expanding the project's GPU-accelerated build and test environments.
Files changed
- tensorflow/tools/ci_build/Dockerfile.rbe.cuda12.3-cudnn8.9-ubuntu22.04-manylinux2014-multipython
- tensorflow/tools/toolchains/remote_config/configs.bzl
- tensorflow/tools/toolchains/remote_config/containers.bzl
- third_party/xla/third_party/tsl/tools/toolchains/remote_config/configs.bzl