TensorFlow changelog
### Changelog
Hey there, awesome developers! We've got some exciting updates and fixes for you. Check out what's new and improved:
#### New feature
- **PluginProgram in IFRT**: Introducing the 'PluginProgram' in IFRT, now accessible via `xla_client.compile_ifrt_program()`. This nifty feature wraps arbitrary byte-strings, giving IFRT backends the freedom to interpret them as they see fit. Plus, new functions to create XLA and plugin programs and compile options are now available.
- **Distributed Save and Load with Wait**: Say hello to `data.experimental.distributed_save` and the `wait` parameter in `load`! Save your distributed dataset snapshots non-blockingly and read them while they're being written. Backward compatibility? Check!
- **Executable Wrapper for Host Callback**: Added a new C++ class `TfHostCallback` to run host callbacks in TensorFlow. Create, pass input tensors, execute, and retrieve output tensors with ease.
- **Force Early Scheduling**: Introducing `kForceEarly` to schedule nodes as early as possible, especially useful for GPU schedulers. Optimize your pipelined Recv nodes for better performance.
- **Get Default Layout in PyClient**: Added a method to retrieve the default layout for specific devices in the PyClient class. More control over your layouts now!
#### Improvement
- **Same Shape Bias for Convolution**: The same-shape bias restriction is lifted for `stablehlo.convolution`. You can now explicitly give the bias the desired shape, and a new helper finds operands of a specific type with ease.
- **SourceLocation in xla::Internal Errors**: Enhanced error reporting and debugging by adding SourceLocation information to xla::Internal errors.
- **Rename WeightOnlyPreset**: Updated the naming convention from WeightOnlyPreset to WeightOnlyPtqPreset for clarity and uniformity across the codebase.
#### Bugfix
- **Rollforward with Fix**: Resolved issues in "hlo_proto_to_memory_visualization_utils.cc" by rolling forward with necessary fixes. Shape indexes and descriptions are now accurately resolved.
- **Fake Quant Gradient Ops**: Registered fake quant gradient operations as not differentiable to maintain consistency and accuracy in gradient computations.
- **Async Copies Scheduling**: Corrected the scheduling of async copy operations with `start_after=-1` to hide latency effectively.
#### Chore 🧹
- **Remove Stray Constant Folding Mutex**: Cleaned up and optimized the constant folding logic by removing an unnecessary mutex, resulting in more efficient code execution.
Enjoy these updates and keep on coding!
### Included Commits
This commit addresses an issue where async copy operations with a start_after value of -1 were not being scheduled at the earliest point in the program, leading to latency not being hidden. The previous implementation started the counter at 0 and only checked for async copies scheduled exactly before and after the counter. By adjusting the counter to start at -1, the async copy operations with start_after set to -1 are now correctly scheduled at the beginning of the program, ensuring that latency is hidden as intended.
The changes made in this commit involve modifying the MemorySpaceAssignment::FixSchedule function to account for async copies with start_after values of -1 and adjusting the loop conditions to handle these cases correctly. Additionally, a test case HoistCopyStart was added to verify that async copy operations with start_after=-1 are now being scheduled at the very beginning of the program, ensuring the correct behavior of the scheduling algorithm.
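The counter adjustment can be illustrated with a small pure-Python sketch. The function name, the instruction tuples, and the loop shape below are hypothetical stand-ins, not the real `MemorySpaceAssignment::FixSchedule` code; the point is only that starting the counter at -1 lets copies with `start_after=-1` land before instruction 0.

```python
# Hypothetical sketch of the fix: start the schedule counter at -1 so that
# async copies with start_after=-1 are emitted at the very beginning.
def schedule_async_copies(num_instructions, async_copies):
    schedule = []
    for counter in range(-1, num_instructions):
        # Emit every copy-start whose start_after matches the counter.
        for copy in async_copies:
            if copy["start_after"] == counter:
                schedule.append(("copy-start", copy["name"]))
        if counter >= 0:
            schedule.append(("instruction", counter))
    return schedule

sched = schedule_async_copies(3, [{"name": "c0", "start_after": -1}])
# The copy-start is hoisted ahead of instruction 0.
assert sched[0] == ("copy-start", "c0")
```

With the old counter starting at 0, the `counter == -1` iteration never happens and the copy-start would be silently dropped or misplaced, which is exactly the latency-hiding bug described above.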
Files changed
- third_party/xla/xla/service/memory_space_assignment/memory_space_assignment.cc
- third_party/xla/xla/service/memory_space_assignment/memory_space_assignment_test.cc
This commit introduces the 'PluginProgram' in IFRT and exposes it in Python via xla_client.compile_ifrt_program(). The 'PluginProgram' is a wrapper for arbitrary byte-strings, allowing an IFRT backend to interpret the byte-string in any way it sees fit. The commit also adds new functions in the ifrt_programs module to create XLA and plugin programs, as well as compile options for these programs.
Additionally, the commit includes changes to various files to implement the new functionality, such as adding new classes for 'PluginProgram' and 'PluginCompileOptions', updating the PyClient class to handle IFRT programs, and adding test cases to verify the behavior of compiling and executing XLA programs via IFRT programs.
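The core idea, an opaque byte-string whose interpretation is left entirely to the backend, can be sketched in a few lines of pure Python. The `PluginProgram` name matches the commit; `EchoBackend` and its `compile` method are illustrative inventions, not part of the IFRT API.

```python
# Illustrative sketch of the PluginProgram concept: IFRT itself never
# interprets the payload; only the backend gives the bytes meaning.
class PluginProgram:
    def __init__(self, data: bytes):
        self.data = data  # arbitrary, backend-specific payload

class EchoBackend:
    """A toy backend that 'compiles' a PluginProgram by decoding its bytes."""
    def compile(self, program: PluginProgram) -> str:
        return program.data.decode("utf-8")

prog = PluginProgram(b"my-backend-specific-ir")
assert EchoBackend().compile(prog) == "my-backend-specific-ir"
```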
Files changed
- third_party/xla/xla/python/BUILD
- third_party/xla/xla/python/ifrt/BUILD
- third_party/xla/xla/python/ifrt/plugin_program.cc
- third_party/xla/xla/python/ifrt/plugin_program.h
- third_party/xla/xla/python/ifrt/plugin_program_serdes.cc
- third_party/xla/xla/python/ifrt/plugin_program_serdes_test.cc
- third_party/xla/xla/python/py_client.cc
- third_party/xla/xla/python/py_client.h
- third_party/xla/xla/python/py_program.cc
- third_party/xla/xla/python/py_program.h
- third_party/xla/xla/python/xla.cc
- third_party/xla/xla/python/xla_client.py
- third_party/xla/xla/python/xla_client.pyi
- third_party/xla/xla/python/xla_client_test.py
- third_party/xla/xla/python/xla_extension/__init__.pyi
- third_party/xla/xla/python/xla_extension/ifrt_programs.pyi
This commit removes a stray constant folding mutex from constant_fold.cc. The mutex was originally there to avoid overlapping folds with the same context, but it was deemed unnecessary; the four lines implementing the mutex and its lock were deleted.
Removing the mutex streamlines the constant folding logic, eliminates needless synchronization that could cost performance when constant folding runs, and makes the code easier to maintain.
Files changed
- tensorflow/compiler/mlir/tensorflow/transforms/constant_fold.cc
This commit registers fake quant gradient operations as not differentiable in TensorFlow. Specifically, the commit modifies the array_grad.cc file to add registrations for FakeQuantWithMinMaxArgsGradient, FakeQuantWithMinMaxVarsGradient, and FakeQuantWithMinMaxVarsPerChannelGradient as not differentiable operations. Additionally, the pywrap_gradient_exclusions.cc file is updated to include these fake quant gradient operations in the list of operations with no gradients, ensuring they are treated as not differentiable during computation. Finally, the array_grad.py file is modified to include the relevant operations as not differentiable using the ops.NotDifferentiable function.
By registering these operations as not differentiable, the commit keeps gradient computation consistent and accurate, ensuring the ops are handled appropriately during training.
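The registration pattern can be sketched with a plain dictionary standing in for TensorFlow's real gradient registry. The three op names match the commit; `GRADIENT_REGISTRY` and `register_not_differentiable` are illustrative, not TensorFlow API.

```python
# Minimal sketch of "registered as not differentiable": a None entry means
# the op is explicitly defined to have no gradient, which is different from
# a missing entry (an op nobody registered at all).
GRADIENT_REGISTRY = {}

def register_not_differentiable(op_name):
    GRADIENT_REGISTRY[op_name] = None

for op in (
    "FakeQuantWithMinMaxArgsGradient",
    "FakeQuantWithMinMaxVarsGradient",
    "FakeQuantWithMinMaxVarsPerChannelGradient",
):
    register_not_differentiable(op)

assert GRADIENT_REGISTRY["FakeQuantWithMinMaxVarsGradient"] is None
```

This mirrors what `ops.NotDifferentiable` does on the Python side: it records the op name so the gradient machinery stops differentiating through it instead of failing with "no gradient defined".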
Files changed
- tensorflow/core/ops/array_grad.cc
- tensorflow/python/eager/pywrap_gradient_exclusions.cc
- tensorflow/python/ops/array_grad.py
This commit is a rollforward, with a fix, of a previously reverted change identified by its unique identifier. The changes affect "hlo_proto_to_memory_visualization_utils.cc" in the "profiler/convert" directory of TensorFlow core: 16 lines added, 11 deleted, 27 lines changed in total. They involve resolving shape indexes and descriptions, as well as defining logical buffer structures based on certain parameters.
Specifically, the code now includes a function to resolve shape indexes using the last subshape to maintain historical behavior. Additionally, there is a function to describe the shape of a given structure based on certain parameters. The logical buffer structure has been updated to include the shape of the logical buffer based on instructions and allocations, improving the overall functionality and clarity of the code in this file.
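The "resolve using the last subshape" behavior can be sketched in pure Python, modeling shapes as nested tuples. This is only an analogy to XLA's `Shape`/`ShapeIndex`; the function name and the clamping rule below are assumptions based on the description above.

```python
# Hedged sketch: walk a nested tuple shape along a shape index, clamping to
# the last subshape when an index runs past the end (the "historical
# behavior" fallback the commit describes).
def resolve_shape_index(shape, shape_index):
    for i in shape_index:
        if isinstance(shape, tuple):
            shape = shape[min(i, len(shape) - 1)]
    return shape

nested = (("f32[2,3]", "s32[]"), "pred[]")
assert resolve_shape_index(nested, [0, 1]) == "s32[]"
assert resolve_shape_index(nested, [0, 5]) == "s32[]"  # clamped to last
```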
Files changed
- tensorflow/core/profiler/convert/hlo_proto_to_memory_visualization_utils.cc
The commit adds SourceLocation information to xla::Internal errors across the xla library. In hlo_sharding_util.cc, ir_emitter.cc, ir_emitter_unnested.cc, hlo_verifier.cc, layout_assignment.cc, and memory_space_assignment.cc, the affected lambda functions now return absl::Status, and in util.h an XLA_ERROR_WITH_STRFORMAT_AND_BACKTRACE(Internal) macro definition was added so that Internal errors carry SourceLocation information. This enhances error reporting and debugging capabilities within the xla library.
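A small Python analogue shows why attaching the caller's source location to an error helps debugging. The real change uses absl::Status and a C++ macro that captures `SourceLocation` at the call site; the `internal_error` helper below is an illustrative stand-in that captures the caller's file and line with the stdlib `inspect` module.

```python
# Sketch: an Internal-error constructor that records where it was called from,
# so the error message points at the failing call site rather than the helper.
import inspect

def internal_error(message: str) -> RuntimeError:
    caller = inspect.stack()[1]  # frame of whoever called internal_error
    return RuntimeError(
        f"INTERNAL: {message} [{caller.filename}:{caller.lineno}]"
    )

err = internal_error("unexpected layout")
assert str(err).startswith("INTERNAL: unexpected layout [")
```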
Files changed
- third_party/xla/xla/hlo/utils/hlo_sharding_util.cc
- third_party/xla/xla/service/cpu/ir_emitter.cc
- third_party/xla/xla/service/gpu/ir_emitter_unnested.cc
- third_party/xla/xla/service/hlo_verifier.cc
- third_party/xla/xla/service/layout_assignment.cc
- third_party/xla/xla/service/memory_space_assignment/memory_space_assignment.cc
- third_party/xla/xla/util.h
This commit introduces support for data.experimental.distributed_save and adds the wait parameter to load in TensorFlow's tf.data module. The distributed_save function uses the tf.data service to write distributed dataset snapshots, which is non-blocking and returns without waiting for the snapshot to finish. Setting wait=True in tf.data.Dataset.load allows the snapshots to be read while they are being written. The default value for wait is False for backward compatibility, and an error will be raised if the requested snapshot does not exist.
Additionally, the commit includes changes to various files, such as adding new methods, updating arguments in existing methods, and modifying API documentation to reflect the new functionality. The changes aim to enhance the functionality and usability of distributed dataset snapshots in TensorFlow's tf.data module.
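The `wait` semantics described above can be sketched with stdlib threading: a non-blocking writer, and a loader that either raises when the snapshot is not there yet (`wait=False`) or blocks and reads while the snapshot is still being written (`wait=True`). Everything here is an illustrative model, not the tf.data service API.

```python
# Pure-Python sketch of distributed_save/load(wait=...) semantics.
import queue
import threading
import time

snapshot = queue.Queue()  # stands in for an on-disk snapshot being written

def distributed_save(items):
    # Non-blocking: start the writer and return without waiting for it.
    def write():
        for item in items:
            time.sleep(0.01)  # simulate slow distributed writing
            snapshot.put(item)
    threading.Thread(target=write, daemon=True).start()

def load(wait=False):
    if not wait and snapshot.empty():
        # Default (backward compatible): error if the snapshot isn't ready.
        raise FileNotFoundError("requested snapshot does not exist yet")
    return snapshot.get(timeout=1)  # wait=True: read while being written

distributed_save([1, 2, 3])
assert load(wait=True) == 1  # blocks briefly until the first element lands
```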
Files changed
- RELEASE.md
- tensorflow/python/data/experimental/BUILD
- tensorflow/python/data/experimental/__init__.py
- tensorflow/python/data/experimental/kernel_tests/service/distributed_save_load_ft_test.py
- tensorflow/python/data/experimental/kernel_tests/service/distributed_save_load_test.py
- tensorflow/python/data/experimental/ops/BUILD
- tensorflow/python/data/experimental/ops/distributed_save_op.py
- tensorflow/python/data/kernel_tests/io_test.py
- tensorflow/python/data/ops/dataset_ops.py
- tensorflow/python/data/ops/load_op.py
- tensorflow/tools/api/golden/v1/tensorflow.data.-dataset.pbtxt
- tensorflow/tools/api/golden/v1/tensorflow.data.-fixed-length-record-dataset.pbtxt
- tensorflow/tools/api/golden/v1/tensorflow.data.-t-f-record-dataset.pbtxt
- tensorflow/tools/api/golden/v1/tensorflow.data.-text-line-dataset.pbtxt
- tensorflow/tools/api/golden/v1/tensorflow.data.experimental.-csv-dataset.pbtxt
- tensorflow/tools/api/golden/v1/tensorflow.data.experimental.-random-dataset.pbtxt
- tensorflow/tools/api/golden/v1/tensorflow.data.experimental.-sql-dataset.pbtxt
- tensorflow/tools/api/golden/v1/tensorflow.data.experimental.pbtxt
- tensorflow/tools/api/golden/v2/tensorflow.data.-dataset.pbtxt
- tensorflow/tools/api/golden/v2/tensorflow.data.-fixed-length-record-dataset.pbtxt
- tensorflow/tools/api/golden/v2/tensorflow.data.-t-f-record-dataset.pbtxt
- tensorflow/tools/api/golden/v2/tensorflow.data.-text-line-dataset.pbtxt
- tensorflow/tools/api/golden/v2/tensorflow.data.experimental.-csv-dataset.pbtxt
- tensorflow/tools/api/golden/v2/tensorflow.data.experimental.-random-dataset.pbtxt
- tensorflow/tools/api/golden/v2/tensorflow.data.experimental.-sql-dataset.pbtxt
- tensorflow/tools/api/golden/v2/tensorflow.data.experimental.pbtxt
- tensorflow/tools/api/golden/v2/tensorflow.experimental.dtensor.-d-tensor-dataset.pbtxt
The commit renames WeightOnlyPreset to WeightOnlyPtqPreset across the TensorFlow compiler and MLIR directories, touching the quantization configuration proto file, the quantization Python script, and the integration test files. The new name makes clear that the preset is for weight-only post-training quantization and keeps naming uniform across the codebase.
Specifically, the RunQuantization function in quantization.cc, the quantize_model_test.py integration test, and quantization.py are updated to reference WeightOnlyPtqPreset, matching the rename made in the quantization configuration proto file.
Files changed
- tensorflow/compiler/mlir/lite/quantization/stablehlo/quantization.cc
- tensorflow/compiler/mlir/quantization/stablehlo/python/integration_test/quantize_model_test.py
- tensorflow/compiler/mlir/quantization/stablehlo/python/quantization.py
- tensorflow/compiler/mlir/quantization/stablehlo/quantization_config.proto
This commit adds support for lifting a same-shape bias for stablehlo.convolution. Previously it was common to write models where a 1D constant would be broadcast; now the bias can be given explicitly with the desired shape. Examples in odml_coverage_test demonstrate cases where the bias has the same shape as the target accumulation. A FindOperandOfType helper for finding an operand of a specific type is also added, along with corresponding tests.
The changes include modifications to uniform_quantized_stablehlo_to_tfl_pass.cc with additions, deletions, and changes to support lifting the same shape bias. Functions like GetBiasConstOp and FindOperandOfType are implemented to handle bias operations with the desired shape. Tests for finding users and operands of different types are added in attrs_and_constraints_test.cc and lift_as_function_call_test.cc. Patterns for lifting convolution operations with the same shape bias are defined in lift_quantizable_spots_as_functions_fusion.td.
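A find-operand-of-type helper like the one described is easy to sketch. The real `FindOperandOfType` operates on MLIR operations in C++; this Python version, with made-up `ConstOp`/`ConvOp` classes and matching by Python type, is only an illustration of the lookup pattern.

```python
# Illustrative sketch: return the first operand of a wanted type, else None.
def find_operand_of_type(operands, wanted_type):
    for operand in operands:
        if isinstance(operand, wanted_type):
            return operand
    return None

class ConstOp: ...
class ConvOp: ...

ops = [ConvOp(), ConstOp()]
bias = find_operand_of_type(ops, ConstOp)
assert isinstance(bias, ConstOp)
assert find_operand_of_type(ops, str) is None
```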
Files changed
- tensorflow/compiler/mlir/lite/stablehlo/transforms/uniform_quantized_stablehlo_to_tfl_pass.cc
- tensorflow/compiler/mlir/quantization/common/BUILD
- tensorflow/compiler/mlir/quantization/common/attrs_and_constraints.h
- tensorflow/compiler/mlir/quantization/common/attrs_and_constraints_test.cc
- tensorflow/compiler/mlir/quantization/common/lift_as_function_call_test.cc
- tensorflow/compiler/mlir/quantization/common/test_base.h
- tensorflow/compiler/mlir/quantization/stablehlo/ops/BUILD
- tensorflow/compiler/mlir/quantization/stablehlo/ops/stablehlo_op_quant_spec_test.cc
- tensorflow/compiler/mlir/quantization/stablehlo/passes/lift_quantizable_spots_as_functions_fusion.td
- tensorflow/compiler/mlir/quantization/tensorflow/cc/BUILD
- tensorflow/compiler/mlir/quantization/tensorflow/cc/constant_fold_test.cc
This commit adds a new scheduling force, kForceEarly, to express the desire to schedule a node as early as possible, mirroring the existing kForceDelay, which schedules a node as late as possible. It is intended for the GPU scheduler, specifically for scheduling pipelined Recv nodes close to RecvDone nodes so that unnecessary copies of RecvDone can be removed. The change modifies the GpuAsyncTrackerBase class in gpu_hlo_schedule.cc to set the ForceEarly attribute for certain instructions, and updates the ReadySetLt class in latency_hiding_scheduler.cc to consider the ForceEarly attribute when choosing the best scheduling candidate.
Additionally, the HloGraphNode class in latency_hiding_scheduler.h is updated to include methods for getting and setting the ForceEarly attribute, along with modifications to the ToString method to display the ForceEarly attribute. Overall, this commit introduces a new method for scheduling nodes early in the context of GPU scheduling, aiming to optimize the scheduling of pipelined Recv nodes for better performance.
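How a force-early flag can steer candidate selection can be shown with a tiny comparator sketch, loosely modeled on ReadySetLt. The node dictionaries and the tie-breaking rule here are illustrative assumptions, not the actual latency-hiding scheduler logic.

```python
# Sketch: among ready nodes, prefer ones marked force_early; break ties by
# original position in the ready set.
def pick_best_candidate(ready_nodes):
    return min(
        enumerate(ready_nodes),
        key=lambda pair: (not pair[1]["force_early"], pair[0]),
    )[1]

ready = [
    {"name": "add", "force_early": False},
    {"name": "recv", "force_early": True},  # pipelined Recv: schedule early
]
assert pick_best_candidate(ready)["name"] == "recv"
```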
Files changed
- third_party/xla/xla/service/gpu/gpu_hlo_schedule.cc
- third_party/xla/xla/service/latency_hiding_scheduler.cc
- third_party/xla/xla/service/latency_hiding_scheduler.h
This commit adds an executable wrapper to run a host callback in TensorFlow. The wrapper includes a new C++ class TfHostCallback with methods to create and call a TensorFlow function, as well as functions to handle input and output tensors. The wrapper also includes tests for the wrapper functionality, such as running a simple function and sharing state between multiple host callbacks.
The TfHostCallback class lets callers create a TensorFlow function, pass input tensors, execute it, and retrieve output tensors; the commit adds the corresponding header, source, and test files.
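The create/call/retrieve shape, including two callbacks sharing state, can be sketched in pure Python. `HostCallback` below is an illustrative analogue of TfHostCallback, not its real interface; "tensors" are plain values and shared state is an ordinary dict.

```python
# Sketch: a host callback wraps a function plus optional shared state;
# calling it feeds inputs and returns outputs.
class HostCallback:
    def __init__(self, fn, shared_state=None):
        self.fn = fn
        self.state = shared_state if shared_state is not None else {}

    def call(self, *inputs):
        return self.fn(self.state, *inputs)

state = {"count": 0}

def bump(state, x):
    state["count"] += 1  # shared side effect visible to both callbacks
    return x * 2

cb1 = HostCallback(bump, state)
cb2 = HostCallback(bump, state)
assert cb1.call(3) == 6
assert cb2.call(5) == 10
assert state["count"] == 2  # state is shared across callbacks
```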
Files changed
- tensorflow/core/tfrt/ifrt/BUILD
- tensorflow/core/tfrt/ifrt/tf_host_callback.cc
- tensorflow/core/tfrt/ifrt/tf_host_callback.h
- tensorflow/core/tfrt/ifrt/tf_host_callback_test.cc
This commit adds a new method, "get_default_layout", to the PyClient class in py_client.cc. The method takes dtype, shard_shape, and device parameters and returns a unique pointer to PjRtLayout, retrieving the default layout for the given device. Corresponding changes are made to xla_client.py and xla_extension/__init__.pyi to expose the new method.
Overall, this lets users retrieve the default layout for a specific device, expanding the functionality of the PyClient class.
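The call shape can be sketched with a mock client. The minor-to-major row-major default below is a common convention but an assumption here, not necessarily what PjRt returns; `FakeClient` is obviously hypothetical.

```python
# Sketch: map (dtype, shard_shape, device) to a layout description.
class FakeClient:
    def get_default_layout(self, dtype, shard_shape, device):
        # Assume a row-major default, expressed as minor-to-major dim order.
        minor_to_major = tuple(reversed(range(len(shard_shape))))
        return {"dtype": dtype, "minor_to_major": minor_to_major}

layout = FakeClient().get_default_layout("f32", (4, 8), device="tpu:0")
assert layout["minor_to_major"] == (1, 0)
```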
Files changed
- third_party/xla/xla/python/py_client.cc
- third_party/xla/xla/python/xla_client.py
- third_party/xla/xla/python/xla_extension/__init__.pyi