TensorFlow changelog


Hey there, fabulous TensorFlow fans! 🎉 Get ready to dive into the latest and greatest updates that are making TensorFlow Lite even more awesome. We've got some cool new features, essential improvements, and a few bug fixes that are smoothing out the ride. Let's see what's new!

  • Improvement: Enhanced Compiler Plugin API
    The compiler plugin API now partitions at the subgraph level instead of the model level. This fine-tunes the association of operations with subgraphs, making the compilation process more precise and efficient. 🚀

  • Improvement: Improved Model Management
    Pre-allocated subgraphs can now be transferred into models, and metadata can be popped from the model's map. This boosts memory management and organization, ensuring smoother model operations. 🧠

  • Improvement: Model FLOPs Calculations
    Model-specific FLOPs are now part of the device operation metrics, providing deeper insights into model performance and helping you optimize better. 📈

  • New Feature: Per-Channel Quantization in QC Compiler Plugin
    The Qualcomm compiler plugin now supports per-channel quantization parameters, boosting flexibility and efficiency for models that need it. 🎛

  • New Feature: std::any to LiteRtAny Conversion
    Introducing conversion between std::any and LiteRtAny, enhancing data handling flexibility in TensorFlow Lite's experimental library. 🔄

  • New Feature: Per-Tensor Quantization in QNN IR
    QNN Intermediate Representation now supports per-tensor quantization, expanding its capabilities for handling diverse models. 📊

  • New Feature: Open Source TPU Step Utils
    Say hello to tpu_step_breakdown_utils and tpu_step_details_utils! These libraries provide detailed breakdowns of TPU performance metrics, helping you optimize your TPU workloads. 🖥

  • New Feature: HardwareType Combining
    Now, when merging RunEnvironment instances, the highest hardware type is selected, ensuring accurate profiling of hardware capabilities. 🖧

  • Bugfix: Range Analysis Fix
    Fixed an issue in operand range multiplication with constants. Now, all components are correctly multiplied, ensuring accurate range analysis. 🔧

  • Bugfix: Gather Operation Index Clamping
    Out-of-bound indices in gather operations are now clamped, preventing execution bugs in SPMD partitioners. 🛠

  • Bugfix: Build Breakage Fix
    Resolved a build issue by aligning data types in flatbuffer tools for Android, ensuring smooth compilation and operation. 🏗

These updates are designed to make your TensorFlow experience smoother, faster, and more powerful. Keep innovating and stay tuned for more exciting updates! 🚀

Included Commits

2024-12-13T01:09:13 See commit

This commit addresses a bug in the PartitionGatherTrivialSlicedOperandDimensions function related to handling out-of-bound indices during gather operations in the SPMD (Single Program Multiple Data) partitioner for both GSPMD and Shardy. The original implementation failed to clamp indices that exceeded the bounds of the operand, so out-of-bound indices silently produced zeros. This issue did not affect scatter operations, as they do not require index clamping. The fix introduces a new function, ClampGatherIndices, which clamps indices at the beginning of the PartitionGatherTrivialSlicedOperandDimensions process, thereby preventing the execution bug.

The changes include modifications to the gather_scatter_handler.cc file, where the clamping logic is implemented, and adjustments to the tests in spmd_partitioner_test.cc to validate the correct behavior of the gather operation with clamped indices. The added code ensures that the indices are properly constrained within the valid range based on the dimensions of the operand, thereby enhancing the robustness of the gather operation in the partitioning process.
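For intuition, here is a minimal, self-contained sketch of the clamping idea; it is not the actual XLA code, which operates on HLO instructions inside gather_scatter_handler.cc, and the names and parameters are illustrative:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative only: pin out-of-bound start indices to the valid range
// for the gathered dimension, instead of letting them yield zeros.
std::vector<int64_t> ClampIndices(std::vector<int64_t> indices,
                                  int64_t operand_dim_size,
                                  int64_t slice_size) {
  const int64_t max_start = operand_dim_size - slice_size;
  for (int64_t& index : indices) {
    index = std::clamp<int64_t>(index, 0, max_start);
  }
  return indices;
}
```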

Files changed

  • third_party/xla/xla/service/spmd/gather_scatter_handler.cc
  • third_party/xla/xla/service/spmd/spmd_partitioner_test.cc
2024-12-13T01:31:53 See commit

This commit introduces the calculation of model FLOPs (floating-point operation counts) to the device operation metrics within TensorFlow's profiler module. Specifically, it modifies the relevant code files to include the new metric, allowing the system to track model-specific FLOPs alongside general FLOPs. The changes ensure that when operation metrics are set from HLO event metadata, the model FLOPs are accurately recorded and adjusted based on the number of occurrences of each operation.

In the implementation, the code updates the SetOpMetadataFromHloEventMetadata function to include the handling of model FLOPs, and it adjusts the AdjustFlopsAndBytesAccessed function to calculate model FLOPs based on occurrences. This enhancement is significant for performance profiling, as it provides deeper insights into the computational efficiency of models being executed, facilitating better optimization and analysis of TensorFlow applications.
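A rough sketch of the occurrence scaling described above, using a hypothetical struct rather than the profiler's real OpMetrics proto:

```cpp
#include <cstdint>

// Hypothetical mirror of the idea: per-occurrence values are scaled by
// the number of times the op ran, for model FLOPs just like raw FLOPs.
struct OpMetricsSketch {
  uint64_t occurrences = 0;
  uint64_t flops = 0;        // FLOPs of a single occurrence
  uint64_t model_flops = 0;  // model-level FLOPs of a single occurrence
};

void AdjustForOccurrences(OpMetricsSketch& metrics) {
  metrics.flops *= metrics.occurrences;
  metrics.model_flops *= metrics.occurrences;
}
```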

Files changed

  • tensorflow/core/profiler/convert/xplane_to_op_metrics_db_test.cc
  • tensorflow/core/profiler/utils/op_metrics_db_utils.cc
2024-12-13T02:24:58 See commit

This commit introduces a new feature that enables the conversion between std::any and a custom type LiteRtAny in the TensorFlow Lite experimental library. The changes include the addition of a new library for litert_any, which encompasses the header file litert_any.h that defines the LiteRtAny structure and its associated type enumeration. The implementation includes functions to convert from LiteRtAny to std::any and vice versa, handling various data types such as boolean, integer, real numbers, strings, and pointers. The new functionality enhances the flexibility of data handling within the library, allowing for easier integration with C++'s type-erased storage.

Additionally, the commit includes modifications to existing build files and test cases to support the new conversion functions. The test cases ensure that the conversions work correctly for all supported types, verifying both the output type and the stored value. This enhancement aims to provide a more robust and type-safe way to manage heterogeneous data types within the TensorFlow Lite framework, ultimately improving the overall usability and functionality of the library.
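To illustrate the shape of such a conversion, here is a deliberately stripped-down MiniAny standing in for the real LiteRtAny, which covers more variants (strings, pointers, events):

```cpp
#include <any>
#include <cstdint>

// Illustrative tagged union; litert_any.h defines the real structure.
struct MiniAny {
  enum class Type { kNone, kBool, kInt, kReal } type = Type::kNone;
  union {
    bool bool_value;
    int64_t int_value;
    double real_value;
  };
};

std::any ToStdAny(const MiniAny& value) {
  switch (value.type) {
    case MiniAny::Type::kBool: return value.bool_value;
    case MiniAny::Type::kInt:  return value.int_value;
    case MiniAny::Type::kReal: return value.real_value;
    default:                   return {};  // empty std::any for kNone
  }
}

MiniAny FromStdAny(const std::any& value) {
  MiniAny result;
  if (auto* b = std::any_cast<bool>(&value)) {
    result.type = MiniAny::Type::kBool;
    result.bool_value = *b;
  } else if (auto* i = std::any_cast<int64_t>(&value)) {
    result.type = MiniAny::Type::kInt;
    result.int_value = *i;
  } else if (auto* r = std::any_cast<double>(&value)) {
    result.type = MiniAny::Type::kReal;
    result.real_value = *r;
  }
  return result;
}
```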

Files changed

  • tensorflow/lite/experimental/litert/c/BUILD
  • tensorflow/lite/experimental/litert/c/litert_any.h
  • tensorflow/lite/experimental/litert/c/litert_common.h
  • tensorflow/lite/experimental/litert/c/litert_event.h
  • tensorflow/lite/experimental/litert/c/litert_model.h
  • tensorflow/lite/experimental/litert/c/litert_options.h
  • tensorflow/lite/experimental/litert/cc/BUILD
  • tensorflow/lite/experimental/litert/cc/litert_any.h
  • tensorflow/lite/experimental/litert/cc/litert_any_test.cc
  • tensorflow/lite/experimental/litert/runtime/dispatch/BUILD
  • tensorflow/lite/experimental/litert/vendors/c/BUILD
  • tensorflow/lite/experimental/litert/vendors/c/litert_dispatch.h
2024-12-13T02:33:22 See commit

This commit introduces the functionality to combine different hardware types in the CombineRunEnvironment function within TensorFlow's profiling module. Specifically, it ensures that when merging two RunEnvironment instances, if there is a discrepancy in the hardware types, the function selects the highest hardware type. For example, if one instance indicates a TPU or GPU while the other indicates CPU_ONLY, the resulting combined environment will reflect the TPU or GPU as the dominant hardware type. This change is crucial for accurately representing the hardware capabilities in profiling scenarios.

Additionally, the commit includes a new test case to validate this behavior, ensuring that when combining operational statistics from different environments, the resulting hardware type correctly reflects the highest priority hardware. The test confirms that if a coordinator operation is set to CPU_ONLY and a device operation is set to TPU, the combined result will indicate TPU as the hardware type. Overall, these modifications enhance the robustness of the profiling system by accurately accounting for hardware configurations.
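Conceptually, the merge reduces to taking the maximum of an ordered enum. This sketch assumes the ordering shown, which is illustrative rather than the profiler's actual HardwareType proto values:

```cpp
#include <algorithm>

// Illustrative ordering from least to most capable hardware.
enum class HardwareTypeSketch { kUnknown = 0, kCpuOnly, kGpu, kTpu };

// "Highest wins": e.g. combining CPU_ONLY with TPU yields TPU.
HardwareTypeSketch CombineHardwareType(HardwareTypeSketch a,
                                       HardwareTypeSketch b) {
  return std::max(a, b);
}
```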

Files changed

  • tensorflow/core/profiler/convert/op_stats_combiner.cc
  • tensorflow/core/profiler/convert/op_stats_combiner_test.cc
2024-12-13T02:37:31 See commit

This commit introduces support for per-tensor quantization parameters in the QNN Intermediate Representation (IR) within the TensorFlow Lite framework. The changes include the addition of functions that set up quantization parameters, such as scale and offset, for tensors, and legalize these parameters based on the source tensor's properties. The code modifications ensure that the QNN tensor can handle per-tensor quantization correctly, enhancing its functionality and compatibility with various models.

Additionally, the commit updates the test files to include tests for the new quantization functionality. New test cases verify that the quantization parameters are correctly applied and that the QNN tensor behaves as expected when handling quantized models. The updates also involve modifying existing test data references to include relevant test models for comprehensive validation of the new features. Overall, this commit enhances the quantization capabilities of the QNN IR, improving its performance and versatility in handling different tensor types.
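As a hedged sketch of what legalizing per-tensor parameters can look like (the struct and the sign convention on the offset are assumptions for illustration; the authoritative logic lives in qnn_tensor.cc):

```cpp
#include <cstdint>

// Assumed layout; the real fields live in QNN's tensor structs.
struct PerTensorQuantParams {
  float scale = 1.0f;
  int32_t offset = 0;
};

PerTensorQuantParams LegalizePerTensor(float src_scale,
                                       int64_t src_zero_point) {
  PerTensorQuantParams params;
  params.scale = src_scale;
  // Assumed sign convention: QNN-style offsets are commonly the negated
  // TFLite zero point; qnn_tensor.cc is authoritative here.
  params.offset = static_cast<int32_t>(-src_zero_point);
  return params;
}
```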

Files changed

  • tensorflow/lite/experimental/litert/vendors/qualcomm/compiler/IR/BUILD
  • tensorflow/lite/experimental/litert/vendors/qualcomm/compiler/IR/qnn_tensor.cc
  • tensorflow/lite/experimental/litert/vendors/qualcomm/compiler/IR/qnn_tensor_test.cc
  • tensorflow/lite/experimental/litert/vendors/qualcomm/compiler/qnn_compiler_plugin_test.cc
2024-12-13T03:01:29 See commit

This commit addresses a bug in the range analysis of operand multiplication with constants in the XLA (Accelerated Linear Algebra) service. The issue was that the step value was not being correctly multiplied when the operand was a constant, leading to incorrect range calculations. The fix involved modifying the logic in the RecursivelyIdentifyRange function to ensure that when multiplying operand ranges with a constant, all components (minimum, maximum, and step) are accurately multiplied by the constant value.

In addition to the code changes, the commit also includes updates to the corresponding unit tests to reflect the corrected behavior. The tests now properly validate the ranges after multiplication, ensuring that the minimum, maximum, and step values are computed as expected. This enhancement not only resolves the existing bug but also strengthens the overall reliability of the range analysis functionality within the XLA service.
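The corrected invariant is easy to state in isolation. This sketch uses a simplified Range struct and assumes a non-negative constant, so min and max need not be swapped after scaling:

```cpp
#include <cstdint>

// Simplified stand-in for the value-range representation.
struct RangeSketch {
  int64_t min;
  int64_t max;
  int64_t step;
};

RangeSketch MultiplyByConstant(const RangeSketch& range, int64_t constant) {
  // The bug: step was not scaled along with the endpoints. The fix
  // multiplies all three components by the constant.
  return RangeSketch{range.min * constant, range.max * constant,
                     range.step * constant};
}
```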

Files changed

  • third_party/xla/xla/service/value_range.cc
  • third_party/xla/xla/service/value_range_test.cc
2024-12-13T03:36:01 See commit

This commit introduces support for per-channel quantization parameters within the Qualcomm compiler plugin for TensorFlow Lite. It modifies several files to accommodate this new functionality, including the addition of functions to handle per-channel quantization. Specifically, the SetPerChannelQuantization function is implemented to set the quantization parameters for tensors based on the number of channels and their respective scales and zero points. Additionally, a new function, FreePerChannelQuantization, is added to manage memory allocation for these parameters, ensuring that the system can handle the increased complexity of per-channel quantization without memory leaks.

Furthermore, the commit includes updates to the testing framework to validate the correct implementation of per-channel quantization. A new test case, TestLegalizeTensor, is created to check if the per-channel quantized tensors are processed correctly, confirming that the quantization parameters are set as expected. Overall, this change enhances the flexibility and efficiency of the TensorFlow Lite framework, particularly for models that benefit from per-channel quantization techniques.
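As a sketch of the allocate/free pairing (names and layout are hypothetical; the real functions populate QNN tensor structs):

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical shape: one (scale, zero point) pair per channel.
struct PerChannelQuantParamsSketch {
  int32_t num_channels = 0;
  float* scales = nullptr;
  int32_t* zero_points = nullptr;
};

// C-style heap allocation is what makes a matching free function
// necessary (cf. FreePerChannelQuantization).
void SetPerChannelParams(PerChannelQuantParamsSketch& params,
                         int32_t num_channels, const float* scales,
                         const int32_t* zero_points) {
  params.num_channels = num_channels;
  params.scales =
      static_cast<float*>(std::malloc(num_channels * sizeof(float)));
  params.zero_points =
      static_cast<int32_t*>(std::malloc(num_channels * sizeof(int32_t)));
  for (int32_t i = 0; i < num_channels; ++i) {
    params.scales[i] = scales[i];
    params.zero_points[i] = zero_points[i];
  }
}

void FreePerChannelParams(PerChannelQuantParamsSketch& params) {
  std::free(params.scales);
  std::free(params.zero_points);
  params.scales = nullptr;
  params.zero_points = nullptr;
  params.num_channels = 0;
}
```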

Files changed

  • tensorflow/lite/experimental/litert/vendors/qualcomm/compiler/IR/BUILD
  • tensorflow/lite/experimental/litert/vendors/qualcomm/compiler/IR/qnn_tensor.cc
  • tensorflow/lite/experimental/litert/vendors/qualcomm/compiler/IR/qnn_tensor_test.cc
  • tensorflow/lite/experimental/litert/vendors/qualcomm/compiler/qnn_compiler_plugin_test.cc
2024-12-13T03:36:55 See commit

This commit introduces two new libraries, tpu_step_breakdown_utils and tpu_step_details_utils, to the TensorFlow profiler utility module, enhancing the functionality for analyzing TPU (Tensor Processing Unit) performance metrics. The tpu_step_breakdown_utils library provides a set of inline functions to calculate various durations related to TPU operations, such as infeed and outfeed times, compute durations, and wait times for host or SparseCoreV0. These utilities aim to facilitate a more detailed breakdown of TPU step performance, allowing developers to better understand the timing and efficiency of their TPU workloads.

Additionally, the tpu_step_details_utils library includes functions to compute overall execution times for TPU steps, including compute, infeed, and all-reduce times, as well as non-idle times. This commit enhances the profiling capabilities of TensorFlow, providing developers with more granular insights into TPU operation durations, which can be crucial for optimizing performance in machine learning tasks. Overall, this update improves the tools available for performance analysis in TPU environments.
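A toy illustration of the "inline helpers over step metrics" style, with a made-up struct in place of the profiler's step-database protos:

```cpp
#include <cstdint>

// Made-up fields; the real data comes from the profiler's protos.
struct TpuStepBreakdown {
  uint64_t infeed_duration_ps = 0;
  uint64_t outfeed_duration_ps = 0;
  uint64_t compute_duration_ps = 0;
  uint64_t wait_for_host_duration_ps = 0;
};

// One plausible aggregate in the style of the new inline helpers: total
// time the TPU spent doing something other than pure compute.
inline uint64_t TpuNonComputeTimePs(const TpuStepBreakdown& step) {
  return step.infeed_duration_ps + step.outfeed_duration_ps +
         step.wait_for_host_duration_ps;
}
```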

Files changed

  • tensorflow/core/profiler/utils/BUILD
  • tensorflow/core/profiler/utils/tpu_step_breakdown_utils.h
  • tensorflow/core/profiler/utils/tpu_step_details_utils.h
2024-12-13T05:26:03 See commit

This commit addresses a build issue in the TensorFlow Lite experimental library related to the data types used in the flatbuffer tools for Android. Specifically, it modifies the type of the second element in the TflPerChannelQParams tuple from uint64_t to size_t. This change matters because the element count is derived from the zero-point vector's size, which is reported as size_t, so aligning the tuple element's type ensures that the code compiles correctly and operates as intended.

In the flatbuffer_tools.cc file, the return statement for the AsPerChannelQparams function has been updated to create a TflPerChannelQParams object instead of using std::make_tuple. This change, along with minor adjustments in the header file, improves type consistency and enhances the clarity of the code. Overall, the commit resolves the build breakage while maintaining the functionality of the flatbuffer tools.
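The mismatch is easy to reproduce in miniature; the element names and order in this sketch are illustrative, not the actual TflPerChannelQParams layout:

```cpp
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <utility>
#include <vector>

// The count element is size_t because it comes from std::vector::size().
using PerChannelQParamsSketch =
    std::tuple<int32_t /*quantized_dim*/, size_t /*num_elements*/,
               std::vector<float> /*scales*/,
               std::vector<int64_t> /*zero_points*/>;

PerChannelQParamsSketch MakeParams(int32_t quantized_dim,
                                   std::vector<float> scales,
                                   std::vector<int64_t> zero_points) {
  const size_t num_elements = zero_points.size();  // size_t, not uint64_t
  return PerChannelQParamsSketch(quantized_dim, num_elements,
                                 std::move(scales), std::move(zero_points));
}
```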

Files changed

  • tensorflow/lite/experimental/litert/core/util/flatbuffer_tools.cc
  • tensorflow/lite/experimental/litert/core/util/flatbuffer_tools.h
2024-12-13T05:49:02 See commit

The recent commit updates the TensorFlow Lite compiler plugin API to allow partitioning at the subgraph level instead of the model level. This change enhances the ability to associate selected operations with their respective parent subgraphs, improving the granularity and specificity of the compilation process. The modifications involve renaming functions and adjusting their parameters to accept subgraphs, which simplifies the interface for plugin developers and enables better management of operations during the compilation phase.

Additionally, the commit includes updates to various related files, such as tests and example plugins, to reflect the new partitioning approach. The changes ensure that the API is more aligned with the internal architecture of TensorFlow Lite, which can now handle multiple subgraphs within a model more effectively. The commit also introduces a new test case to verify the functionality of the updated partitioning method, ensuring that the system operates correctly with the new structure.
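Schematically, the change moves the partition entry point from model scope to subgraph scope. The types and function below are hypothetical stand-ins for the real C API in litert_compiler_plugin.h:

```cpp
#include <vector>

// Hypothetical stand-ins for the plugin API's model structures.
struct LiteRtOpT {
  bool supported = false;  // whether this plugin can compile the op
};
struct LiteRtSubgraphT {
  std::vector<LiteRtOpT> ops;
};

// After this change a plugin partitions one subgraph at a time, so every
// selected op is unambiguously associated with its parent subgraph.
std::vector<const LiteRtOpT*> PartitionSubgraph(
    const LiteRtSubgraphT& subgraph) {
  std::vector<const LiteRtOpT*> selected;
  for (const LiteRtOpT& op : subgraph.ops) {
    if (op.supported) {
      selected.push_back(&op);
    }
  }
  return selected;
}
```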

Files changed

  • tensorflow/lite/experimental/litert/compiler/plugin/compiler_plugin.cc
  • tensorflow/lite/experimental/litert/compiler/plugin/compiler_plugin.h
  • tensorflow/lite/experimental/litert/compiler/plugin/compiler_plugin_test.cc
  • tensorflow/lite/experimental/litert/test/testdata/multi_subgraph_mul.mlir
  • tensorflow/lite/experimental/litert/tools/apply_plugin.cc
  • tensorflow/lite/experimental/litert/vendors/c/BUILD
  • tensorflow/lite/experimental/litert/vendors/c/litert_compiler_plugin.h
  • tensorflow/lite/experimental/litert/vendors/c/litert_compiler_plugin_api.h
  • tensorflow/lite/experimental/litert/vendors/examples/example_plugin.cc
  • tensorflow/lite/experimental/litert/vendors/examples/example_plugin_test.cc
  • tensorflow/lite/experimental/litert/vendors/qualcomm/compiler/qnn_compiler_plugin.cc
  • tensorflow/lite/experimental/litert/vendors/qualcomm/compiler/qnn_compiler_plugin_test.cc
2024-12-13T07:24:07 See commit

This commit introduces enhancements to the LiteRtModelT class in TensorFlow Lite's experimental LiteRT module. It adds functionality to transfer pre-allocated subgraphs into the model and introduces a method to pop metadata from the model's metadata map. The PopMetadata method removes the metadata associated with a given key and hands ownership to the caller, returning a result type that indicates success or failure. Additionally, the TransferSubgraphs method facilitates the transfer of subgraphs into the model, improving memory management and organization.

The commit also includes corresponding unit tests to validate the new functionalities. A test for the PopMetadata method checks that metadata can be successfully removed and retrieved, ensuring that the model behaves as expected when interacting with its metadata. Overall, these changes aim to enhance the model's capabilities in managing subgraphs and metadata, contributing to more efficient model operations within the TensorFlow Lite framework.
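The pop semantics can be sketched with std::optional standing in for the library's result type; this is illustrative, not the actual LiteRtModelT implementation:

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Sketch only: the map stands in for the model's metadata storage.
using MetadataMap = std::map<std::string, std::vector<uint8_t>>;

std::optional<std::vector<uint8_t>> PopMetadata(MetadataMap& metadata,
                                                const std::string& key) {
  auto it = metadata.find(key);
  if (it == metadata.end()) return std::nullopt;  // key absent -> failure
  std::vector<uint8_t> buffer = std::move(it->second);  // take ownership
  metadata.erase(it);                                   // remove the entry
  return buffer;
}
```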

Files changed

  • tensorflow/lite/experimental/litert/core/model/model.h
  • tensorflow/lite/experimental/litert/core/model/model_test.cc