tensorflow changelog

1 year ago

Hey there, code wranglers! We've got a bunch of updates to share with you. From new features to bug fixes, here's the latest scoop on what's been happening under the hood. 🚀

New feature

Containers with CUDA 12.3 and cuDNN 8.9: Added new containers with CUDA 12.3 and cuDNN 8.9. These let you build manylinux2014-compliant cross-compilers that target a compatible glibc and the system libstdc++. 🚀
Weight-only quantization: Introduced weight-only quantization for convolution and dot_general operations. This adds support for the weight_only_ptq method, making your deep learning models leaner and meaner. 🏋️‍♂️
CalibrationStatisticsSaver op: Added a new op definition to replace the CalibrationSingleton, aggregating and saving statistics to files. This op is stateful and designed to run on the CPU, making it easy to lift to outer functions. 📊
Async dynamic slicing: Implemented async dynamic slicing for host memory offloading on GPU. Dynamic slicing instructions are wrapped in a fusion node, allowing for asynchronous execution. 🌀
StableHLO integration: Integrated StableHLO at openxla/stablehlo@714d9aca, updating various functions and constants. 🛠️
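
To make the weight-only quantization item above a bit more concrete, here's a minimal pure-Python sketch of the general idea: weights are stored as 8-bit integers plus a scale, while activations stay in float. It is not the TensorFlow quantizer's code. The helper names `quantize_weights` and `dot_weight_only`, and the choice of symmetric per-tensor scaling, are assumptions made for illustration.

```python
def quantize_weights(weights, num_bits=8):
    """Symmetric per-tensor quantization of a 2-D weight matrix (illustrative)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q_weights = [
        [max(-qmax, min(qmax, round(w / scale))) for w in row]
        for row in weights
    ]
    return q_weights, scale


def dot_weight_only(activations, q_weights, scale):
    """Float activations (length K) times integer weights (K x N), rescaled once."""
    num_cols = len(q_weights[0])
    return [
        scale * sum(a * row[j] for a, row in zip(activations, q_weights))
        for j in range(num_cols)
    ]


# Hypothetical example data: a 2 x 2 weight matrix and one activation vector.
weights = [[0.5, -1.0], [0.25, 0.75]]
activations = [1.0, 2.0]

q_weights, scale = quantize_weights(weights)
approx = dot_weight_only(activations, q_weights, scale)
exact = [sum(a * row[j] for a, row in zip(activations, weights)) for j in range(2)]

print("quantized weights:", q_weights, "scale:", scale)
print("weight-only result:", approx, "float result:", exact)
```

Only the weights are rounded here, so the error in each output is bounded by half a quantization step per weight times the size of the activations.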

Improvement

Variable dtype and shape storage: Enhanced IfrtRestoreTensorRegistry to store variable dtype and shape, improving tensor restoration and lookup during execution. 🧠
Global shuffling for memory cache dataset: Added support for global shuffling in the memory cache dataset, improving data processing capabilities. 🔄
Memory Term Reducer: Augmented the Memory Term Reducer to merge both primitives and groups, enhancing memory management and optimization. 🧩

Bugfix

Convert-memory-placement-to-internal-annotations: Removed the check that required an operand to have a single user, so the pass can now process operands with multiple users. 🔧
LLVM integration: Updated LLVM usage to match the latest commit version, ensuring compatibility and stability. 🛡️
Duplicate dependency in TSL: Removed a duplicate 'clog' dependency, streamlining the code and optimizing dependency management. 🗑️

Chore

Remove unused workflow: Cleaned up the codebase by removing an outdated "A/B Diff Performance Benchmarking" workflow. ✂️

That's all for now! Keep on coding and stay tuned for more updates. Happy coding! 😄

2 years ago

Here's the latest and greatest from our development team! Check out the awesome new features, improvements, and bug fixes we've rolled out:

New Features

IndexFlatMapDataset 🎉
- Introducing IndexFlatMapDataset, a new dataset operation in TensorFlow. It's like flat_map, but with support for global shuffling! Users provide an index_map_fn function, which returns a tuple of (element index, offset) into the unflattened dataset.
Unbounded Dynamism Tests 🧪
- Added tests for unbounded dynamism in ReducePrecisionOp, ShiftLeftOp, and ComplexOp. These tests ensure that these operations handle precision reduction, shifting, and complex number operations correctly, even with varying shapes and broadcast dimensions.
IfrtServingExecutable Host Callback Execution 🚀
- Added support for executing host callbacks in IfrtServingExecutable. This includes building, grouping, and executing host callbacks synchronously, along with necessary tests to ensure functionality.
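
For the IndexFlatMapDataset item above, here's a rough pure-Python sketch of what an index_map_fn could look like and why it helps with global shuffling. It does not call the TensorFlow op. The helper `make_index_map_fn`, the sample sentences, and the use of `random.Random` for shuffling are all assumptions made for illustration; only the (element index, offset) return shape comes from the changelog.

```python
import bisect
import random
from itertools import accumulate


def make_index_map_fn(lengths):
    """Build a function mapping a flattened index to (element index, offset).

    lengths[i] is how many flattened items unflattened element i produces.
    """
    ends = list(accumulate(lengths))  # ends[i] = items in elements 0..i
    total = ends[-1] if ends else 0

    def index_map_fn(flat_index):
        if not 0 <= flat_index < total:
            raise IndexError(flat_index)
        # First element whose cumulative end is strictly past flat_index.
        element_index = bisect.bisect_right(ends, flat_index)
        start = ends[element_index] - lengths[element_index]
        return element_index, flat_index - start

    return index_map_fn


# Hypothetical unflattened dataset: each sentence flattens into its words.
sentences = ["a b c", "d e", "f g h i"]
words = [s.split() for s in sentences]
index_map_fn = make_index_map_fn([len(w) for w in words])


def lookup(flat_index):
    element_index, offset = index_map_fn(flat_index)
    return words[element_index][offset]


num_items = sum(len(w) for w in words)
flat = [lookup(i) for i in range(num_items)]

# Random access by flattened index lets us shuffle across element boundaries.
order = list(range(num_items))
random.Random(0).shuffle(order)
shuffled = [lookup(i) for i in order]

print(flat)
print(shuffled)
```

Because every flattened item is reachable by index, a single permutation of indices shuffles across all elements at once, rather than only within a buffer.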

Improvements

Unpack Quantized MHLO Ops 🔧
- Unpacked per-channel hybrid quantized MHLO ops to float ops. This includes extensive modifications and tests to ensure correct handling of scales and zero points in symmetric and asymmetric quantization cases.
Composite Lowering for aten.avg_pool2d 🌊
- Added a composite lowering pass for aten.avg_pool2d in the StableHLO module of the TensorFlow Lite MLIR compiler. This includes utility functions and updates to various files to handle average pooling operations.
Global Shuffling for IndexFlatMapDataset 🌐
- Enhanced IndexFlatMapDataset with global shuffling support. This includes updates to ensure compatibility with random access for all upstream transformations and new test cases to validate the functionality.
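
The "Unpack Quantized MHLO Ops" item above mentions scales and zero points in symmetric and asymmetric cases. Here's a small sketch of per-channel dequantization, assuming the usual affine mapping real = scale * (q - zero_point) with one scale and zero point per output channel. The function name `dequantize_per_channel` and the sample values are hypothetical, and this is not the code of the MHLO pass itself.

```python
def dequantize_per_channel(q_weights, scales, zero_points):
    """Convert integer weights to floats, one (scale, zero point) pair per row.

    Each row of q_weights is treated as one output channel (an assumption).
    """
    if not (len(q_weights) == len(scales) == len(zero_points)):
        raise ValueError("need one scale and one zero point per channel")
    return [
        [(q - zero_point) * scale for q in row]
        for row, scale, zero_point in zip(q_weights, scales, zero_points)
    ]


# Channel 0 is symmetric (zero point 0); channel 1 is asymmetric (zero point 10).
q_weights = [[-128, 0, 127], [10, 20, 30]]
scales = [0.5, 0.1]
zero_points = [0, 10]

float_weights = dequantize_per_channel(q_weights, scales, zero_points)
print(float_weights)
```

In the symmetric case the zero point is 0, so the integer value 0 maps to exactly 0.0. In the asymmetric case the integer equal to the zero point plays that role.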

Bug Fixes

PjRtBuffer Dependency Handling 🛠️
- Updated DonateWithControlDependency in PjRtBuffer to use PjRtFuture<> for passing dependencies. This includes temporary adaptor functions and changes across multiple files to ensure compatibility.
HloComputation Struct Optimization 🏋️‍♂️
- Removed the redundant instruction_indices_ from HloComputation, reducing the struct size and reorganizing it for better efficiency.
Attribute Fix for MSVC 🔩
- Replaced __attribute__((unused)) with [[maybe_unused]] in PluginProgramSerDes and PluginCompileOptionsSerDes to fix an MSVC error.

Chores

Internal Package Group Update 📦
- Modified the internal package group in the tensorflow/BUILD file, adding a new package group for "//waymo/accelerator/...". This helps in better organizing and managing the codebase.

Stay tuned for more updates and keep coding! 🚀

2 years ago

### Changelog

Hey there, awesome developers! We've got some exciting updates and fixes for you. Check out what's new and improved:

#### New feature 🚀
- **PluginProgram in IFRT**: Introducing the 'PluginProgram' in IFRT, now accessible via `xla_client.compile_ifrt_program()`. This nifty feature wraps arbitrary byte-strings, giving IFRT backends the freedom to interpret them as they see fit. Plus, new functions to create XLA and plugin programs and compile options are now available.
- **Distributed Save and Load with Wait**: Say hello to `data.experimental.distributed_save` and the `wait` parameter in `load`! Save distributed dataset snapshots without blocking, and read them while they're still being written. Backward compatibility? Check!
- **Executable Wrapper for Host Callback**: Added a new C++ class `TfHostCallback` to run host callbacks in TensorFlow. Create, pass input tensors, execute, and retrieve output tensors with ease.
- **Force Early Scheduling**: Introducing `kForceEarly` to schedule nodes as early as possible, especially useful for GPU schedulers. Optimize your pipelined Recv nodes for better performance.
- **Get Default Layout in PyClient**: Added a method to retrieve the default layout for specific devices in the PyClient class. More control over your layouts now!

#### Improvement 🌟
- **Same Shape Bias for Convolution**: Lift the same shape bias for `stablehlo.convolution`. Explicitly give bias with the desired shape, and find operands of specific types with ease.
- **SourceLocation in xla::Internal Errors**: Enhanced error reporting and debugging by adding SourceLocation information to xla::Internal errors.
- **Rename WeightOnlyPreset**: Updated the naming convention from WeightOnlyPreset to WeightOnlyPtqPreset for clarity and uniformity across the codebase.

#### Bugfix 🐛
- **Rollforward with Fix**: Resolved issues in "hlo_proto_to_memory_visualization_utils.cc" by rolling forward with necessary fixes. Shape indexes and descriptions are now accurately resolved.
- **Fake Quant Gradient Ops**: Registered fake quant gradient operations as not differentiable to maintain consistency and accuracy in gradient computations.
- **Async Copies Scheduling**: Corrected the scheduling of async copy operations with `start_after=-1` to hide latency effectively.

#### Chore 🧹
- **Remove Stray Constant Folding Mutex**: Cleaned up and optimized the constant folding logic by removing an unnecessary mutex, resulting in more efficient code execution.

Enjoy these updates and keep on coding! 🚀✨
