
Intel® Extension for PyTorch* v2.8.0+cpu Release Notes

@chunyuan-w chunyuan-w released this 12 Aug 09:13
cb81bf2

2.8.0

We are excited to announce the release of Intel® Extension for PyTorch* 2.8.0+cpu, which accompanies PyTorch 2.8. This release mainly brings you new LLM optimizations, including Qwen3 and Whisper large-v3, an enhanced API for multi-LoRA inference kernels, and optimizations of the LLM generation sampler. This release also includes a set of bug fixes and small optimizations. We want to sincerely thank our dedicated community for your contributions.

Besides providing optimizations in Intel® Extension for PyTorch*, over the past years we have also upstreamed most of our features and optimizations for Intel® platforms into PyTorch*, and we will continue pushing the remaining ones into PyTorch* in the future. Moving forward, we will change our working model to prioritize developing new features and optimizations directly in PyTorch* and de-prioritize development in Intel® Extension for PyTorch*, effective after the 2.8 release. We will continue providing critical bug fixes and security patches as needed throughout the PyTorch* 2.9 timeframe to ensure a smooth transition for our partners and community.

Highlights

  • Qwen3 support

Qwen3, the latest addition to the Qwen family of large language models, has recently been released. Intel® Extension for PyTorch* has supported Qwen3 since its launch date via an early release version, covering MoE models like Qwen3-30B and mid-size dense models like Qwen3-14B. The related optimizations are included in this official release.

  • Whisper large-v3 support

Intel® Extension for PyTorch* provides optimizations for whisper-large-v3, a state-of-the-art model for automatic speech recognition (ASR) and speech translation. Key improvements include replacing the cross-attention mechanism with the Indirect Access Key-Value (IAKV) Cache kernel, delivering strong performance with weight-only INT8 quantization on Intel® Xeon® processors.
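The core idea behind an indirect-access KV cache can be sketched in plain Python: instead of physically copying cached key/value rows every time beam search reorders the beams, the cache stays in place and a small beam-origin table records where each beam came from; attention follows those indices lazily. This is a simplified toy illustration of the concept, not the actual IPEX kernel; all class and method names here are hypothetical.

```python
# Simplified sketch of an indirect-access KV cache (hypothetical names,
# not the actual IPEX kernel). Rather than copying key/value rows when
# beams are reordered, we record a per-step beam-origin table and walk
# it backwards to reconstruct each beam's history on demand.

class IndirectKVCache:
    def __init__(self, num_beams):
        self.num_beams = num_beams
        self.kv = []           # kv[t][b] -> (key, value) written at step t in slot b
        self.beam_origin = []  # beam_origin[t][b] -> slot at step t-1 that beam b came from

    def append(self, step_kv, origins):
        """step_kv: one (key, value) per beam slot for this step;
        origins: the previous-step slot each beam was continued from."""
        self.kv.append(step_kv)
        self.beam_origin.append(origins)

    def history(self, beam):
        """Follow origin pointers backwards to gather this beam's (key, value) trail."""
        out = []
        b = beam
        for t in range(len(self.kv) - 1, -1, -1):
            out.append(self.kv[t][b])
            b = self.beam_origin[t][b]
        return list(reversed(out))

cache = IndirectKVCache(num_beams=2)
cache.append([("k0a", "v0a"), ("k0b", "v0b")], origins=[0, 1])
# At step 1 both beams continue from slot 0 (a beam-search reorder),
# but no cache rows are copied -- only the origin table records it.
cache.append([("k1a", "v1a"), ("k1b", "v1b")], origins=[0, 0])

print(cache.history(1))  # beam 1's trail routes through slot 0 at step 0
```

The copy-free reorder is what makes this attractive for beam search: the cost of a reorder is one small index write per beam instead of moving large key/value tensors.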

  • General Large Language Model (LLM) optimization

Intel® Extension for PyTorch* adds SGMV (segmented gather matrix-vector multiplication) support to the API for multi-LoRA inference kernels used by LLM serving frameworks, and optimizes the LLM generation sampler. A full list of optimized models can be found at LLM optimization.
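SGMV-style multi-LoRA batching can be illustrated with a small pure-Python sketch: requests in a batch are grouped (segmented) by the LoRA adapter they use, and each segment applies its own low-rank A/B pair on top of the shared base model. This is a toy illustration of the idea only, not the IPEX API; all names, shapes, and the `sgmv` function itself are hypothetical.

```python
# Toy sketch of SGMV-style multi-LoRA batching (hypothetical, not the IPEX API).
# Each request selects a LoRA adapter; requests are grouped into segments by
# adapter id, and each segment applies its own low-rank update y = B @ (A @ x).

def matvec(mat, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]

def sgmv(xs, adapter_ids, loras):
    """xs: list of input vectors; adapter_ids[i]: which LoRA request i uses;
    loras[a] = (A, B) with A: r x d (down-projection), B: out x r (up-projection)."""
    outputs = [None] * len(xs)
    # Segment request indices by adapter, so each adapter's weights are
    # gathered once per segment rather than once per request.
    segments = {}
    for i, a in enumerate(adapter_ids):
        segments.setdefault(a, []).append(i)
    for a, idxs in segments.items():
        A, B = loras[a]
        for i in idxs:
            h = matvec(A, xs[i])       # down-project to rank r
            outputs[i] = matvec(B, h)  # up-project to output dim
    return outputs

# Two rank-1 adapters over 2-d inputs; three requests share them in one batch.
loras = {
    0: ([[1.0, 0.0]], [[2.0], [0.0]]),  # doubles x[0] into the first output dim
    1: ([[0.0, 1.0]], [[0.0], [3.0]]),  # triples x[1] into the second output dim
}
xs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(sgmv(xs, [0, 1, 0], loras))  # [[2.0, 0.0], [0.0, 12.0], [10.0, 0.0]]
```

Segmenting by adapter is the key design choice: it lets a serving framework run many differently-adapted requests in one batched kernel launch instead of one launch per adapter per request.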

  • Bug fixes and other optimizations

    • Optimized LLM performance #3688 #3708 #3754
    • Removed the dependency on torch-ccl and oneCCL #3690

Full Changelog: v2.7.0+cpu...v2.8.0+cpu