We launched Intel® Extension for PyTorch* in 2020 with the goal of extending official PyTorch* to make high performance on Intel® CPU and GPU platforms easier to achieve. Over the years, we have successfully upstreamed most of our features and optimizations for Intel® platforms into PyTorch*. Moving forward, our strategy is to focus on developing new features and supporting upcoming platform launches directly within PyTorch*. We are discontinuing active development of Intel® Extension for PyTorch*, effective immediately after the 2.8 release. We will continue to provide critical bug fixes and security patches throughout the PyTorch* 2.9 timeframe to ensure a smooth transition for our partners and the community.
2.8.10+xpu
Intel® Extension for PyTorch* v2.8.10+xpu is the new release that supports Intel® GPU platforms (Intel® Arc™ Graphics family, Intel® Core™ Ultra Processors with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 Mobile Processors, and Intel® Data Center GPU Max Series), based on PyTorch* 2.8.0.
Highlights
- Intel® oneDNN v3.8.1 integration
- Intel® Deep Learning Essentials 2025.1.3 compatibility
- Large Language Model (LLM) optimization
Intel® Extension for PyTorch* optimizes the performance of Qwen3, along with other typical LLM models, on Intel® GPU platforms, with the supported Transformers version upgraded to 4.51.3. A full list of optimized LLM models is available in the LLM Optimizations Overview. Intel® Extension for PyTorch* also adds support for more custom kernels, such as `selective_scan_fn`, `causal_conv1d_fn`, and `causal_conv1d_update`, enabling the functionality of the Jamba model.
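A minimal sketch of applying these LLM optimizations through the ipex.llm.optimize frontend is shown below; the model ID, dtype, and generation settings are illustrative assumptions rather than part of this release note:

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"  # assumption: any model from the optimized LLM list
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model = model.eval().to("xpu")

# Apply the LLM-specific optimizations (fused kernels, etc.) for Intel® GPU
model = ipex.llm.optimize(model, dtype=torch.float16, device="xpu", inplace=True)

inputs = tokenizer("What is new in this release?", return_tensors="pt").to("xpu")
with torch.no_grad():
    output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```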
- PyTorch* XCCL adoption for distributed scenarios
Intel® Extension for PyTorch* adopts the PyTorch* XCCL backend for distributed scenarios on the Intel® GPU platform. We observed that the scaling performance using PyTorch* XCCL is on par with OneCCL Bindings for PyTorch* (torch-ccl) for validated AI workloads. As a result, we will discontinue active development of torch-ccl immediately after the 2.8 release.
A pseudocode example illustrating the transition from torch-ccl to PyTorch* XCCL at the model script level is shown below:
```python
import torch

if torch.distributed.is_xccl_available():
    # PyTorch* provides the XCCL backend natively
    torch.distributed.init_process_group(backend='xccl')
else:
    # Fall back to OneCCL Bindings for PyTorch* (torch-ccl)
    import oneccl_bindings_for_pytorch
    torch.distributed.init_process_group(backend='ccl')
```
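Once the process group is initialized, collectives run through the selected backend unchanged. A minimal all-reduce smoke test is sketched below; the torchrun launcher and the two-rank setup are assumptions for illustration:

```python
import os
import torch
import torch.distributed as dist

# Initialize as above: XCCL if available, torch-ccl otherwise
if dist.is_xccl_available():
    dist.init_process_group(backend='xccl')
else:
    import oneccl_bindings_for_pytorch
    dist.init_process_group(backend='ccl')

rank = int(os.environ["LOCAL_RANK"])  # set by the launcher, e.g. torchrun
t = torch.ones(4, device=f"xpu:{rank}")
dist.all_reduce(t)  # element-wise sum across all ranks
print(f"rank {rank}: {t}")

dist.destroy_process_group()
```

Launched, for example, with torchrun --nproc-per-node=2 allreduce_check.py.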
- Redundant code removal
Intel® Extension for PyTorch* no longer overrides the device allocator. It is recommended to use the allocator provided by PyTorch* instead, as sketched below. Intel® Extension for PyTorch* also removes all overridden oneMKL and oneDNN related operators except GEMM and SDPA.
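As a sketch of the recommended replacement, device memory can be managed entirely through PyTorch*'s native XPU allocator utilities (standard torch.xpu APIs, not Intel® Extension for PyTorch* features):

```python
import torch

# Allocations on "xpu" go through PyTorch*'s native allocator
x = torch.empty(1024, 1024, device="xpu")
print(torch.xpu.memory_allocated())  # bytes currently occupied by tensors

del x
torch.xpu.empty_cache()  # release cached blocks back to the device
```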
Known Issues
Please refer to the Known Issues webpage.