Haopeng Li1, Jinyue Yang2, Guoqi Li2,📧, Huan Wang1,📧
1 Westlake University, 2 Institute of Automation, Chinese Academy of Sciences
- 2025-03-27: Added Hugging Face integration to ARPG.
- 2025-03-25: Added the arccos sampling schedule.
- 2025-03-14: The paper and code are released!
We introduce ARPG, a novel autoregressive image generation framework that performs BERT-style masked modeling with a GPT-style causal architecture. As a result, it can generate images in parallel following a random token order while retaining full KV-cache support (a conceptual sketch follows the highlights below).
- 💪 ARPG achieves an FID of 1.94 on ImageNet-1K 256×256 with only 64 sampling steps.
- 🚀 ARPG delivers 26× higher throughput than LlamaGen, nearly matching VAR.
- ♻️ ARPG reduces memory consumption by over 75% compared to VAR.
- 🔍 ARPG supports zero-shot inference (e.g., inpainting and outpainting).
- 🛠️ ARPG can be easily extended to controllable generation.
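To make the random-order idea concrete, below is a minimal, self-contained sketch of parallel decoding in which the queries carry only the *target* positions and the already-decoded tokens act as keys/values. Everything here (`ToyDecoder`, the shapes, the per-step budget) is a hypothetical illustration written for this README, not ARPG's actual architecture; in the real model the context keys/values would be served from a KV cache rather than recomputed each step.

```python
import torch
import torch.nn as nn


class ToyDecoder(nn.Module):
    """Stand-in decoder: predicts logits for target positions from the
    already-decoded (position, token) context. Purely illustrative."""

    def __init__(self, vocab_size=16, num_pos=64, dim=32):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)
        self.pos_emb = nn.Embedding(num_pos, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, target_pos, ctx_pos, ctx_tok):
        # Queries encode only *where* to predict next; keys/values encode what
        # has been decoded so far (a KV cache would hold these in practice).
        q = self.pos_emb(target_pos)                        # (B, K, D)
        kv = self.tok_emb(ctx_tok) + self.pos_emb(ctx_pos)  # (B, T, D)
        out, _ = self.attn(q, kv, kv)
        return self.head(out)                               # (B, K, vocab)


@torch.no_grad()
def random_order_parallel_decode(model, num_pos=64, steps=8, batch=2):
    # Each sample gets its own random token order.
    order = torch.stack([torch.randperm(num_pos) for _ in range(batch)])
    tokens = torch.zeros(batch, num_pos, dtype=torch.long)
    # Seed the context with one known token per sample (a class token in practice).
    tokens[torch.arange(batch), order[:, 0]] = torch.randint(0, 16, (batch,))
    decoded = 1
    per_step = -(-(num_pos - decoded) // steps)  # ceil division
    while decoded < num_pos:
        k = min(per_step, num_pos - decoded)
        target_pos = order[:, decoded:decoded + k]  # next K positions to fill
        ctx_pos = order[:, :decoded]                # already-decoded positions
        ctx_tok = torch.gather(tokens, 1, ctx_pos)  # their tokens
        logits = model(target_pos, ctx_pos, ctx_tok)
        new_tok = logits.argmax(-1)                 # real sampling would go here
        tokens.scatter_(1, target_pos, new_tok)
        decoded += k
    return tokens


print(random_order_parallel_decode(ToyDecoder()).shape)  # torch.Size([2, 64])
```

Because any subset of tokens can serve as context under this formulation, tasks such as inpainting and outpainting reduce to decoding only the missing positions, which is what enables the zero-shot behavior listed above.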
We provide model weights pre-trained on ImageNet-1K 256×256.
| Model | Params | Schedule | CFG | Steps | FID | IS | Weight |
|---|---|---|---|---|---|---|---|
| ARPG-L | 320 M | cosine | 4.5 | 64 | 2.44 | 292 | arpg_300m.pt |
| ARPG-XL | 719 M | cosine | 6.0 | 64 | 2.10 | 331 | arpg_700m.pt |
| ARPG-XXL | 1.3 B | cosine | 7.5 | 64 | 1.94 | 340 | arpg_1b.pt |
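If you only need the raw checkpoints, they can be fetched programmatically with `huggingface_hub`. This is a hedged sketch: it assumes the `.pt` files listed above are hosted in the same `hp-l33/ARPG` repository used by the pipeline below, so please verify the actual repository and filenames on the model page.

```python
import torch
from huggingface_hub import hf_hub_download

# Download one of the checkpoints from the table (repo/filename are assumptions).
ckpt_path = hf_hub_download(repo_id="hp-l33/ARPG", filename="arpg_300m.pt")
state_dict = torch.load(ckpt_path, map_location="cpu")
print(f"loaded {len(state_dict)} entries from {ckpt_path}")
```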
You can easily play with ARPG using the Hugging Face `DiffusionPipeline`:
```python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("hp-l33/ARPG", custom_pipeline="hp-l33/ARPG")

class_labels = [207, 360, 388, 113, 355, 980, 323, 979]

generated_image = pipeline(
    model_type="ARPG-XL",       # choose from 'ARPG-L', 'ARPG-XL', or 'ARPG-XXL'
    seed=0,                     # set a seed for reproducibility
    num_steps=64,               # number of autoregressive steps
    class_labels=class_labels,  # provide valid ImageNet class labels
    cfg_scale=4,                # classifier-free guidance scale
    output_dir="./images",      # directory to save generated images
    cfg_schedule="constant",    # choose between 'constant' (suggested) and 'linear'
    sample_schedule="arccos",   # choose between 'arccos' (suggested) and 'cosine'
)

generated_image.show()
```
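For intuition about the `sample_schedule` argument, the snippet below shows how a MaskGIT-style decay schedule could translate 64 decoding steps into a per-step token budget. The `remaining()` formulas and the 256-token sequence length are assumptions made for illustration; they may not match the schedules actually implemented in the ARPG pipeline.

```python
import math


def tokens_per_step(num_tokens=256, num_steps=64, schedule="arccos"):
    """Split num_tokens across num_steps according to a decay schedule."""

    def remaining(r):
        # Fraction of tokens still undecoded at progress r in [0, 1]
        # (assumed MaskGIT-style formulas, not necessarily ARPG's).
        if schedule == "cosine":
            return math.cos(math.pi * r / 2)
        if schedule == "arccos":
            return (2 / math.pi) * math.acos(r)
        raise ValueError(f"unknown schedule: {schedule}")

    budget, decoded = [], 0
    for t in range(1, num_steps + 1):
        target = round(num_tokens * (1 - remaining(t / num_steps)))
        target = max(target, decoded + 1)  # always decode at least one token
        target = min(target, num_tokens)
        budget.append(target - decoded)
        decoded = target
    return budget


for name in ("arccos", "cosine"):
    budget = tokens_per_step(schedule=name)
    print(name, "first:", budget[:6], "last:", budget[-3:], "total:", sum(budget))
```

With these formulas, both schedules decode only a handful of tokens in the early steps and progressively more later, mirroring how masked/parallel decoders typically spend their step budget.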
If you want to train or reproduce the results of ARPG, please refer to Getting Started.
If this work is helpful for your research, please give it a star or cite it:
```bibtex
@article{li2025autoregressive,
  title   = {Autoregressive Image Generation with Randomized Parallel Decoding},
  author  = {Haopeng Li and Jinyue Yang and Guoqi Li and Huan Wang},
  journal = {arXiv preprint arXiv:2503.10568},
  year    = {2025}
}
```
Thanks to LlamaGen for its open-source codebase. We also thank RandAR and RAR for inspiring this work, as well as ControlAR.