
Continuous Visual Autoregressive Generation via Score Maximization

Code for the ICML 2025 paper Continuous Visual Autoregressive Generation via Score Maximization.

🔥 Highlights

  • 💡 A principled framework for continuous VAR, theoretically grounded in strictly proper scoring rules.
  • 🚀 Likelihood-free learning with an energy Transformer, trained via the energy score (see the sketch after this list).
  • 💪 Expressive and efficient, overcoming key limitations of GIVT and diffusion loss.
  • 🎉 Competitive performance in both generation quality and inference efficiency.
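
For intuition, here is a minimal sketch of a negatively oriented energy-score loss over continuous tokens, i.e. the strictly proper scoring rule the highlights above refer to. The function name, tensor shapes, and beta value are illustrative assumptions, not the repository's actual API:

import torch

def energy_score_loss(samples, target, beta=1.0):
    # samples: (B, m, D) -- m i.i.d. draws from the model per token (m >= 2)
    # target:  (B, D)    -- the ground-truth continuous token
    # Negatively oriented energy score E||X - y||^beta - 0.5 * E||X - X'||^beta,
    # strictly proper for 0 < beta < 2, so minimizing it maximizes the score.
    confinement = (samples - target.unsqueeze(1)).norm(dim=-1).pow(beta).mean(dim=1)
    pairwise = (samples.unsqueeze(1) - samples.unsqueeze(2)).norm(dim=-1).pow(beta)
    m = samples.size(1)
    interaction = pairwise.sum(dim=(1, 2)) / (m * (m - 1))  # diagonal terms are zero
    return (confinement - 0.5 * interaction).mean()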

Preparation

Installation

A suitable conda environment named ear can be created and activated with:

conda env create -f environment.yaml
conda activate ear

Dataset

Download the ImageNet dataset and place it at your IMAGENET_PATH.
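
Assuming the standard ImageFolder layout that MAR-style training code expects (an assumption; adjust if your copy is organized differently), the directory should look like:

${IMAGENET_PATH}/
  train/
    n01440764/
      *.JPEG
    ...
  val/
    ...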

VAE

Download the continuous image tokenizer pre-trained by MAR:

python util/download.py
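
This should place the KL-16 tokenizer checkpoint at pretrained_models/vae/kl16.ckpt, the path that the caching and training commands below pass via --vae_path.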

Caching VAE Latents

Cache the VAE latents to CACHED_PATH to save computation during training:

torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
main_cache.py \
--img_size 256 --vae_path pretrained_models/vae/kl16.ckpt --vae_embed_dim 16 \
--batch_size 128 \
--data_path ${IMAGENET_PATH} --cached_path ${CACHED_PATH}
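
${IMAGENET_PATH} and ${CACHED_PATH} are shell variables; set them before launching, e.g. export CACHED_PATH=/path/to/cached_latents. The cached latents are then reused at training time via --use_cached --cached_path.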

Training

Script for training EAR-B on 32 GPUs (750 epochs of standard training followed by 50 epochs of temperature fine-tuning). Adjust --accumulation_steps when training with a different number of GPUs.
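
For reference, the command below yields an effective batch size of 8 GPUs × 4 nodes × 32 per GPU × 2 accumulation steps = 2048 images; presumably, with a different GPU count you would scale --accumulation_steps to keep this product fixed (e.g. --accumulation_steps 4 on 16 GPUs).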

torchrun --nproc_per_node=8 --nnodes=4 --node_rank=${NODE_RANK} --master_addr=${MASTER_ADDR} --master_port=${MASTER_PORT} \
main_ear.py \
--img_size 256 --vae_path pretrained_models/vae/kl16.ckpt --vae_embed_dim 16 --vae_stride 16 --patch_size 1 \
--score_lrscale 0.25 --train_temperature 1.0 --alpha 1.0 \
--model ear_base --scoreloss_d 6 --scoreloss_w 1024 --noise_channels 64 \
--epochs 750 --warmup_epochs 100 --batch_size 32 --blr 1e-4 --score_batch_mul 2 \
--cfg 3.0 --cfg_schedule linear --accumulation_steps 2 \
--output_dir ${OUTPUT_DIR} --resume ${OUTPUT_DIR} --online_eval --eval_freq 50 \
--use_cached --cached_path ${CACHED_PATH} --data_path ${IMAGENET_PATH}


torchrun --nproc_per_node=8 --nnodes=4 --node_rank=${NODE_RANK} --master_addr=${MASTER_ADDR} --master_port=${MASTER_PORT} \
main_ear.py \
--img_size 256 --vae_path pretrained_models/vae/kl16.ckpt --vae_embed_dim 16 --vae_stride 16 --patch_size 1 \
--score_lrscale 0.25 --train_temperature 0.99 --infer_temperature 0.7 --alpha 1.0 \
--model ear_base --scoreloss_d 6 --scoreloss_w 1024 --noise_channels 64 \
--epochs 800 --warmup_epochs 100 --batch_size 32 --blr 7e-5 --score_batch_mul 2 \
--cfg 3.0 --cfg_schedule linear --accumulation_steps 2 \
--output_dir ${OUTPUT_DIR} --resume ${OUTPUT_DIR} --online_eval --eval_freq 5 \
--use_cached --cached_path ${CACHED_PATH} --data_path ${IMAGENET_PATH}
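
The second command resumes the checkpoint written by the first (both point --resume at the same ${OUTPUT_DIR}) and continues from epoch 750 to 800 with --train_temperature 0.99 and --infer_temperature 0.7; this is the 50-epoch temperature fine-tuning stage mentioned above.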

To train EAR-L, set --model ear_large and enlarge the MLP generator with --scoreloss_d 8 --scoreloss_w 1280. To train EAR-H, set --model ear_huge with --scoreloss_d 12 --scoreloss_w 1536.

Evaluation

Evaluate EAR-B with classifier-free guidance:

torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
main_ear.py \
--model ear_base --scoreloss_d 6 --scoreloss_w 1024 \
--eval_bsz 128 --num_images 50000 \
--num_iter 64 --cfg 3.0 --cfg_schedule linear --infer_temperature 0.7 \
--output_dir ${OUTPUT_DIR} \
--resume ${OUTPUT_DIR} \
--data_path ${IMAGENET_PATH} --evaluate
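
Here --num_images 50000 matches the standard ImageNet 256×256 generation benchmark, and --num_iter 64 sets the number of autoregressive decoding steps per image (our reading of the MAR-style interface; consult main_ear.py for the authoritative flag semantics).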

Acknowledgements

Our code is based on MAR. Thanks to its authors for their great work.

Citation

If you find the resources in this repository useful, please cite as:

@inproceedings{shao2025ear,
  author = {Shao, Chenze and Meng, Fandong and Zhou, Jie},
  title = {Continuous Visual Autoregressive Generation via Score Maximization},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning, {ICML} 2025},
  year = {2025}
}
