TL;DR ORV generates robot videos under the geometric guidance of 4D occupancy. It achieves higher control precision, generalizes well, supports multi-view consistent video generation, and enables simulation-to-real data transfer.
ORV: 4D Occupancy-centric Robot Video Generation
Xiuyu Yang*, Bohan Li*, Shaocong Xu, Nan Wang, Chongjie Ye, Zhaoxi Chen, Minghan Qin, Yikang Ding, Xin Jin, Hang Zhao, Hao Zhao
Preprint (arXiv 2506.03079)
If you find our work useful in your research, please consider citing our paper:
@article{yang2025orv,
title={ORV: 4D Occupancy-centric Robot Video Generation},
author={Yang, Xiuyu and Li, Bohan and Xu, Shaocong and Wang, Nan and Ye, Chongjie and Chen, Zhaoxi and Qin, Minghan and Ding, Yikang and Jin, Xin and Zhao, Hang and Zhao, Hao},
journal={arXiv preprint arXiv:2506.03079},
year={2025}
}
- Release arXiv technical report
- Release full code
- Release instructions for data processing
- Release processed data
Thanks to these excellent open-source works and models: CogVideoX, DiffusionAsShader, and diffusers.