📖PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
This is the official repository for PhysVLM. The goal of PhysVLM is to enable Vision-Language Models (VLMs) to understand robotic physical reachability, with the longer-term aim of making the action decisions generated by the model more reliable.
- 2025.03.18: Release the Phys100k-physqa dataset and the model at 🤗HuggingFace.
- 2025.03.12: 🔥 Paper release: 📕Arxiv.
- 2025.03.12: 🔥 Release the benchmark EQA-phys-val-sim.
- 2025.02.27: 🔥 PhysVLM has been accepted to CVPR 2025.
- 🔥 Release the code of PhysVLM.
PhysVLM demonstrates strong performance across reachability understanding, EmbodiedQA, and VQA tasks.
git clone git@github.com:unira-zwj/PhysVLM.git
cd PhysVLM/physvlm-main
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
| Model | Links |
|---|---|
| PhysVLM-3B | 🤗HuggingFace |
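If you prefer to fetch the checkpoint programmatically, a minimal sketch using huggingface_hub is shown below; the repo id used here is an assumption and should be taken from the 🤗HuggingFace link in the table above.

```python
# Minimal sketch: download the PhysVLM-3B checkpoint to a local folder.
# The repo id "unira-zwj/PhysVLM-3B" is a hypothetical placeholder; replace it
# with the actual id behind the 🤗HuggingFace link above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="unira-zwj/PhysVLM-3B")  # hypothetical repo id
print(f"Checkpoint downloaded to: {local_dir}")
```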
python start_physvlm_server.py
Then you can send requests to the server, which listens on host 0.0.0.0, port 8001 (started with `app, host="0.0.0.0", port=8001`). For example, run `python inference.py` for easy inference, or `cd eval && python eval_phys_bench_sim.py` for the EQA-phys benchmark evaluation.
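For reference, a minimal client sketch is given below. The route name and payload fields are assumptions, not the repository's documented API; check `start_physvlm_server.py` for the actual route and request schema, or use the bundled `inference.py`, which demonstrates the intended usage.

```python
# Minimal sketch of a client request to the local PhysVLM server.
# The endpoint path ("/generate") and the JSON payload fields are assumptions;
# verify them against start_physvlm_server.py before use.
import base64

import requests

# Encode an input image as base64 so it can be sent in a JSON body.
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://0.0.0.0:8001/generate",  # hypothetical route; server runs on port 8001
    json={
        "prompt": "Can the robot arm reach the red cup?",  # example reachability query
        "image": image_b64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```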
If you find PhysVLM useful for your research and applications, please cite using this BibTeX:
@misc{zhou2025physvlmenablingvisuallanguage,
title={PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability},
author={Weijie Zhou and Manli Tao and Chaoyang Zhao and Haiyun Guo and Honghui Dong and Ming Tang and Jinqiao Wang},
year={2025},
eprint={2503.08481},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2503.08481},
}