
Welcome to PhysVLM

📖 PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability


This is the official repository for PhysVLM. The goal of PhysVLM is to enable Vision-Language Models (VLMs) to understand robotic physical reachability, with the longer-term aim of making the model's action decisions more reliable.

Release

  • 2025.03.18 Released the Phys100k-physqa dataset and the model on 🤗HuggingFace.
  • 2025.03.12 🔥 Paper released on 📕arXiv.
  • 2025.03.12 🔥 Released the benchmark: EQA-phys-val-sim.
  • 2025.02.27 🔥 PhysVLM accepted to CVPR 2025.
  • 🔥 Released the code of PhysVLM.

What can PhysVLM do now?

PhysVLM demonstrates strong performance across reachability understanding, embodied question answering (EmbodiedQA), and general VQA.

Get Started

1. Clone & Install

git clone git@github.com:unira-zwj/PhysVLM.git
cd PhysVLM/physvlm-main
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation

2. Download the PhysVLM model weights to the checkpoints folder.

Model       Link
PhysVLM-3B  🤗HuggingFace
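
If you prefer to fetch the weights from a script, here is a minimal sketch using huggingface_hub; the repo id below is an assumption, so substitute the id shown on the 🤗HuggingFace model page:

from huggingface_hub import snapshot_download

# Download the PhysVLM-3B weights into the checkpoints folder.
# NOTE: "unira-zwj/PhysVLM-3B" is an assumed repo id; use the real one
# from the model page linked above.
snapshot_download(repo_id="unira-zwj/PhysVLM-3B", local_dir="checkpoints/PhysVLM-3B")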

3. Inference

python start_physvlm_server.py

The server listens on 0.0.0.0:8001. Once it is running, you can query it:

run python inference.py for a quick inference example,

or cd eval && python eval_phys_bench_sim.py for the EQA-phys benchmark evaluation.
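
For a custom client, here is a minimal request sketch in Python. The /generate route and the JSON field names are assumptions rather than a confirmed API; inference.py shows the exact request format this repo uses:

import requests

# Query the PhysVLM server started above (listening on 0.0.0.0:8001).
# NOTE: the "/generate" route and the payload fields below are assumptions;
# see inference.py for the actual request schema.
response = requests.post(
    "http://localhost:8001/generate",
    json={
        "image": "examples/scene.jpg",  # hypothetical example image path
        "prompt": "Is the red cup within the robot arm's reach?",
    },
    timeout=60,
)
print(response.json())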


Acknowledgement

  • LLaVA provides the codebase.
  • Qwen provides the base LLM.

Citation

If you find PhysVLM useful for your research and applications, please cite using this BibTeX:

@misc{zhou2025physvlmenablingvisuallanguage,
      title={PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability}, 
      author={Weijie Zhou and Manli Tao and Chaoyang Zhao and Haiyun Guo and Honghui Dong and Ming Tang and Jinqiao Wang},
      year={2025},
      eprint={2503.08481},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2503.08481}, 
}
