pi0-fintune-performance #427


Open
yanghb1 opened this issue Apr 9, 2025 · 6 comments
@yanghb1
yanghb1 commented Apr 9, 2025

I have been fine-tuning the provided pi0-base model on my dataset using LeRobot. After training for 100,000 steps, I found that the model performs well on tasks that appeared in my dataset, but its performance on unseen tasks is very poor. It seems to lack the generalization ability of a VLA model. Is this phenomenon normal? Are there any strategies to improve this situation?

@uzhilinsky uzhilinsky self-assigned this Apr 10, 2025
@uzhilinsky
Collaborator

We've certainly been able to fine-tune pi0-base successfully on novel tasks. It's hard to say what's going on in your case without more context.

Just in case, are you using the right norm stats?
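One quick sanity check for the norm-stats question is to recompute per-dimension statistics from your own dataset and compare them with the stats bundled with the checkpoint. A minimal sketch, assuming hypothetical names (`compute_norm_stats`, the toy `trajs` data) rather than LeRobot's actual API:

```python
# Hypothetical sanity-check sketch (not LeRobot's actual API): recompute
# per-dimension action statistics from your own dataset and compare them
# with the norm stats the checkpoint was trained with. A large mismatch
# means the policy sees out-of-distribution inputs after normalization.
import numpy as np

def compute_norm_stats(trajectories):
    """trajectories: list of (T_i, action_dim) arrays -> per-dim mean/std."""
    actions = np.concatenate(trajectories, axis=0)
    return {"mean": actions.mean(axis=0), "std": actions.std(axis=0)}

# Toy stand-in for a real dataset: two trajectories of 7-DoF actions.
rng = np.random.default_rng(0)
trajs = [rng.normal(size=(50, 7)), rng.normal(size=(80, 7))]
stats = compute_norm_stats(trajs)
```

If these values differ wildly from the stats file shipped with the pretrained checkpoint, that alone can explain degraded behavior at inference time.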

@yanghb1
Author

yanghb1 commented Apr 10, 2025

Thank you for your patient reply! I am fine-tuning pi0-base in the LeRobot project; the base model is from https://huggingface.co/lerobot/pi0. My training dataset contains 700 collected trajectories for a single task: opening a cabinet door. After training for 100,000 steps, the loss converged to 0.03. At inference, the model can open the cabinet door as in the dataset, but it performs poorly on other instructions, such as grabbing a ballpoint pen or moving the yellow tape to the left of the blue tape, and still tends to open the cabinet door. Is this normal? Are there other ways to improve it? From the paper, I understood that fine-tuning the action expert and state projector on local data should be sufficient for executing various instructions on a local robot arm.
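For reference, the "fine-tune only the expert and state projector" recipe mentioned above amounts to freezing the pretrained VLM backbone. A minimal PyTorch sketch, with hypothetical module names (`vlm_backbone`, `action_expert`) that do not match pi0's real attribute names:

```python
# Sketch of the "fine-tune only the expert / projector" recipe: freeze the
# pretrained VLM backbone and leave the action-side modules trainable.
# Module names here (vlm_backbone, action_expert) are hypothetical stand-ins,
# not pi0's real attribute names; adapt them to the policy class you load.
import torch.nn as nn

class ToyVLA(nn.Module):
    def __init__(self):
        super().__init__()
        self.vlm_backbone = nn.Linear(16, 16)  # stands in for the VLM
        self.action_expert = nn.Linear(16, 7)  # stays trainable

def freeze_backbone(model, frozen_prefixes=("vlm_backbone",)):
    """Disable gradients for frozen modules; return trainable param names."""
    for name, param in model.named_parameters():
        if name.startswith(frozen_prefixes):
            param.requires_grad = False
    return [n for n, p in model.named_parameters() if p.requires_grad]

model = ToyVLA()
trainable = freeze_backbone(model)
```

Keeping the backbone frozen limits how far fine-tuning can pull the model away from its pretrained language grounding, which is one common way to reduce forgetting on a small single-task dataset.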

@uzhilinsky
Collaborator

100,000 steps is a lot for such a small dataset. It's very likely that the model has overfit to your data.

Note that we use 20K or 30K steps in our fine-tuning examples, with much larger datasets. Have you tried training with fewer steps?
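One practical way to act on this advice is to save checkpoints at regular intervals and select by held-out rollout performance rather than training to a fixed step count. A small sketch (the numbers are made up for illustration):

```python
# Sketch of checkpoint selection: rather than training to a fixed 100k steps,
# save checkpoints at intervals, run rollout evaluations on held-out tasks,
# and keep the checkpoint with the best success rate. Training loss alone
# (e.g. converging to 0.03) can keep improving while generalization degrades.

def pick_best_checkpoint(success_rates):
    """success_rates: dict mapping training step -> held-out success rate."""
    return max(success_rates, key=success_rates.get)

# Hypothetical eval results from checkpoints saved every 10k steps.
success_rates = {10_000: 0.40, 20_000: 0.70, 30_000: 0.65, 100_000: 0.20}
best_step = pick_best_checkpoint(success_rates)  # more steps != better
```

The mechanics are trivial; the point is that the selection signal should be task success on instructions outside the fine-tuning set, not the training loss.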

@yanghb1
Author

yanghb1 commented Apr 10, 2025

Thank you! We also tested checkpoints at 20k–30k steps, and the loss converged to 0.03 there too. Performance was still relatively poor: the model could only execute the task from the fine-tuning dataset. In the normal case, when fine-tuning π₀ on a small single-task dataset, should the trained model be able not only to perform the task from the local dataset but also to execute tasks from your pre-training stage? Does the model retain that generalization capability?

@oxFFFF-Q

Same issue here. Did you solve it?

@YanJiaHuan

Has anyone tried what they suggested, i.e. a larger dataset with fewer training steps?
I tried 200 demos (10–15 s each, 10 Hz) with over 300k training steps on a simple pick-and-place task, and the model performs terribly.
