when delaying optimizer creation only prepare the model #39152
Conversation
cc @SunMarc

```
@@ -2357,7 +2357,7 @@ def _inner_training_loop(
    model = self.accelerator.prepare(self.model)
else:
    if delay_optimizer_creation:
```
cc @IlyasMoutawwakil as you wanted to remove this 👀
yeah, this fixes it too! I honestly don't understand delay_optimizer_creation, like delay until when and why? 😅 It might make sense to explain it somewhere in the trainer.
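For context on the flag being discussed, here is a minimal sketch of the pattern delay_optimizer_creation is meant to support, under a simplified setup (the function name and structure below are illustrative, not the actual Trainer code). Wrappers such as FSDP replace the module's parameters when the model is wrapped, so the optimizer has to be built from the wrapped model's parameters, i.e. only after the model has been prepared.

```python
import torch
from accelerate import Accelerator


def setup_training(model: torch.nn.Module, delay_optimizer_creation: bool, lr: float = 1e-4):
    """Illustrative sketch only -- not the actual Trainer implementation."""
    accelerator = Accelerator()
    if delay_optimizer_creation:
        # FSDP-style wrapping flattens/reshards parameters, so the optimizer
        # must be created from the *wrapped* model's parameters.
        model = accelerator.prepare_model(model)
        optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
        optimizer = accelerator.prepare(optimizer)
    else:
        # Simple setups: create the optimizer up front and prepare both together.
        optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
        model, optimizer = accelerator.prepare(model, optimizer)
    return model, optimizer
```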
The reason I removed it is that we currently create the optimizer here anyway, and we also need to prepare the FSDP model (otherwise FSDP fails), so the two branches of the if statement become the same.
cc @kmehant, if you can explain the change you tried to make, that would be helpful!
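To make the point above concrete, here is a schematic of the shape of that if statement, written as a standalone function rather than the real trainer.py code: once the optimizer is created in the delayed branch as well, and the FSDP model still has to go through prepare, both branches perform the same steps.

```python
import torch
from accelerate import Accelerator


def prepare_branches(accelerator: Accelerator, model: torch.nn.Module,
                     delay_optimizer_creation: bool, lr: float = 1e-4):
    """Schematic of the argument above -- not the real trainer.py."""
    if not delay_optimizer_creation:
        optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
        return accelerator.prepare(model, optimizer)
    # If the optimizer also gets created here, and the FSDP-wrapped model still
    # has to be prepared, this branch does exactly the same work as the one above.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    return accelerator.prepare(model, optimizer)
```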
Hi @SunMarc, thanks for looping me in! Appreciate it. Ideally, within this block of code (transformers/src/transformers/trainer.py, Lines 2344 to 2350 in b31e9d1), the block at Lines 2353 to 2362 in b31e9d1 would move to here: Line 2349 in b31e9d1.
OR we can also go back to the older code, since TP-izing the model has been removed from the accelerate prepare step, which works as well for the FSDP case. I can help with the contribution if needed. Nonetheless, I +1 @IlyasMoutawwakil's suggestion to remove this altogether, since it has always been a confusing parameter to me :)
@SunMarc @IlyasMoutawwakil @ArthurZucker This is a much more correct fix for this bug: PR #39177. The current PR breaks TP training (since prepare is not needed for TP, and enforcing prepare leads to a DDP setting, which fails). The aforementioned PR fixes this for both FSDP and TP.
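For readers following the FSDP vs. TP distinction above, here is a minimal sketch of the gating being argued for; the is_fsdp_enabled / is_tp_enabled flags are illustrative placeholders standing in for whatever the Trainer actually checks.

```python
import torch
from accelerate import Accelerator


def maybe_prepare_model(accelerator: Accelerator, model: torch.nn.Module,
                        is_fsdp_enabled: bool, is_tp_enabled: bool) -> torch.nn.Module:
    """Illustrative sketch of the distinction raised above -- not the real Trainer logic."""
    if is_fsdp_enabled:
        # FSDP needs the model wrapped by accelerate before the optimizer is
        # created, otherwise training fails.
        return accelerator.prepare_model(model)
    if is_tp_enabled:
        # A tensor-parallel model is already sharded; forcing a prepare() here
        # would fall back to a DDP-style wrap, which is the failure described above.
        return model
    return accelerator.prepare_model(model)
```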
What does this PR do?
Axolotl's CI caught a regression when we tried to upgrade to the latest transformers: https://github.com/axolotl-ai-cloud/axolotl/actions/runs/15962262932/job/45016550543
PR #36132 introduced a regression that breaks FSDP with Llama and FSDP+DPO+Qwen.
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.