
Update to fit_mixin.fit to allow fine tuning to resume from a checkpoint #3269


Merged: 4 commits into UKPLab:master on Mar 19, 2025

Conversation

@NRamirez01 (Contributor) commented Mar 17, 2025

Summary

Adds a parameter to the fit_mixin.fit method that allows fine-tuning to be resumed from the latest checkpoint, if one exists in the checkpoint directory.

Motivations

A faulty PSU and many hours of fine-tuning going to waste. The fit_mixin.fit method already has parameters that allow checkpoints to be saved, and the method it uses for training, Transformers' trainer.train, has a parameter that accepts a str or a bool and allows resuming from a checkpoint. This change takes advantage of that parameter so that the saved checkpoints can now be used.

Results

When a checkpoint is found, fine-tuning resumes from the latest step recorded in trainer_state.json. The benefit is that a fine-tune can be continued without retraining the model on data it has already processed.
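As a hedged illustration of the intended usage (a minimal sketch: the checkpoint_path and checkpoint_save_steps parameters already exist on fit(), while resume_from_checkpoint is the flag proposed in this PR):

```python
from torch.utils.data import DataLoader

from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("all-MiniLM-L6-v2")
train_examples = [
    InputExample(texts=["A sentence", "A very similar sentence"], label=0.9),
    InputExample(texts=["A sentence", "An unrelated sentence"], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

# Checkpoints are written to ./checkpoints every 500 steps. If the run is
# interrupted (e.g. by a power failure), re-running the same call with
# resume_from_checkpoint=True continues from the latest checkpoint instead
# of starting the fine-tune over.
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    checkpoint_path="./checkpoints",
    checkpoint_save_steps=500,
    resume_from_checkpoint=True,  # new parameter proposed in this PR
)
```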

Changes

Adds an optional resume_from_checkpoint parameter to fit_mixin.fit that triggers a check for checkpoints in the checkpoint directory and continues fine-tuning from the latest one if any are found.
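A minimal sketch of how such a check could be wired through (illustrative only; the helper below is hypothetical and not the actual diff, but transformers.Trainer.train does accept a str or bool for resume_from_checkpoint):

```python
import os
from typing import Optional


def _latest_checkpoint(checkpoint_dir: Optional[str]) -> Optional[str]:
    """Return the newest 'checkpoint-<step>' subdirectory, or None if none exist.

    Hypothetical helper, shown only for illustration.
    """
    if not checkpoint_dir or not os.path.isdir(checkpoint_dir):
        return None
    checkpoints = [
        name
        for name in os.listdir(checkpoint_dir)
        if name.startswith("checkpoint-") and name.rsplit("-", 1)[-1].isdigit()
    ]
    if not checkpoints:
        return None
    latest = max(checkpoints, key=lambda name: int(name.rsplit("-", 1)[-1]))
    return os.path.join(checkpoint_dir, latest)


# Inside fit(), the new flag could then be forwarded to the underlying Trainer:
#
#     resume = _latest_checkpoint(checkpoint_path) if resume_from_checkpoint else None
#     trainer.train(resume_from_checkpoint=resume)
#
# transformers.Trainer.train accepts either a bool or a checkpoint path for
# resume_from_checkpoint, so passing the resolved path (or None) works.
```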

Related Issue

It is related to #1605 and provides a way to use resume_from_checkpoint, at least with the fit() method in fit_mixin.

@NRamirez01 (Contributor, Author)

Reverted a stylistic push because it did not play well with the linter.

@tomaarsen (Collaborator)

I'll take care of the formatting, no worries :)
Normally I would not really update the model.fit() code anymore - it's primarily kept for backwards compatibility, as I generally recommend using the SentenceTransformerTrainer now. However, as you mentioned, LlamaIndex and some others still rely on it, so I'm okay with making a change like this.

I'll play around with it myself in a while, before the v4.0 release that's coming soon.

  • Tom Aarsen

@NRamirez01 (Contributor, Author)

That is awesome of you. I am embarrassed to say I was still trying, and failing, to figure out what formatting changes were needed! lol

With this change, hopefully LlamaIndex will soon be updated to use resume_from_checkpoint too.

On a side note: is there something I can check to see the requirements for formatting, or do I need to just run the formatter myself?

Thank you for your time!

@tomaarsen (Collaborator)

You can also run the formatter yourself indeed! The details are here: https://github.com/UKPLab/sentence-transformers?tab=readme-ov-file#development-setup
In short, pre-commit install will (for this repository only) set up "pre-commit hooks" that run the formatting whenever you try to commit something. Because these changes are already committed, you can then also run pre-commit run --all-files to run the formatting on everything (and not just the changes that you're trying to commit).

  • Tom Aarsen

@NRamirez01 (Contributor, Author)

Code is now acceptable to ruff and ruff-format. Thank you for pointing that out and next time I will read the README!

@tomaarsen (Collaborator) left a comment

Looks very solid! Well done, thanks. I ran it locally and it worked out of the box.

@tomaarsen merged commit 4de714c into UKPLab:master on Mar 19, 2025
9 checks passed