Update to fit_mixin.fit to allow fine tuning to resume from a checkpoint #3269
Conversation
Reverted a push that was stylistic because it did not play well with the linter.
I'll take care of the formatting, no worries :) I'll play around with it myself in a while, before the v4.0 release that's coming soon.
That is awesome of you. I am embarrassed to say I was researching and failing to figure out what formatting changes were needed! lol With this change, hopefully LlamaIndex will soon be updated to use resume_from_checkpoint too. On a side note: is there something I can check to see the formatting requirements, or do I need to just run the formatter myself? Thank you for your time!
You can also run the formatter yourself indeed! The details are here: https://github.com/UKPLab/sentence-transformers?tab=readme-ov-file#development-setup
Code is now acceptable to ruff and ruff-format. Thank you for pointing that out, and next time I will read the README!
Looks very solid! Well done, thanks. I ran it locally and it worked out of the box.
Summary
Adds a parameter to the fit_mixin.fit method that allows a fine-tune to be continued from the latest checkpoint, if one exists in the checkpoint directory.
Motivations
A faulty PSU and many hours of fine-tuning went to waste. The fit method in fit_mixin has a parameter that allows checkpoints to be saved, and the method it uses for training, Transformers' trainer.train, has a parameter (a str or a bool) that allows resuming from a checkpoint. This change takes advantage of that parameter so the saved checkpoints can now be used; a rough sketch of the mechanism follows.
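A minimal sketch of the mechanism being relied on, not the exact code in this PR: transformers' Trainer.train accepts resume_from_checkpoint as a bool or a checkpoint path, and transformers.trainer_utils provides a helper for locating the latest checkpoint in a directory. The resolve_checkpoint helper name below is made up for illustration.

```python
from typing import Optional, Union

from transformers.trainer_utils import get_last_checkpoint


def resolve_checkpoint(checkpoint_dir: str, resume_from_checkpoint: Union[bool, str]) -> Optional[str]:
    """Turn the user-facing argument into the value trainer.train expects."""
    if isinstance(resume_from_checkpoint, str):
        # An explicit checkpoint path was given; pass it through unchanged.
        return resume_from_checkpoint
    if resume_from_checkpoint:
        # Returns e.g. "<checkpoint_dir>/checkpoint-1500", or None if nothing is found.
        return get_last_checkpoint(checkpoint_dir)
    return None


# Inside fit(), the call could then look roughly like:
#   trainer.train(resume_from_checkpoint=resolve_checkpoint(checkpoint_path, resume_from_checkpoint))
```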
Results
When a checkpoint is found, fine-tuning will begin from the latest step recorded in trainer_state.json (written by the underlying Trainer). The benefit is that a fine-tune can be continued without retraining the model on data it has already processed.
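Purely as an illustration of where that step comes from: each checkpoint produced by the underlying Trainer contains a trainer_state.json whose global_step field records how far training had progressed. The checkpoint path below is a made-up example.

```python
import json

# Example checkpoint path; the actual directory name depends on checkpoint_path and the saved step.
with open("checkpoints/checkpoint-1500/trainer_state.json") as f:
    state = json.load(f)

print(state["global_step"])  # e.g. 1500 -> resumed training continues from this step
```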
Changes
Adds an optional parameter resume_from_checkpoint to fit_mixin.fit that triggers a check for checkpoints in the checkpoint directory and continues fine-tuning from the latest one if any are found. A hypothetical usage example follows.
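A hypothetical usage sketch, assuming the new parameter is named resume_from_checkpoint as described above; the model, data, and loss are placeholders only.

```python
from torch.utils.data import DataLoader

from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("all-MiniLM-L6-v2")
train_examples = [InputExample(texts=["first sentence", "second sentence"], label=0.8)]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=2,
    checkpoint_path="checkpoints/",   # existing parameter: where checkpoints are written
    checkpoint_save_steps=500,
    resume_from_checkpoint=True,      # proposed parameter: pick up from the latest checkpoint, if any
)
```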
Related Issue
This is related to #1605 and provides a way to use resume_from_checkpoint, at least with the fit() method in fit_mixin.