[Feature] Allow early training termination at a specific step using Trainer.max_steps without modifying the LR schedule, i.e. keeping the schedule from a full convergence run #749

Open
@dorotat-nv

Description

Problem & Motivation

In Evo2, using the --max-steps argument to stop training at a specific step also modifies the learning rate schedule. This makes it difficult to test partial convergence training that stops at a given step without altering the intended LR schedule.
File: sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py

Remove the SignalAfterGivenStepCallback from the training script.

BioNeMo Framework Version

7428f5f

Category

Model/Training

Proposed Solution

Introduce a new optional argument, e.g. lr_scheduler_steps, which, when passed, sets the number of steps used by the LR scheduler instead of max_steps.

Expected Benefits

max_steps can then control the length of training, while lr_scheduler_steps defines the LR schedule.

Code Example
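
A minimal sketch of the proposed decoupling, in plain Python rather than the actual Evo2 training code. The warmup-plus-cosine schedule, the default values, and the helper names (`lr_at_step`, `resolve_schedule_steps`, `run`) are assumptions made for illustration; only `max_steps` and `lr_scheduler_steps` come from this issue.

```python
import math
from typing import List, Optional


def lr_at_step(step: int, schedule_steps: int, warmup_steps: int = 10,
               max_lr: float = 3e-4, min_lr: float = 3e-5) -> float:
    """Hypothetical schedule: linear warmup, then cosine decay over `schedule_steps`."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    decay_steps = max(1, schedule_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def resolve_schedule_steps(max_steps: int, lr_scheduler_steps: Optional[int] = None) -> int:
    """Use lr_scheduler_steps for the schedule when given, else fall back to max_steps."""
    return max_steps if lr_scheduler_steps is None else lr_scheduler_steps


def run(max_steps: int, lr_scheduler_steps: Optional[int] = None) -> List[float]:
    """Simulate training: stop after `max_steps`, record the LR used at each step."""
    schedule_steps = resolve_schedule_steps(max_steps, lr_scheduler_steps)
    return [lr_at_step(step, schedule_steps) for step in range(max_steps)]


# Full convergence run: schedule and training length coincide.
full_run = run(max_steps=1000)

# Partial run with the proposed argument: stops at step 200, keeps the 1000-step schedule.
partial_run = run(max_steps=200, lr_scheduler_steps=1000)

# Behavior described in this issue: shortening max_steps also compresses the schedule.
coupled_run = run(max_steps=200)
```

With the separate argument, the partial run sees exactly the learning rates of the first 200 steps of the full run; without it, the schedule decays to its minimum by step 200.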
