Skip to content

Add CallbackGroup & Metadata factory function #13437

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

PytLab
Copy link
Collaborator

@PytLab PytLab commented May 5, 2025

Important

The Update branch button must only be pressed in very rare occassions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Add global callback group & metadata factory function for NeMo

Collection: [Note which collection this PR will affect]
Will keep updating

  • asr
  • multimodal
  • nlp

Changelog

  • New CallbackGroup and Callback ABC in NeMo/nemo/lightning/pytorch/callbacks/callback_group.py

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

@PytLab PytLab self-assigned this May 5, 2025
@github-actions github-actions bot added core Changes to NeMo Core ASR NLP Multi Modal labels May 6, 2025
@PytLab PytLab requested a review from dimapihtar May 7, 2025 08:25
@github-actions github-actions bot removed the ASR label May 12, 2025
@@ -47,6 +47,7 @@
from nemo.core.classes.common import Model
from nemo.core.connectors.save_restore_connector import SaveRestoreConnector
from nemo.core.optim import McoreDistributedOptimizer, prepare_lr_scheduler
from nemo.lightning.pytorch.callbacks.callback_group import CallbackGroup, wrap_methods_with_callbacks

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'CallbackGroup' is not used.

Copilot Autofix

AI about 4 hours ago

To fix the issue, the unused import of CallbackGroup should be removed from the file. This will eliminate unnecessary clutter and improve code readability. The removal should be limited to the specific import statement flagged by CodeQL, ensuring no unintended changes to other parts of the code.

Suggested changeset 1
nemo/core/classes/modelPT.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/core/classes/modelPT.py b/nemo/core/classes/modelPT.py
--- a/nemo/core/classes/modelPT.py
+++ b/nemo/core/classes/modelPT.py
@@ -50,3 +50,3 @@
 from nemo.core.optim import McoreDistributedOptimizer, prepare_lr_scheduler
-from nemo.lightning.pytorch.callbacks.callback_group import CallbackGroup, wrap_methods_with_callbacks
+from nemo.lightning.pytorch.callbacks.callback_group import wrap_methods_with_callbacks
 from nemo.utils import logging, model_utils
EOF
@@ -50,3 +50,3 @@
from nemo.core.optim import McoreDistributedOptimizer, prepare_lr_scheduler
from nemo.lightning.pytorch.callbacks.callback_group import CallbackGroup, wrap_methods_with_callbacks
from nemo.lightning.pytorch.callbacks.callback_group import wrap_methods_with_callbacks
from nemo.utils import logging, model_utils
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
trainer: The PyTorch Lightning trainer
pl_module: The PyTorch Lightning module
"""
self.one_logger.on_validation_start(val_iterations=trainer.global_step)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesnt seem to be compatible with on_validation_start of OneLoggerUtils

"""
self.one_logger.on_validation_start(val_iterations=trainer.global_step)

def on_validation_end(self, trainer: pl.Trainer, pl_module: pl.LightningModule) -> None:
Copy link
Contributor

@dorotat-nv dorotat-nv Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fails if the batch specific hooks for validation are not provided

    def on_validation_batch_start(self, trainer, pl_module, batch, batch_idx, dataloader_idx=0) -> None:
        """Called when validation batch begins.

        Args:
            trainer: The PyTorch Lightning trainer
            pl_module: The PyTorch Lightning module
        """
        self.one_logger.on_validation_batch_start()

    def on_validation_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx=0) -> None:
        """Called when validation batch ends.

        Args:
            trainer: The PyTorch Lightning trainer
            pl_module: The PyTorch Lightning module
        """
        self.one_logger.on_validation_batch_end()

"""

import functools
import logging

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'logging' is not used.

Copilot Autofix

AI about 3 hours ago

To fix the issue, the unused logging import on line 8 should be removed. This will eliminate the unnecessary dependency and make the code cleaner. No other changes are required, as the removal of this import does not affect the functionality of the code.

Suggested changeset 1
nemo/lightning/one_logger_callback.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/lightning/one_logger_callback.py b/nemo/lightning/one_logger_callback.py
--- a/nemo/lightning/one_logger_callback.py
+++ b/nemo/lightning/one_logger_callback.py
@@ -7,3 +7,3 @@
 import functools
-import logging
+
 from typing import Any, Dict, List, Optional, Type
EOF
@@ -7,3 +7,3 @@
import functools
import logging

from typing import Any, Dict, List, Optional, Type
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
import pytorch_lightning as pl
import torch
from pytorch_lightning.callbacks import Callback
from pytorch_lightning.plugins.io import AsyncCheckpointIO

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'AsyncCheckpointIO' is not used.

Copilot Autofix

AI about 3 hours ago

To fix the problem, we will remove the unused import statement from pytorch_lightning.plugins.io import AsyncCheckpointIO on line 17. This will clean up the code and eliminate the unnecessary dependency without affecting the functionality of the program.


Suggested changeset 1
nemo/lightning/one_logger_callback.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/lightning/one_logger_callback.py b/nemo/lightning/one_logger_callback.py
--- a/nemo/lightning/one_logger_callback.py
+++ b/nemo/lightning/one_logger_callback.py
@@ -16,3 +16,3 @@
 from pytorch_lightning.core import LightningModule
-from pytorch_lightning.plugins.io import AsyncCheckpointIO
+
 from pytorch_lightning.utilities import rank_zero_only
EOF
@@ -16,3 +16,3 @@
from pytorch_lightning.core import LightningModule
from pytorch_lightning.plugins.io import AsyncCheckpointIO

from pytorch_lightning.utilities import rank_zero_only
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
import logging
from typing import Any, Dict, List, Optional, Type

import nv_one_logger.training_telemetry.api.callbacks as CB

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eric: it is better to publish to public index, if we have difficulties, make sure this won't fail with import guard, and maybe ref to the open sourced one logger repo.

trainer: Optional trainer instance for checkpoint configuration
"""
try:
from nv_one_logger.exporter.wandb_exporter import WandbExporter

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saju: we need to change this import to be the config based exporter import. so we could change the exporter destination by simple onelogger package upgrade.

from typing import Any, Dict, List, Optional, Type

import nv_one_logger.training_telemetry.api.callbacks as CB
import pytorch_lightning as pl

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'pl' is not used.

Copilot Autofix

AI about 3 hours ago

To fix the issue, the unused import statement import pytorch_lightning as pl should be removed from the file. This will eliminate the unnecessary dependency and improve code clarity. No other changes are required, as the rest of the code already uses specific imports from PyTorch Lightning directly.


Suggested changeset 1
nemo/lightning/one_logger_callback.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/lightning/one_logger_callback.py b/nemo/lightning/one_logger_callback.py
--- a/nemo/lightning/one_logger_callback.py
+++ b/nemo/lightning/one_logger_callback.py
@@ -11,3 +11,3 @@
 import nv_one_logger.training_telemetry.api.callbacks as CB
-import pytorch_lightning as pl
+
 import torch
EOF
@@ -11,3 +11,3 @@
import nv_one_logger.training_telemetry.api.callbacks as CB
import pytorch_lightning as pl

import torch
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated

import nv_one_logger.training_telemetry.api.callbacks as CB
import pytorch_lightning as pl
import torch

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'torch' is not used.

Copilot Autofix

AI about 3 hours ago

To fix the problem, the unused torch import on line 13 should be removed. This will eliminate the unnecessary dependency and make the code cleaner. No other changes are required since the removal of this import does not affect the functionality of the code.

Suggested changeset 1
nemo/lightning/one_logger_callback.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/lightning/one_logger_callback.py b/nemo/lightning/one_logger_callback.py
--- a/nemo/lightning/one_logger_callback.py
+++ b/nemo/lightning/one_logger_callback.py
@@ -12,3 +12,3 @@
 import pytorch_lightning as pl
-import torch
+
 from pytorch_lightning import Trainer
EOF
@@ -12,3 +12,3 @@
import pytorch_lightning as pl
import torch

from pytorch_lightning import Trainer
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Changes to NeMo Core NLP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants