Fix progress bar display to correctly handle iterable dataset and max_steps during training #20869


Open · wants to merge 4 commits into master

Conversation


@bandpooja bandpooja commented Jun 1, 2025

What does this PR do?

This PR fixes the progress bar display in PyTorch Lightning to correctly handle the case where max_steps is set and max_epochs is -1 (infinite-epochs mode), including iterable datasets whose dataloaders have no length. Previously, the progress bar did not reflect the actual number of batches that would be processed when training was capped by max_steps, so it showed a confusing or incomplete total.

Fixes #20862 and #20124

Does this PR introduce any breaking changes?

No breaking changes introduced. This is a UI/progress bar improvement only.
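
For reviewers skimming the diff, the core idea reduces to: use the step budget as the bar total whenever the dataloader length is unknown, and take the smaller of the two limits otherwise. Below is a minimal sketch of that logic, not the actual patch; num_training_batches, global_step, and max_steps mirror existing Trainer attributes, while the helper itself is hypothetical:

import math

def progress_bar_epoch_total(
    num_training_batches: float,  # batches per epoch; float("inf") if the dataloader has no __len__
    max_steps: int,               # Trainer(max_steps=...); -1 means no step limit
    global_step: int,             # optimizer steps completed so far
) -> float:
    """Total the progress bar should display for the current epoch (sketch)."""
    remaining = math.inf if max_steps == -1 else max_steps - global_step
    if math.isinf(num_training_batches):
        # IterableDataset without a length: the step budget is the only known bound.
        return remaining
    # Finite dataloader: the epoch ends at whichever limit is hit first.
    return min(num_training_batches, remaining)

For the three local tests below, this would yield first-epoch totals of 313, 100, and 500 respectively.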


Additional notes

LOCAL TESTS

  1. max_steps > total training batches (max_steps=500 vs. ceil(10000 / 32) = 313 batches per epoch)
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn.functional as F
import pytorch_lightning as pl
from torch import nn
from pytorch_lightning import Trainer

# Dummy Dataset
class DummyDataset(Dataset):
    def __len__(self):
        return 10000  # Large enough to allow many steps

    def __getitem__(self, idx):
        x = torch.randn(10)
        y = torch.randint(0, 2, (1,))
        return x, y[0]

# Simple Model
class DummyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)

    def forward(self, x):
        return self.linear(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.cross_entropy(logits, y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

dataset = DummyDataset()
loader = DataLoader(dataset, batch_size=32)

model = DummyModel()

trainer = Trainer(
    max_steps=500,              # 👈 This tests the changes!
    accelerator="cpu",
    log_every_n_steps=1,
    enable_model_summary=False,
)

trainer.fit(model, train_dataloaders=loader)
(Screenshots of the resulting progress bar)
  2. max_steps < total training batches (max_steps=100 vs. 313 batches per epoch)
trainer = Trainer(
    max_steps=100,              # 👈 This tests the changes!
    accelerator="cpu",
    log_every_n_steps=1,
    enable_model_summary=False,
)

trainer.fit(model, train_dataloaders=loader)
(Screenshot of the resulting progress bar)
  3. Training with an iterable dataset, as in #20124 (Why does the progress bar not show the total steps when using iterable dataset?)
from torch.utils.data import IterableDataset

class InfiniteIterableDataset(IterableDataset):
    def __iter__(self):
        while True:
            x = torch.randn(10)
            y = torch.randint(0, 2, (1,))
            yield x, y[0] # infinite stream

dataset = InfiniteIterableDataset()
loader = DataLoader(dataset, batch_size=32)

model = DummyModel()

trainer = Trainer(
    max_steps=500,              # 👈 This tests the changes!
    accelerator="cpu",
    log_every_n_steps=1,
    enable_model_summary=False,
)

trainer.fit(model, train_dataloaders=loader)
(Screenshot of the resulting progress bar)
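
For context on why this case needs special handling: a DataLoader wrapped around an IterableDataset that defines no __len__ has no length itself, so the number of batches per epoch cannot be computed and Lightning treats the epoch length as infinite. A quick check, reusing the loader defined above:

# len() on a DataLoader over a length-less IterableDataset raises TypeError:
try:
    len(loader)
except TypeError:
    print("loader has no length; only max_steps can bound the progress bar total")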

@github-actions bot added the pl label (Generic label for PyTorch Lightning package) on Jun 1, 2025
@Borda (Member) left a comment:


could we have a test that this returns the correct number, pls?

@Borda added the waiting on author label (Waiting on user action, correction, or update) on Jun 2, 2025
@bandpooja (Author) commented Jun 3, 2025

@Borda, I'm still getting up to speed with writing effective tests — happy to hear any feedback!

(Screenshot of the passing test run)

Comment on lines +238 to +239
# tqdm total steps should equal max_steps for iterator with no length
assert trainer.estimated_stepping_batches == max_steps
Member:

let's have an assert on the progress_bar property total_train_batches

@Borda (Member) commented Jun 3, 2025

> I'm still getting up to speed with writing effective tests — happy to hear any feedback!

overall the test looks good, let's just have a direct assert on the property
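
A sketch of what that direct assertion could look like, reusing Trainer, DataLoader, DummyModel, and InfiniteIterableDataset from the examples above (an illustration of the reviewer's suggestion rather than the test that was committed; estimated_stepping_batches, progress_bar_callback, and total_train_batches are existing Trainer / progress bar attributes):

def test_progress_bar_total_with_iterable_dataset_and_max_steps(tmp_path):
    max_steps = 500
    loader = DataLoader(InfiniteIterableDataset(), batch_size=32)
    trainer = Trainer(
        default_root_dir=tmp_path,
        max_steps=max_steps,
        accelerator="cpu",
        enable_model_summary=False,
        enable_checkpointing=False,
        logger=False,
    )
    trainer.fit(DummyModel(), train_dataloaders=loader)

    # With no dataloader length available, both the estimated step count and
    # the total shown by the progress bar should equal the step budget.
    assert trainer.estimated_stepping_batches == max_steps
    assert trainer.progress_bar_callback.total_train_batches == max_steps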

Labels: pl (Generic label for PyTorch Lightning package), waiting on author (Waiting on user action, correction, or update)

Projects: None yet

Development: successfully merging this pull request may close these issues:

Show actual steps instead of the length of dataset on the progress bar

2 participants