logger message "dataset had no length" confusing when drop_last=True #3693

@lpjiang97

Description

System Info

Accelerate v1.8.1
PyTorch v2.6.0+cu124
Datasets v3.6.0

Reproduction

With a data loader whose drop_last is set to True, the following branch in gather_for_metrics is triggered:

if self.gradient_state.remainder == -1:
    logger.info(
        "The used dataset had no length, returning gathered tensors. You should drop the remainder yourself."
    )
    return data

This info message is confusing when the dataset does have __len__ but drop_last is set to True. In that case, according to DataLoaderStateMixin, the remainder stays at the sentinel value -1 (it is only computed when drop_last is False), so the message fires on the last batch even though the dataset's length is known.

Could this message be updated?
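To illustrate the behavior described above, here is a minimal sketch (not accelerate's actual code; compute_remainder is a hypothetical stand-in for the logic in DataLoaderStateMixin): the remainder is only computed when drop_last is False, so with drop_last=True it keeps the -1 sentinel that gather_for_metrics interprets as "dataset had no length":

```python
def compute_remainder(dataset_length, total_batch_size, drop_last):
    # Sketch of the assumed logic: -1 is the "unknown length" sentinel,
    # and it is never overwritten when drop_last=True.
    remainder = -1
    if not drop_last and dataset_length is not None:
        remainder = dataset_length % total_batch_size
    return remainder

print(compute_remainder(18, 5, drop_last=True))   # -1, triggers the message
print(compute_remainder(18, 5, drop_last=False))  # 3
```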

MRE

import logging
from datasets import Dataset
from torch.utils.data import DataLoader
from accelerate import Accelerator
from accelerate.logging import get_logger

logger = get_logger(__name__)

# set level
logging.basicConfig(level=logging.INFO)

dataset = Dataset.from_dict({'data': list(range(18))}).with_format('torch')
print(len(dataset))  # 18

dataloader = DataLoader(dataset, batch_size=5, drop_last=True)

accelerator = Accelerator()  # log_with is not needed to reproduce
dataloader = accelerator.prepare(dataloader)

for batch in dataloader:
    gathered_items = accelerator.gather_for_metrics(batch)
    print(len(gathered_items))

Expected behavior

No "The used dataset had no length" info message when the dataset has a length and drop_last=True.
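Until the message is updated, one possible workaround is to raise the level of the logger that emits it. Note the logger name "accelerate.accelerator" is an assumption about accelerate's internals (loggers are named after the module via get_logger(__name__)):

```python
import logging

# Hedged workaround sketch: suppress the INFO-level message by raising the
# level of the logger assumed to emit it. The name "accelerate.accelerator"
# is an assumption, not confirmed from accelerate's docs.
logging.getLogger("accelerate.accelerator").setLevel(logging.WARNING)

# INFO records from that logger are now filtered out:
print(logging.getLogger("accelerate.accelerator").isEnabledFor(logging.INFO))  # False
```

This silences all INFO output from that one module, which is coarser than ideal but keeps metric gathering unchanged.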
