System Info
Accelerate v1.8.1
PyTorch v2.6.0+cu124
Datasets v3.6.0
Reproduction
With a dataloader whose `drop_last` is set to `True`, the following branch is triggered in `gather_for_metrics`:
```python
if self.gradient_state.remainder == -1:
    logger.info(
        "The used dataset had no length, returning gathered tensors. You should drop the remainder yourself."
    )
    return data
```
I think this logger info message is potentially confusing when the dataset does have `__len__` but `drop_last` is set to `True`. In that case, according to `DataLoaderStateMixin` (sketched below), `remainder` will always be `-1`, so the message is always triggered on the last batch even though the dataset has a length. Could this message be updated?
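For context, here is a simplified paraphrase of the relevant `DataLoaderStateMixin` logic (based on reading accelerate v1.8.1's `data_loader.py`; not verbatim, and the exact code may differ between versions):

```python
# Simplified paraphrase of DataLoaderStateMixin (illustration only, not the
# verbatim accelerate source).
class DataLoaderStateMixin:
    def reset(self):
        self.end_of_dataloader = False
        self.remainder = -1  # sentinel meaning "no remainder computed"

    def begin(self):
        self.reset()
        # The remainder is only computed when drop_last is False, so with
        # drop_last=True it keeps the -1 sentinel even though the dataset
        # has a perfectly usable __len__.
        if not self._drop_last:
            length = getattr(self.dataset, "total_dataset_length", len(self.dataset))
            self.remainder = length % self.total_batch_size
```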
MRE
```python
import logging

from datasets import Dataset
from torch.utils.data import DataLoader

from accelerate import Accelerator
from accelerate.logging import get_logger

logger = get_logger(__name__)

# Set the log level so the info message from gather_for_metrics is visible
logging.basicConfig(level=logging.INFO)

dataset = Dataset.from_dict({'data': list(range(18))}).with_format('torch')
print(len(dataset))  # 18

dataloader = DataLoader(dataset, batch_size=5, drop_last=True)

accelerator = Accelerator(log_with='wandb')
dataloader = accelerator.prepare(dataloader)

for batch in dataloader:
    gathered_items = accelerator.gather_for_metrics(batch)
    print(len(gathered_items))
```
Expected behavior
No "the used dataset had no length" warning.