[Feature] Support IterableDataset on distributed environment #1151

Merged

HydrogenSulfate
Collaborator

@HydrogenSulfate HydrogenSulfate commented May 12, 2025

PR types

New features

PR changes

Others

Describe

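  1. Support training with io.IterableDataset datasets in a distributed environment. Every process generates the same raw data (of size batch_size), then uses its own rank and the world_size to take its per-process slice (of size batch_size/world_size). batch_size must be divisible by world_size.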

  2. Simplify the data-parallel launch command in the usage documentation.

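  3. AllenCahn accuracy validation
    Single-GPU accuracy: 1.2e-5
    Two-GPU accuracy: 1.5e-5
    image

    3.1 Data distribution across the two GPUs
    image

    3.2 On each GPU, the full dataset before splitting is identical
    image

    3.3 On each GPU, the dataset after splitting is different, with no overlap
    image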
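
The slicing rule in item 1 can be sketched in plain Python as follows. This is a minimal illustration, not the code merged in this PR: the names `shard_batch`, `ShardedIterable`, and `make_batch` are hypothetical, and contiguous slicing is an assumption (the description only says that the per-process slices are different and do not overlap). In a real Paddle job, the rank and world size would typically come from `paddle.distributed.get_rank()` and `paddle.distributed.get_world_size()`.

```python
from typing import Callable, Iterator, Sequence


def shard_batch(batch: Sequence, rank: int, world_size: int) -> Sequence:
    """Return the contiguous slice of `batch` owned by `rank` (hypothetical helper)."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    if len(batch) % world_size != 0:
        # Mirrors the requirement that batch_size is divisible by world_size.
        raise ValueError("batch_size must be divisible by world_size")
    per_rank = len(batch) // world_size
    start = rank * per_rank
    return batch[start:start + per_rank]


class ShardedIterable:
    """Hypothetical stand-in for an iterable dataset; not a PaddleScience class."""

    def __init__(self, make_batch: Callable[[int], Sequence],
                 num_batches: int, rank: int, world_size: int) -> None:
        self.make_batch = make_batch
        self.num_batches = num_batches
        self.rank = rank
        self.world_size = world_size

    def __iter__(self) -> Iterator[Sequence]:
        for step in range(self.num_batches):
            # Every process builds the same full batch ...
            full = self.make_batch(step)
            # ... and keeps only the slice that belongs to its own rank.
            yield shard_batch(full, self.rank, self.world_size)


def make_batch(step: int) -> list:
    """Deterministic toy batch of size 8, identical on every process."""
    return list(range(step * 8, step * 8 + 8))


# Simulate two processes (world_size = 2) in a single interpreter.
shards = [list(ShardedIterable(make_batch, 2, rank, 2)) for rank in range(2)]
print(shards[0])  # slices seen by rank 0
print(shards[1])  # slices seen by rank 1
```

Because every process must produce the same full batch before slicing, any randomness in the hypothetical `make_batch` would need the same seed on all ranks; otherwise the slices could overlap or miss samples.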

paddle-bot bot commented May 12, 2025

Thanks for your contribution!

@HydrogenSulfate HydrogenSulfate changed the title [fea] Support IterableDataset on distributed environment [Feature] Support IterableDataset on distributed environment May 12, 2025
@HydrogenSulfate HydrogenSulfate force-pushed the support_dist_iterdataset branch from 71560ef to 2151a8a Compare May 12, 2025 09:44
@HydrogenSulfate HydrogenSulfate force-pushed the support_dist_iterdataset branch from 1728383 to d15265d Compare May 12, 2025 11:32
@zhiminzhang0830 zhiminzhang0830 merged commit ca6ef98 into PaddlePaddle:develop May 13, 2025
3 of 4 checks passed
@HydrogenSulfate HydrogenSulfate deleted the support_dist_iterdataset branch May 13, 2025 03:55
2 participants