cp dataloader #3626




@SunMarc SunMarc commented Jun 12, 2025

What does this PR do?

Experimental context parallelism (CP) support for the dataloader. To try it, make sure to set both dispatch_batches and split_batches to False in your accelerate config.
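For reference, a minimal sketch of setting both flags from Python instead of through the config file. It is not taken from this PR; it assumes a recent accelerate release where `DataLoaderConfiguration` is importable from `accelerate.utils` and accepted by `Accelerator(dataloader_config=...)`.

```python
from accelerate import Accelerator
from accelerate.utils import DataLoaderConfiguration

# Both flags off: each process iterates its own dataloader (no dispatch from
# the main process) and batches are not split across processes.
dataloader_config = DataLoaderConfiguration(
    dispatch_batches=False,
    split_batches=False,
)
accelerator = Accelerator(dataloader_config=dataloader_config)
```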

cc @qgallouedec

@qgallouedec qgallouedec marked this pull request as draft June 12, 2025 17:23
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Jul 21, 2025
@SunMarc SunMarc reopened this Jul 22, 2025
Comment on lines 1138 to +1144
    process_index = process_index // submesh_tp_size
    num_processes = submesh_fsdp_size * submesh_dp_size

    if cp:
        process_index = 0
        num_processes = 1

Collaborator

Doesn't forcing a single process break n-d parallelism? Maybe something like this instead:

Suggested change
- process_index = process_index // submesh_tp_size
- num_processes = submesh_fsdp_size * submesh_dp_size
- if cp:
-     process_index = 0
-     num_processes = 1
+ process_index = process_index // (submesh_tp_size * submesh_cp_size)
+ num_processes = submesh_fsdp_size * submesh_dp_size // (submesh_tp_size * submesh_cp_size)
+ if cp:
+     process_index = 0
+     num_processes = 1

Member Author

Indeed, we will have something like that. I only opened this PR so we wouldn't forget about it; we will upstream the changes to main in a separate PR once the n-d parallelism PR is finished.
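To make the concern in this thread concrete, here is a small, self-contained sketch of how `process_index` and `num_processes` decide which samples a rank reads. It is an illustration only, not the code from this PR or from accelerate: `shard_indices`, `forced_single_process`, and `grouped_by_tp_cp` are hypothetical helpers, sharding is simplified to round-robin dealing of samples, and the grouped variant assumes tp and cp are the innermost mesh dimensions.

```python
from typing import List, Tuple


def shard_indices(num_samples: int, process_index: int, num_processes: int) -> List[int]:
    """Sample indices one rank reads when samples are dealt round-robin (a simplification)."""
    return list(range(process_index, num_samples, num_processes))


def forced_single_process(rank: int) -> Tuple[int, int]:
    """Mirrors the `if cp:` branch in the diff: every rank acts as the only process."""
    return 0, 1


def grouped_by_tp_cp(rank: int, data_parallel_size: int, tp_size: int, cp_size: int) -> Tuple[int, int]:
    """Hypothetical alternative: ranks in the same tp/cp group share one data index.

    Assumes tp and cp are the innermost mesh dimensions, so consecutive global
    ranks fall in the same group.
    """
    return rank // (tp_size * cp_size), data_parallel_size


if __name__ == "__main__":
    # 4 ranks: 2 data-parallel replicas, each a context-parallel group of 2.
    for rank in range(4):
        forced = shard_indices(8, *forced_single_process(rank))
        grouped = shard_indices(
            8, *grouped_by_tp_cp(rank, data_parallel_size=2, tp_size=1, cp_size=2)
        )
        print(rank, forced, grouped)
```

Under these assumptions, forcing a single process makes all four ranks read every sample, so the two data-parallel replicas duplicate work. Grouping by tp/cp gives ranks 0 and 1 one half of the samples and ranks 2 and 3 the other half, while ranks within a CP group still see identical batches.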

@github-actions github-actions bot closed this Jul 30, 2025

SunMarc commented Jul 30, 2025

PR #3682 should cover the changes in this PR.
