Parallelism config + TP + HSDP + BYODM (Bring Your Own Device Mesh) #3682
Merged: S1ro1 merged 76 commits into huggingface:main from SalmanMohammadi:device_mesh_parallelism_config on Jul 30, 2025 (+1,236 −121)
Commits
2f471e3 Feat: init (S1ro1)
43b1ca7 Feat: add validation + init from kwargs (S1ro1)
79faa13 Fix: minor fixes (S1ro1)
16f348b Feat: more cleanup (S1ro1)
53ef524 Minor refactor (S1ro1)
cd31b02 remove import (S1ro1)
2d89210 adding support for pre-configured device mesh (SalmanMohammadi)
afaafef adding device mesh to fsdp2 (SalmanMohammadi)
2d952cb moving mesh dim defn to parralismconfig (SalmanMohammadi)
91ca626 tests (SalmanMohammadi)
910368b WIP device mesh/accelerator validation (SalmanMohammadi)
b7d154e WIP more tests (SalmanMohammadi)
8a0de72 Test Driven Development (TDD) (SalmanMohammadi)
1c68efb fixing build_device_mesh (SalmanMohammadi)
e01abf1 FSDP dim names (SalmanMohammadi)
69b523c adding example
c765a44 WIP
8d97930 fixing HSDP
57c0d9e Feat: add back old options (S1ro1)
c93285a working example
cb40d36 debugging
b76ee67 adding parallelism config to partialstate
9aa2612 Feat: revert ddp changes (S1ro1)
de96e74 Revert DDP (S1ro1)
fd05e3b Feat: (untested) update mesh dims and some minor tweaks (S1ro1)
efc903e adding dp_cp dims
7c3d0e3 updating comments
3cfce25 WIP
1bbdb75 wip 2
aa749ad reverting
aa74576 storing state in accelerator rather than acceleratorstate
4e99b9c Fix: minor tweaks (S1ro1)
3d235cb wip example update
61868c2 merging
f96fea3 Fixes for non-fsdp2 case (S1ro1)
dd89452 Feat: ensure ddp/tp only works (S1ro1)
7f243e0 updating example
4a2dd58 updating example
dc145c2 updating examples, fixing state
f21547f fixed state
1a49c16 comments
07bf2b3 fixing partial state check
f274b35 linting
a6feca9 comments
80deb7e removing fn
52c178f merging
133ef5f WIP: fix tp (S1ro1)
74009ea comments
379daa0 removing return
168b520 reverting upcast
76a546f add guards (winglian)
e8963dc guards for empty self.parallelism_config (winglian)
a402faf use len on tuple to check if empty (winglian)
235d29f Feat: cleanup example (S1ro1)
1017752 Feat: some cleanup of example (S1ro1)
36a1234 Merge branch 'main' into device_mesh_parallelism_config (S1ro1)
7ddb3ab Feat: add trackio (S1ro1)
9fdc320 Fix: improve trackio (S1ro1)
00dd4af Feat: TP works (S1ro1)
d21ff9f Feat: some fsdp2 improv (S1ro1)
d260842 Feat: working examples (S1ro1)
8b89d27 handle clipping for tensor parallel (winglian)
4709fc8 Implicit replicate (S1ro1)
353b559 Refactor: move to separate file + cleanup + basic comments (S1ro1)
7364440 Fix: add unadded files, fix circular import (S1ro1)
e90f832 Feat: better readme (S1ro1)
044c713 Feat: add blog + ultrascale links (S1ro1)
464a642 Tmp: should_save_model now returns only true (S1ro1)
f85eadf Fix: remove implicit_replication and style (S1ro1)
86771e2 Fix: remove optional (S1ro1)
c80aae0 add guard on parallelism_config.tp_enabled (winglian)
c8a2ae5 fix import (winglian)
ec59f84 fixing empty parallelism_config
0afb69f fix import path for test patch (winglian)
89aad7a fixing patch
c570f7c merging