
Parallelism config + TP + HSDP + BYODM (Bring Your Own Device Mesh) #3682


Merged
merged 76 commits into huggingface:main on Jul 30, 2025

Conversation

@SalmanMohammadi (Contributor) commented on Jul 15, 2025

What does this PR do?

Building on #3651

Dependencies:

  • Fix TP logic in Transformers
  • Support ParallelismConfig in HFTrainer

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

cc @S1ro1
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Salman Mohammadi added 2 commits July 21, 2025 12:13
@SalmanMohammadi changed the title from "[WIP] Parallelism config + BYODM (Bring Your Own Device Mesh)" to "[WIP] Parallelism config + TP + HSDP + BYODM (Bring Your Own Device Mesh)" on Jul 21, 2025
@winglian force-pushed the device_mesh_parallelism_config branch from d9aec5c to a402faf on July 26, 2025 13:48
Comment on lines 2775 to 2777
    clip_context_manager = implicit_replication
else:
    clip_context_manager = contextlib.nullcontext
Member
Nice! We'll be able to clean up some of the trainer code after this.
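For readers following along, the quoted lines pick the context manager that gradient clipping will run under. A minimal sketch of that pattern, assuming `implicit_replication` is importable from `torch.distributed.tensor.experimental`; `clip_gradients` and `using_tp` are hypothetical names, not the PR's actual trainer code:

# Sketch only: mirrors the quoted diff's choice of context manager, so that
# clipping plain tensors alongside DTensors works when TP is active.
import contextlib

import torch
from torch.distributed.tensor.experimental import implicit_replication  # assumed import path


def clip_gradients(model: torch.nn.Module, max_norm: float, using_tp: bool):
    # Choose the context manager the same way the quoted diff does.
    clip_context_manager = implicit_replication if using_tp else contextlib.nullcontext
    with clip_context_manager():
        return torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)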

def build_device_mesh(self, device_type: str):
    mesh = self.get_mesh()
    if not len(list(mesh)):
        return
Member

Yeah, we should probably raise an error, but to be honest we don't really need to handle this case.
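A hypothetical sketch of that suggestion: raise instead of silently returning when the configuration yields an empty mesh. The names mirror the quoted snippet; this is not the code that was merged.

def build_device_mesh(self, device_type: str):
    mesh = self.get_mesh()
    if not len(list(mesh)):
        # Fail loudly instead of silently skipping mesh construction.
        raise ValueError(
            "Parallelism configuration produced an empty device mesh; "
            "enable at least one parallelism dimension."
        )
    ...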

@@ -2,6 +2,23 @@

This folder contains examples of using FSDP2 with Accelerate, utilizing extra methods to improve training speed, performance or accuracy.

### FSDP2 + ND Parallelism

With `AccelerateDistributedConfig`, you can use 🤗 accelerate to train with n-dimensional parallelism. Script `nd_parallel.py` showcases just how you can do it. We enable you to configure 3 different parallel dimensions:
Member

You mean `ParallelConfig`, no?

Suggested change
With `AccelerateDistributedConfig`, you can use 🤗 accelerate to train with n-dimensional parallelism. Script `nd_parallel.py` showcases just how you can do it. We enable you to configure 3 different parallel dimensions:
With `ParallelConfig`, you can use 🤗 accelerate to train with n-dimensional parallelism. Script `nd_parallel.py` showcases just how you can do it. We enable you to configure 3 different parallel dimensions:

Contributor Author

I think we have both because of the duplicate config upstream in transformers - but it would be good to clarify which to use.

Member

It would be better to use the one from accelerate.
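For context, a minimal sketch of using the accelerate-side config to set the three parallel dimensions the README describes, assuming the `ParallelismConfig` API this PR series adds; the exact import path and argument names here are assumptions:

# Sketch only: configure n-dimensional parallelism through accelerate.
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig  # assumed import path

# 2-way replicated DP x 2-way sharded DP (FSDP) x 2-way TP = 8 processes total.
pc = ParallelismConfig(dp_replicate_size=2, dp_shard_size=2, tp_size=2)
accelerator = Accelerator(parallelism_config=pc)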

@SunMarc (Member) left a comment

LGTM! Just a few nits.

<Tip>
Only use TP intra-node; therefore the maximum TP size you should need is 8. You can also lower this, as FSDP (`--dp-shard-size`) can be faster on smaller models with shorter sequence lengths. If you still cannot fit into memory, increase `--dp-shard-size` as much as you can. Then, to scale up and utilize all of your GPUs, fill the rest with `--dp-replicate-size`. This is only a general guideline; you can (and should) experiment with different parallelism configurations to find the best one for your model and hardware. You can learn more about general strategies for parallelism in our [blog](TODO), or if you want to dive deep, read the [Ultra-Scale Playbook](https://huggingface.co/spaces/nanotron/ultrascale-playbook).
Member

remove TODO

Member

Well, the blog isn't ready, so we kind of need to keep the TODO there, haha (we'll finish it before release).
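To make the sizing guideline in the quoted tip concrete, a small illustration, assuming the three dimensions must multiply to the world size; all numbers are examples, not defaults:

# Sketch only: example sizing for 2 nodes x 8 GPUs each.
world_size = 16
tp_size = 8           # keep TP intra-node, so at most 8
dp_shard_size = 2     # shard (FSDP) until the model fits in memory
dp_replicate_size = world_size // (tp_size * dp_shard_size)  # fill the rest

assert dp_replicate_size * dp_shard_size * tp_size == world_size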

Comment on lines 752 to 753
def parallelism_config(self) -> ParallelismConfig | None:
    return self.state.parallelism_config
Member

The `|` union syntax only works on py3.10+, but we still need to support py3.9.

Member

We will drop py3.9 in October, by the way!
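A minimal py3.9-compatible sketch of the quoted property, using `typing.Optional` instead of the PEP 604 `|` syntax (evaluating `ParallelismConfig | None` at runtime requires Python 3.10 unless `from __future__ import annotations` defers evaluation):

# Sketch only: same behavior as the quoted lines, but the annotation works on py3.9.
from typing import Optional


def parallelism_config(self) -> Optional["ParallelismConfig"]:
    # A string annotation also avoids evaluating the name at definition time.
    return self.state.parallelism_config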

@SunMarc mentioned this pull request on Jul 30, 2025
@S1ro1 merged commit 9359a01 into huggingface:main on Jul 30, 2025
25 checks passed