
Fix AutoTP gathering replaced layer params when bias is not None #7257


Open · wants to merge 3 commits into master

Conversation

@HollowMan6 (Contributor) commented Apr 28, 2025

Some params are one-dimensional; this PR adds support for them.

Resolve #7249

```log
param.shape torch.Size([768, 1536])
param.shape torch.Size([768])
...
with deepspeed.module_inject.layers.GatherReplacedLayerParams([param], model, enabled=True):
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "deepspeed/module_inject/layers.py", line 359, in __enter__
self.params[0].gather_params(self.params)
File "torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
       ^^^^^^^^^^^^^^^^^^^^^
File "deepspeed/module_inject/layers.py", line 473, in gather_params
param.shape[1],
~~~~~~~~~~~^^^
IndexError: tuple index out of range
```
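The failure above comes from `gather_params` indexing `param.shape[1]` unconditionally, which raises `IndexError` for 1-D bias tensors like `torch.Size([768])`. The shape handling the fix needs can be sketched in plain Python (a hypothetical `full_param_shape` helper, not DeepSpeed's actual code, assuming partitions are split along dim 0):

```python
def full_param_shape(partition_shape, tp_world_size):
    """Return the gathered (full) shape for a tensor-parallel partition.

    Hypothetical sketch: 2-D weights and 1-D biases are both assumed to be
    split along dim 0 across tp_world_size ranks. Indexing shape[1]
    unconditionally is what raised IndexError for 1-D bias params.
    """
    if len(partition_shape) == 1:
        # bias: (out_features_per_rank,) -> (out_features,)
        return (partition_shape[0] * tp_world_size,)
    # weight: (out_features_per_rank, in_features) -> (out_features, in_features)
    return (partition_shape[0] * tp_world_size, partition_shape[1])
```

With the shapes from the log and a TP world size of 2, the 1-D branch now succeeds instead of raising.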

@delock (Collaborator) commented Apr 28, 2025

Hi @Yejing-Lai, can you also take a look at this PR?

@HollowMan6 HollowMan6 changed the title Fix QWen AutoTP when gathering replaced layer params Fix AutoTP gathering replaced layer params when bias is not None Apr 29, 2025
@HollowMan6 HollowMan6 requested a review from inkcherry April 29, 2025 10:06
@inkcherry (Contributor)

LGTM, thanks!

@HollowMan6 (Contributor, Author)

Fixed the formatting issue.

@HollowMan6 (Contributor, Author)

The CI error seems to be caused by the environment instead of this PR:

```log
ImportError: /opt/conda/envs/ptca/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /scratch/azureml/cr/j/bd50c7f98a144dcd900fbdcd8943d8c5/exe/wd/actions-runner/_work/DeepSpeed/DeepSpeed/tests/./torch-extensions/async_io/async_io.so)
```

@loadams (Collaborator) commented May 9, 2025

> CI error seems to be caused by the environment instead of this PR:
>
> ImportError: /opt/conda/envs/ptca/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /scratch/azureml/cr/j/bd50c7f98a144dcd900fbdcd8943d8c5/exe/wd/actions-runner/_work/DeepSpeed/DeepSpeed/tests/./torch-extensions/async_io/async_io.so)

Yes @HollowMan6 - thanks for following up on this PR. This is a known CI issue I am working on and hope to have resolved ASAP.

Development

Successfully merging this pull request may close these issues.

[BUG]zero2 + autotp: IndexError: tuple index out of range