Add support for TE MXFP8 recipe in accelerate #3688
base: main
This is outside the initial scope for this PR, but there's some oddity when using Deepspeed + FP8 + the HF Trainer. If you set And if you omit it,
Interestingly, it's still possible to use FP8 with DeepSpeed currently? But that seems like a bug. This check: There have been a number of "FP8 + DeepSpeed" PRs here in the past, so I'm wondering if the cleanest option is to separate "mixed_precision" from fp8. fp8 typically uses bf16 for model weights and between FP8-enabled layers anyway.
src/accelerate/accelerator.py
    if (
        AcceleratorState._shared_state != {}
        and AcceleratorState().distributed_type == DistributedType.DEEPSPEED
    ):
formatting only change, not sure why it's changing it from main
edbe9d5 to 46ebf27
Signed-off-by: Peter St. John <[email protected]>
46ebf27 to 1fb8f76
Do I understand correctly that this only covers DeepSpeed?
What does this PR do?
Adds support for the MXFP8 format in TE. See the TE docs pages for more background:
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html#MXFP8-and-block-scaling
This adds an additional fp8_recipe argument, use_mxfp8_block_scaling, that switches the recipe from the DelayedScaling recipe to MXFP8BlockScaling.
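The recipe switch described in this PR can be illustrated with a minimal, self-contained sketch. It is not the code in the PR: the two dataclasses are hypothetical stand-ins for Transformer Engine's DelayedScaling and MXFP8BlockScaling recipes (so the snippet runs without TE or a GPU), their fields are illustrative only, and the helper name build_fp8_recipe is an assumption. Only the flag name use_mxfp8_block_scaling comes from the description above.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for Transformer Engine's recipe classes, so the
# sketch runs without TE installed. Field names and defaults are illustrative.
@dataclass
class DelayedScaling:
    fp8_format: str = "HYBRID"
    amax_history_len: int = 1024
    amax_compute_algo: str = "max"


@dataclass
class MXFP8BlockScaling:
    fp8_format: str = "E4M3"


def build_fp8_recipe(use_mxfp8_block_scaling: bool = False, **delayed_scaling_kwargs):
    """Select a recipe from the flag (helper name and signature are assumptions)."""
    if use_mxfp8_block_scaling:
        # In this sketch the delayed-scaling options are simply ignored
        # when the block-scaling recipe is requested.
        return MXFP8BlockScaling()
    return DelayedScaling(**delayed_scaling_kwargs)


default_recipe = build_fp8_recipe()
mxfp8_recipe = build_fp8_recipe(use_mxfp8_block_scaling=True)
print(type(default_recipe).__name__, type(mxfp8_recipe).__name__)
```

Leaving the flag at its default keeps the existing DelayedScaling behavior, so in this sketch the new argument is strictly opt-in.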