Add emulate in float8 and relative checks #1214
base: main
Conversation
Thanks for working on this! I left some inline comments.
.ci/docker/requirements.txt
@@ -8,3 +8,4 @@ tabulate
wandb
fsspec
tyro
+torchao
I think the recommended way of installing torchao is still via nightly, similar to how we install pytorch nightly for CI
https://github.com/pytorch/torchtitan/blob/main/.github/workflows/integration_test_8gpu.yaml#L39
but for torchao:
USE_CPP=0 python -m pip install git+https://github.com/pytorch/ao.git
"To enable support on older hardware, set `float8.emulate` to True.", | ||
) | ||
return | ||
elif float8_config.emulate and job_config.training.compile: |
I wonder if emulate+compile works on H100? Since the original comment from @vkuzo is
torch.compile with float8 dtypes is not going to work on older hardware, so the emulation can only be used in eager mode.
Will run some tests on it.
Tested to be good; removed this exception.
torchtitan/config_manager.py
Whether to run on earlier hardware in CI test.
torch.compile with float8 dtypes is not going to work on older hardware, so the emulation can
only be used in eager mode.
Suggested change:
-Whether to run on earlier hardware in CI test.
-torch.compile with float8 dtypes is not going to work on older hardware, so the emulation can
-only be used in eager mode.
+If True, emulation is used instead of hardware accelerated gemm. This is for test purpose only, as the current CI does have sm_90 capability, required by Float8.
+Not compatible with torch.compile.
This is assuming torch.compile + emulate doesn't work on >= H100 either. If it does, we'll need to further adjust the code and helper message.
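For reference, a minimal sketch of what the documented field could look like in the Float8 config section (the dataclass layout and surrounding docstring placement are assumptions, not the actual config_manager.py contents):

```python
from dataclasses import dataclass


@dataclass
class Float8:
    """Float8 training options (hypothetical subset for illustration)."""

    emulate: bool = False
    """
    If True, emulation is used instead of hardware accelerated gemm.
    This is for test purposes only, as the current CI does not have the sm_89
    capability required by Float8. Not compatible with torch.compile.
    """
```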
return
elif float8_config.emulate and job_config.training.compile:
logger.warning(
"Failed to run on emulate with compile on, please disable compile to allow on emulate.",
We should just raise an exception if the configuration combination is not runnable.
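i.e. roughly something like this (just a sketch; exact wording and placement are flexible):

```python
# Sketch: fail fast instead of warning when the requested combination cannot run.
if float8_config.emulate and job_config.training.compile:
    raise ValueError(
        "float8.emulate is not compatible with training.compile; "
        "disable compile to run float8 in emulation mode."
    )
```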
@@ -26,9 +26,10 @@ def __init__(self, job_config: JobConfig, parallel_dims: ParallelDims):
self.enabled = False

float8_config: Float8 = job_config.float8
-if not has_cuda_capability(8, 9):
+if not has_cuda_capability(8, 9) and not float8_config.emulate:
logger.warning(
According to #1214 (comment), we should raise an error instead of this warning.
If True, emulation is used instead of hardware accelerated gemm. This is for test purpose only,
as the current CI does have sm_90 capability, required by Float8.
Not compatible with torch.compile.
Suggested change:
-If True, emulation is used instead of hardware accelerated gemm. This is for test purpose only,
-as the current CI does have sm_90 capability, required by Float8.
-Not compatible with torch.compile.
+If True, emulation is used instead of hardware accelerated gemm. This is for test purpose only,
+as the current CI does not have sm_89 capability, required by Float8.
logger.warning(
-"Failed to swap to Float8Linear because float8 is only supported on SM89 or later",
+"Failed to swap to Float8Linear because float8 is only supported on SM89 or later."
+"To enable support on older hardware, set `float8.emulate` to True.",
"To enable support on older hardware, set `float8.emulate` to True.", | |
"To enable testing on older hardware, set `float8.emulate` to True in eager mode.", |
@@ -26,9 +26,10 @@ def __init__(self, job_config: JobConfig, parallel_dims: ParallelDims):
self.enabled = False

float8_config: Float8 = job_config.float8
-if not has_cuda_capability(8, 9):
+if not has_cuda_capability(8, 9) and not float8_config.emulate:
On sm < 89, we can't enable torch.compile with/without emulate, right? If so let's do
Suggested change:
-if not has_cuda_capability(8, 9) and not float8_config.emulate:
+if not has_cuda_capability(8, 9) and (job_config.training.compile or not float8_config.emulate):
Also it's a bit hard to read. A better way may be
if has_cuda_capability(8, 9) or (float8_config.emulate and not job_config.training.compile):
    pass
else:
    raise ValueError(...)
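Spelled out a bit more, the constructor check could read roughly like this (a sketch assuming emulation only works in eager mode; names come from the diff above, the error message is illustrative):

```python
# Sketch of the restructured capability / emulate / compile check.
if has_cuda_capability(8, 9) or (
    float8_config.emulate and not job_config.training.compile
):
    pass  # native float8 gemm available, or eager-mode emulation requested
else:
    raise ValueError(
        "Failed to swap to Float8Linear because float8 is only supported on "
        "SM89 or later. To test on older hardware, set float8.emulate to True "
        "and disable training.compile."
    )
```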
The CPU CI error is because we changed the warning to an exception when sm < 89.
I think we can just add the `emulate` flag to https://github.com/pytorch/torchtitan/blob/main/tests/unit_tests/test_model_converter.py#L42
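For example, roughly along these lines (the import path, fixture names, and attribute layout are my guesses at test_model_converter.py's structure, not copied from the repo):

```python
# Hypothetical sketch: run the float8 converter with emulation enabled so the
# CPU / pre-sm_89 CI job takes the emulate path instead of hitting the new exception.
from torchtitan.protocols.model_converter import build_model_converters  # assumed path


def test_float8_converter_with_emulate(job_config, parallel_dims):  # assumed fixtures
    job_config.model.converters = ["float8"]
    job_config.float8.emulate = True  # new flag from this PR
    job_config.training.compile = False  # emulation is eager-only
    converters = build_model_converters(job_config, parallel_dims)
    assert converters is not None
```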
Add emulate in float8, to enable tests on older hardware.
Change related warnings.
Test results:

Tested locally on an 8xH100 server.
CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh --model.converters="float8" --float8.enable_fsdp_float8_all_gather --float8.precompute_float8_dynamic_scale_for_fsdp --float8.force_recompute_fp8_weight_in_bwd
CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh --model.converters="float8" --float8.enable_fsdp_float8_all_gather --float8.precompute_float8_dynamic_scale_for_fsdp --float8.force_recompute_fp8_weight_in_bwd --float8.emulate