Open
Description
When running the grouped GEMM implementation together with expert parallelism, I get the following error:
[rank5]: File "/env/lib/python3.11/site-packages/megablocks-0.8.0.dev0-py3.11-linux-x86_64.egg/megablocks/layers/glu.py", line 255, in forward
[rank5]: x1 = gg.ops.gmm(x, w1, batch_sizes, trans_b=True)
[rank5]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]: File "/env/lib/python3.11/site-packages/grouped_gemm-0.1.6-py3.11-linux-x86_64.egg/grouped_gemm/ops.py", line 33, in gmm
[rank5]: return GroupedGemm.apply(a, b, batch_sizes, trans_b)
[rank5]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]: File "/env/lib/python3.11/site-packages/torch/autograd/function.py", line 575, in apply
[rank5]: return super().apply(*args, **kwargs) # type: ignore[misc]
[rank5]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]: File "/env/lib/python3.11/site-packages/grouped_gemm-0.1.6-py3.11-linux-x86_64.egg/grouped_gemm/ops.py", line 11, in forward
[rank5]: return backend.gmm(a, b, batch_sizes, trans_a=False, trans_b=trans_b)
[rank5]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]: File "/env/lib/python3.11/site-packages/grouped_gemm-0.1.6-py3.11-linux-x86_64.egg/grouped_gemm/backend.py", line 27, in gmm
[rank5]: backend.gmm(a, b, c, batch_sizes, trans_a, trans_b)
[rank5]: RuntimeError: Grouped GEMM execution not possible with HW
This only happens when the two are combined; using either one alone works fine. The setup is 8x H100.