amp_C undefined symbol after installing Megablocks #157

Open
@RachitBansal

I am trying to set up and use megablocks to train MoE models, but I get the following error:

Traceback (most recent call last):
  File "/n/holyscratch01/dam_lab/brachit/moes/megablocks/third_party/Megatron-LM/pretrain_gpt.py", line 8, in <module>
    from megatron import get_args
  File "/n/holyscratch01/dam_lab/brachit/moes/megablocks/third_party/Megatron-LM/megatron/__init__.py", line 13, in <module>
    from .initialize  import initialize_megatron
  File "/n/holyscratch01/dam_lab/brachit/moes/megablocks/third_party/Megatron-LM/megatron/initialize.py", line 19, in <module>
    from megatron.checkpointing import load_args_from_checkpoint
  File "/n/holyscratch01/dam_lab/brachit/moes/megablocks/third_party/Megatron-LM/megatron/checkpointing.py", line 15, in <module>
    from .utils import (unwrap_model,
  File "/n/holyscratch01/dam_lab/brachit/moes/megablocks/third_party/Megatron-LM/megatron/utils.py", line 11, in <module>
    import amp_C
ImportError: /usr/local/lib/python3.10/dist-packages/amp_C.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

I am working in the NGC PyTorch container nvcr.io/nvidia/pytorch:23.09-py3.

Before I run pip install megablocks, GPT-2 training (using exp/gpt2/gpt2_gpt2_46m_1gpu.sh) works fine, while the MoE script (exp/moe/moe_125m_8gpu_interactive.sh) fails with the error "Megablocks not available".

However, after I run pip install megablocks or pip install . in the container, both the GPT-2 script and the MoE script fail with the amp_C undefined symbol error shown above.
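
To narrow down what changes between the working and the broken state, a small diagnostic like the sketch below could be run once before and once after the pip install. It rests on an unconfirmed assumption: that the install swaps or modifies a package (for example torch) that the container's prebuilt amp_C extension was compiled against. The helper names probe_import and installed_version are made up for this illustration, and "apex" as the distribution that provides amp_C is also an assumption.

```python
import importlib
import importlib.metadata


def probe_import(module_name):
    """Try to import a module and return (ok, detail) instead of raising."""
    try:
        importlib.import_module(module_name)
        return True, "imported"
    except Exception as exc:  # undefined-symbol failures surface as ImportError
        return False, f"{type(exc).__name__}: {exc}"


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumptions; adjust to what `pip list` shows.
    for dist in ("torch", "megablocks", "apex"):
        print(f"{dist}: version={installed_version(dist)}")
    # Import torch first so its shared libraries are loaded before amp_C.
    for mod in ("torch", "amp_C"):
        ok, detail = probe_import(mod)
        print(f"{mod}: ok={ok} ({detail})")
```

If the torch version reported after the install differs from the one reported before, that would be consistent with the assumption above, though it would not prove it.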
