[Bug] 'tensor_model_parallel_all_reduce' is not defined #2931

bakch92 · 2025-01-17T02:02:26Z

Describe the bug

I tried to serve a LoRA fine-tuned Phi-4 model with SGLang using a tensor parallel size of 2 (--tp 2), but the server raised the following error.

[Error Log]

[2025-01-17 01:51:55 TP0] LoRA manager ready.
[2025-01-17 01:51:57 TP1] Load weight end. type=Phi3ForCausalLM, dtype=torch.float16, avail mem=15.70 GB
[2025-01-17 01:52:00 TP1] LoRA manager ready.
[2025-01-17 01:52:00 TP0] Memory pool end. avail mem=39.54 GB
[2025-01-17 01:52:02 TP1] Memory pool end. avail mem=13.43 GB
[2025-01-17 01:52:02 TP1] max_total_num_tokens=16384, max_prefill_tokens=16384, max_running_requests=2049, context_len=16384
[2025-01-17 01:52:02 TP0] max_total_num_tokens=16384, max_prefill_tokens=16384, max_running_requests=2049, context_len=16384
[2025-01-17 01:52:02] INFO:     Started server process [649817]
[2025-01-17 01:52:02] INFO:     Waiting for application startup.
[2025-01-17 01:52:02] INFO:     Application startup complete.
[2025-01-17 01:52:02] INFO:     Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)
[2025-01-17 01:52:03] INFO:     127.0.0.1:47632 - "GET /get_model_info HTTP/1.1" 200 OK
[2025-01-17 01:52:03 TP0] Prefill batch. #new-seq: 1, #new-token: 6, #cached-token: 0, cache hit rate: 0.00%, token usage: 0.00, #running-req: 0, #queue-req: 0
[2025-01-17 01:52:11 TP0] TpModelWorkerClient hit an exception: Traceback (most recent call last):
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 101, in forward_thread_func
    self.forward_thread_func_()
  File "/home/work/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 132, in forward_thread_func_
    logits_output, next_token_ids = self.worker.forward_batch_generation(
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/managers/tp_worker.py", line 154, in forward_batch_generation
    logits_output = self.model_runner.forward(forward_batch)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 679, in forward
    return self.forward_extend(forward_batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 648, in forward_extend
    return self.model.forward(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/models/llama.py", line 337, in forward
    hidden_states = self.model(input_ids, positions, forward_batch, input_embeds)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/models/llama.py", line 288, in forward
    hidden_states, residual = layer(
                              ^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/models/llama.py", line 237, in forward
    hidden_states = self.self_attn(
                    ^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/models/llama.py", line 175, in forward
    output, _ = self.o_proj(attn_output)
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/lora/lora.py", line 248, in forward
    output_ = tensor_model_parallel_all_reduce(output_parallel)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NameError: name 'tensor_model_parallel_all_reduce' is not defined
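
The log above shows the server starting normally and failing only when the first prefill batch runs. That is consistent with how Python resolves global names: a missing import is detected when the line executes, not when the module is loaded. The sketch below illustrates this under stated assumptions. `row_parallel_output` and `missing_all_reduce` are hypothetical names, not SGLang code, and the idea that the failing call sits on a branch taken only when the tensor parallel size exceeds 1 is an assumption this thread does not confirm.

```python
def row_parallel_output(partial_outputs, tp_size):
    """Hypothetical stand-in for a row-parallel layer's forward pass."""
    local = partial_outputs[0]
    if tp_size > 1:
        # This name is deliberately never imported or defined, mirroring
        # the situation in the traceback. Python only looks it up when
        # this line actually runs.
        return missing_all_reduce(partial_outputs)
    return local


# With tp_size == 1 the branch is skipped, so the undefined name is never touched.
single_gpu_result = row_parallel_output([[1.0, 2.0]], tp_size=1)

# With tp_size == 2 the branch runs and the lookup fails at call time.
try:
    row_parallel_output([[1.0, 2.0], [3.0, 4.0]], tp_size=2)
    raised = False
    message = ""
except NameError as exc:
    raised = True
    message = str(exc)

print(single_gpu_result, raised, message)
```

This would explain why the same setup can appear healthy at startup and with a single GPU, yet fail once a request reaches the multi-GPU code path.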

Reproduction

Model Name: Microsoft Phi-4

nohup python -m sglang.launch_server --model-path /home/work/ai/Microsoft_Phi-4/phi-4_quantized_8bit --lora-paths lora=/home/work/ai/Microsoft_Phi-4/lora_tuning_1221 --port 8001 --mem-fraction-static 0.8 --host 0.0.0.0 --dtype auto --disable-radix-cache --disable-cuda-graph --quantization gptq_marlin --max-total-tokens 16384 --tp 2 &

Environment

Python: 3.11.11 (main, Dec 11 2024, 16:28:39) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2: CUDA GPU
GPU 0,1,2 Compute Capability: 8.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
CUDA Driver Version: 535.54.03
PyTorch: 2.5.1+cu124
sglang: 0.4.0
flashinfer: 0.1.6+cu121torch2.4
triton: 3.1.0
transformers: 4.48.0
torchao: 0.6.1
numpy: 1.26.4
aiohttp: 3.11.8
fastapi: 0.115.5
hf_transfer: 0.1.8
huggingface_hub: 0.27.0
interegular: 0.3.3
modelscope: 1.20.1
orjson: 3.10.12
packaging: 24.2
psutil: 6.1.0
pydantic: 2.10.4
multipart: 0.0.17
zmq: 26.2.0
uvicorn: 0.32.1
uvloop: 0.21.0
vllm: 0.6.4.post1
openai: 1.58.1
anthropic: Module Not Found
decord: 0.6.0
NVIDIA Topology:
        GPU0    GPU1    GPU2    NIC0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PIX     NODE    SYS     1,3,5,7,9,11    1               N/A
GPU1    PIX      X      NODE    SYS     1,3,5,7,9,11    1               N/A
GPU2    NODE    NODE     X      SYS     1,3,5,7,9,11    1               N/A
NIC0    SYS     SYS     SYS      X

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

NIC Legend:

NIC0: mlx5_0

ulimit soft: 1048576

Fridge003 · 2025-01-18T02:09:21Z

Hi, LoRA does not currently support tensor parallelism in SGLang, so please set tp_size to 1 when using LoRA.

We plan to fix this in the future. You can follow our progress on LoRA development in #2929.
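
As a rough illustration of the workaround, the sketch below assembles a launch command that falls back to a tensor parallel size of 1 whenever LoRA adapters are requested. `build_launch_args` is a hypothetical helper, not part of SGLang; it uses only flags that appear in the reproduction command above, and the example paths are placeholders. It applies only to versions that lack tensor parallel support for LoRA.

```python
def build_launch_args(model_path, lora_paths, tp_size, port):
    """Hypothetical helper: build sglang.launch_server arguments.

    Forces --tp 1 when LoRA adapters are given, following the
    workaround suggested in this thread.
    """
    if lora_paths and tp_size != 1:
        tp_size = 1  # LoRA plus tensor parallelism is unsupported in affected versions

    args = [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--port", str(port),
        "--tp", str(tp_size),
    ]
    if lora_paths:
        args += ["--lora-paths", *lora_paths]
    return args


# Placeholder paths for illustration only.
args = build_launch_args(
    model_path="/path/to/base-model",
    lora_paths=["lora=/path/to/adapter"],
    tp_size=2,
    port=8001,
)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run` or joined into a shell command. Running on a single GPU means the full model must fit in that GPU's memory.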

zhaochenyang20 · 2025-01-21T22:10:24Z

Great. Please follow this issue! @Fridge003

aoshen524 · 2025-02-22T22:15:18Z

Will follow through as suggested.

zhaochenyang20 · 2025-02-23T01:19:38Z

@Fridge003 Will someone take this part?

Fridge003 · 2025-02-23T02:04:50Z

> @Fridge003 Will someone take this part?

@aoshen524 will take this part.

zhaochenyang20 · 2025-02-23T06:39:11Z

Thanks!

bakch92 · 2025-02-24T22:40:17Z

@Fridge003 When will the fix for this issue be committed?

aoshen524 · 2025-03-01T01:26:00Z

> @Fridge003 When will the fix for this issue be committed?

I will start working on it this weekend.

bakch92 · 2025-03-03T07:35:56Z

@Fridge003 Thank you :)

Fridge003 · 2025-03-19T05:10:23Z

Hi @bakch92, tensor parallelism is now supported for LoRA. Please pull the latest main branch and give it a try.

bakch92 · 2025-03-21T05:02:14Z

@Fridge003 Thanks :)
I will try that.

Fridge003 mentioned this issue in "[Feature] Lora Development Roadmap #2929" (open, 16 tasks) on Jan 18, 2025

Fridge003 added the lora and bug labels on Feb 9, 2025

Fridge003 self-assigned this on Feb 20, 2025

zhaochenyang20 closed this as completed on Mar 19, 2025
