Using two nodes, each with 8 L40 GPUs and 48 GiB of VRAM per GPU. The deepseek-ai/DeepSeek-V3 model files total about 650 GiB.
The following command throws errors:
vllm serve /root/.cache/huggingface/deepseek-V3/ --tensor-parallel-size 16 --trust-remote-code --max-num-seqs 1 --gpu-memory-utilization 1
(RayWorkerWrapper pid=1401) ERROR 03-12 05:25:24 worker_base.py:581] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 896.00 MiB. GPU 7 has a total capacity of 44.42 GiB of which 661.38 MiB is free. Including non-PyTorch memory, this process has 0 bytes memory in use. Of the allocated memory 42.93 GiB is allocated by PyTorch, and 178.23 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) [repeated 6x across cluster]
(RayWorkerWrapper pid=6466, ip=192.168.10.103) ERROR 03-12 05:25:24 worker_base.py:581] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 896.00 MiB. GPU 0 has a total capacity of 44.43 GiB of which 668.31 MiB is free. Process 895761 has 43.77 GiB memory in use. Of the allocated memory 42.93 GiB is allocated by PyTorch, and 178.23 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) [repeated 7x across cluster]
[rank0]:[W312 05:25:25.954336469 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
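A rough back-of-the-envelope check (a sketch only; it assumes the ~650 GiB on-disk checkpoint size approximates the loaded weight footprint and that the weights shard evenly across tensor-parallel ranks) suggests the weights alone nearly fill each GPU:

# Rough per-GPU memory estimate; all figures are approximations taken from
# the setup above, not measured values.
model_weights_gib = 650.0      # approximate DeepSeek-V3 checkpoint size on disk
tensor_parallel_size = 16      # 2 nodes x 8 L40 GPUs
usable_vram_gib = 44.42        # per-GPU capacity reported in the OOM trace

weights_per_gpu = model_weights_gib / tensor_parallel_size
headroom = usable_vram_gib - weights_per_gpu

print(f"weights per GPU: {weights_per_gpu:.1f} GiB")  # ~40.6 GiB
print(f"headroom left:   {headroom:.1f} GiB")         # ~3.8 GiB for KV cache, activations, NCCL buffers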
Am I missing something?