Issues: vllm-project/vllm
[Bug]: vLLM should prevent setting max_model_len < local attention size for Llama-4 models
bug · #16274 · opened Apr 8, 2025 by eldarkurtic

[Bug]: invalid responses when generating yaml format
bug · #16269 · opened Apr 8, 2025 by Glebbot

[Bug]: Not supporting CUDA12.8
bug · #16267 · opened Apr 8, 2025 by liurui416

[Performance]: H100 Optimisation Configuration For Offline Inferencing
performance · #16265 · opened Apr 8, 2025 by mohanajuhi166

[Feature]: ray logs too large
feature request · #16262 · opened Apr 8, 2025 by ErykCh

[Performance]: FP8 does not demonstrate an inference speed superior to that of FP16
performance · #16261 · opened Apr 8, 2025 by Shuai-Xie

[Feature]: Will you add padding for intermediate_size just like lmdeploy?
feature request · #16260 · opened Apr 8, 2025 by Einsturing

[Bug]: vLLM still runs after Ray workers crash
bug · #16259 · opened Apr 8, 2025 by ccdumitrascu

[Usage]: The performance of ngram speculative decoding
usage · #16258 · opened Apr 8, 2025 by dtransposed

[Bug]: Problem Load llama3.2-11B-Vision-Instruct-INT4-GPTQ
bug · #16254 · opened Apr 8, 2025 by fahadh4ilyas

[Usage]: How to use xPyD disaggregated prefilling
usage · #16253 · opened Apr 8, 2025 by leoyuppieqnew

[Usage]: Async generate with offline LLM interface
usage · #16251 · opened Apr 8, 2025 by SparkJiao

[Usage]: how to set vLLM message queue communication handle's connect_ip to 127.0.0.1
usage · #16250 · opened Apr 8, 2025 by FanYaning

[Performance]: qwen2.5vl preprocess videos very slow after several batches
performance · #16249 · opened Apr 8, 2025 by Zooy138

[Bug]: OPEA/Mistral-Small-3.1-24B-Instruct-2503-int4-AutoRound-awq-sym, VLLM Chat error :- can only concatenate str (not "list") to str
bug · #16245 · opened Apr 8, 2025 by Karan-i3

[New Model]: efficient-speech/lite-whisper-large-v3
#16244 · opened Apr 8, 2025 by JakubCerven

[Usage]: Failed to get global TPU topology.
usage · #16243 · opened Apr 8, 2025 by adityarajsahu

[Usage]: ERROR:root:Compiled DAG task exited with exception
usage · #16242 · opened Apr 8, 2025 by vrascal

[Bug]: LLM.beam_search Doesn't Pass Multimodal Data
bug · #16240 · opened Apr 8, 2025 by alex-jw-brooks

[Bug]: how to use tests/distributed/test_custom_all_reduce.py
bug · #16238 · opened Apr 8, 2025 by zhink

[Bug]: Calling /wake_up after /sleep and then sending a request leads to improper LLM response
bug · #16234 · opened Apr 8, 2025 by akshayqylis

[Usage]: Multiple Models on Same Port
usage · #16232 · opened Apr 8, 2025 by dipta007

[Feature]: Support Pipeline Parallelism on Llama-4-Maverick-17B-128E
feature request · #16231 · opened Apr 8, 2025 by Edwinhr716

[Bug]: failed to load deepseek-r1 AWQ quantization on CPU
bug · #16230 · opened Apr 8, 2025 by spaceater