test: Limiting multi-gpu tests to use Ray as distributed_executor_backend #47
Conversation
```diff
@@ -62,7 +62,8 @@ model_json=$(cat <<EOF
       "enforce_eager": "true",
       "enable_lora": "true",
       "max_lora_rank": 32,
-      "lora_extra_vocab_size": 256
+      "lora_extra_vocab_size": 256,
+      "distributed_executor_backend":"ray"
```
> For python native multiprocessing mode and KIND_MODEL setting, Triton hits "failed to stop server: Internal - Exit timeout expired."

A few questions on this for my own understanding moving forward:

- Do we know more details or limitations on why this is happening?
- Is this an error happening on server shutdown?
- Is there some issue with the python native multiprocessing mode due to the details of Triton's python backend launching each instance as a separate process?
- Is this with 1, 2, or any amount of model instances with KIND_MODEL?
> Do we know more details or limitations on why this is happening?

The issue is an unclear multi-gpu test failure when upgrading to vLLM versions 0.5.0 and up. In PR #5230 vllm changed the default executor for distributed serving from Ray to python native multiprocessing for single-node processing. This becomes an issue starting with the v0.5.1 release. For python native multiprocessing mode and the KIND_MODEL setting, Triton hits "failed to stop server: Internal - Exit timeout expired. Exiting immediately." and pt_main_thread processes are never stopped/killed.

Solution: add "distributed_executor_backend": "ray" to model.json.
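For concreteness, a minimal sketch of what the resulting model.json could contain; Python is used here only to write the file. The model name and tensor_parallel_size are placeholders, and the other values mirror the test heredoc in the diff above.

```python
import json

# Illustrative sketch only: model name and tensor_parallel_size are placeholders;
# the remaining values mirror the test heredoc above. The last field pins vLLM
# to the Ray executor instead of python native multiprocessing.
engine_config = {
    "model": "meta-llama/Llama-2-7b-hf",    # placeholder model name
    "tensor_parallel_size": 2,              # illustrative multi-GPU setting
    "enforce_eager": "true",
    "enable_lora": "true",
    "max_lora_rank": 32,
    "lora_extra_vocab_size": 256,
    "distributed_executor_backend": "ray",  # the workaround described above
}

with open("model.json", "w") as f:
    json.dump(engine_config, f, indent=4)
```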
> Is this an error happening on server shutdown?

Yes, and I have a reproducer outside of Triton.
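A minimal sketch of the kind of standalone (non-Triton) reproducer meant here, assuming it boils down to loading a tensor-parallel model with the "mp" executor and letting the process exit; the model name, prompt, and token count are placeholders, not the actual script.

```python
# Sketch of a standalone (non-Triton) reproducer under the assumptions stated
# above; not the actual reproducer script.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",           # placeholder; use any model run with tp > 1
    tensor_parallel_size=2,
    enforce_eager=True,
    distributed_executor_backend="mp",   # python native multiprocessing path
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=8)))
# When the script returns, check whether the pt_main_thread worker processes
# exit cleanly or linger.
```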
> Is this with 1, 2, or any amount of model instances with KIND_MODEL?

If the "distributed_executor_backend" field is not specified, then for tp>2 distributed within a single node the MP backend kicks in. However, I've noticed that even when tp=1, if "distributed_executor_backend" is specified in model.json, vllm will still go through distributed serving. More on the Slack channel about this behavior.
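A hedged illustration of the backend selection described above, using the engine-args field name from vLLM 0.5.x; the exact default-resolution logic may differ between versions, so treat this as a sketch rather than a reference.

```python
from vllm.engine.arg_utils import AsyncEngineArgs

# Field names follow vLLM 0.5.x; the model name is a placeholder.
default_args = AsyncEngineArgs(model="facebook/opt-125m", tensor_parallel_size=2)
pinned_args = AsyncEngineArgs(
    model="facebook/opt-125m",
    tensor_parallel_size=2,
    distributed_executor_backend="ray",
)

# Left unspecified, the field stays unset and vLLM resolves it later (to "mp" on
# a single node in the affected versions); set explicitly, Ray is used instead.
print(default_args.distributed_executor_backend)  # None
print(pinned_args.distributed_executor_backend)   # "ray"
```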
@oandreeva-nv AFAIK you can set the --distributed-executor-backend to ray and avoid the usage of MP. From the docs of distributed serving:
@rcarrata That's what I'm doing in this PR. I'm making sure that ray is used for distributed testing. Or did I misunderstand your comment?
In PR #5230 vllm changed the default executor for distributed serving from Ray to python native multiprocessing for single-node processing. This becomes an issue for Triton starting with the v0.5.1 release. For python native multiprocessing mode and the KIND_MODEL setting, Triton hits "failed to stop server: Internal - Exit timeout expired. Exiting immediately." and pt_main_thread processes are never stopped/killed. I'll create an issue a bit later.

Solution: support only Ray for deploying models with tensor_parallel_size > 1 via the "distributed_executor_backend" flag until the issue is fixed.

This PR adjusts our multi-gpu tests according to the above observations.
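For illustration, a hedged sketch of that test-side policy; the helper name and default values are made up for this example and are not the actual test code.

```python
import json

def build_model_json(model: str, tensor_parallel_size: int) -> str:
    """Illustrative helper (not the real test code): pin the Ray executor
    only for multi-GPU deployments, per the workaround above."""
    config = {
        "model": model,
        "tensor_parallel_size": tensor_parallel_size,
        "enforce_eager": "true",
    }
    if tensor_parallel_size > 1:
        config["distributed_executor_backend"] = "ray"
    return json.dumps(config, indent=4)

print(build_model_json("facebook/opt-125m", 2))
```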