Support multi-turn rollout #398
Nice implementation! I see you added additional multi-turn games. Are there any scripts we can run to try your code?
wuxibin89 added a commit that referenced this issue on Apr 25, 2025:

…#1138)

### Summary

Introduce vLLM AsyncLLM to support multi-turn rollout (#385, #398, #710).

### Architecture

*(architecture diagram image omitted)*

**New components**:
- AsyncLLMWorker: standalone vLLM server instance
  - FastAPI: provides an OpenAI-compatible HTTP server
  - AsyncLLM: async LLMEngine for online serving; for more details see [AsyncLLM](vllm-project/vllm#9826) and [LLMEngine](https://docs.vllm.ai/en/latest/design/arch_overview.html#llmengine)
  - ExternalRayDistributedExecutor: custom executor backend that manages workers in the worker group, grabbing the corresponding workers by actor name
- AsyncLLMManager: manages a group of vLLM server instances (AsyncLLMWorker)
  - AsyncLLM lifecycle: initialization, wake_up, sleep
  - FastAPI service discovery
- ChatScheduler: schedules multiple chat completion requests across multiple server instances
  - Least-requests load balancing
  - Sticky sessions with prefix caching
  - Chat completion callback: tool calling

### TODO

- [x] AsyncLLM: initialization/wake_up/sleep
- [x] OpenAI API: support `/v1/chat/completions`
- [x] RayPPOTrainer integration: replace `generate_sequences` with an HTTP call to `/v1/chat/completions`
- [x] GSM8K e2e training
- [ ] Add documentation

Co-authored-by: shengguangming <[email protected]>
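The ChatScheduler's routing policy described above (sticky sessions for prefix-cache reuse, falling back to least-requests load balancing) can be sketched as a small toy. This is an illustrative sketch only; the class and method names here (`pick_server`, `acquire`, `release`) are hypothetical and not verl's actual implementation.

```python
import hashlib


class ChatScheduler:
    """Toy request router: sticky sessions with least-requests fallback.

    Illustrative sketch, not verl's actual scheduler.
    """

    def __init__(self, servers):
        # servers: list of server addresses, e.g. ["host1:8000", "host2:8000"]
        self.servers = list(servers)
        self.inflight = {s: 0 for s in self.servers}  # in-flight request counts
        self.session_map = {}  # prompt-prefix hash -> server

    def pick_server(self, prompt_prefix: str) -> str:
        key = hashlib.sha256(prompt_prefix.encode()).hexdigest()
        # Sticky session: route the same conversation prefix to the same
        # server so its vLLM prefix cache can be reused across turns.
        if key in self.session_map:
            return self.session_map[key]
        # New session: pick the server with the fewest in-flight requests.
        server = min(self.servers, key=lambda s: self.inflight[s])
        self.session_map[key] = server
        return server

    def acquire(self, server: str) -> None:
        self.inflight[server] += 1

    def release(self, server: str) -> None:
        self.inflight[server] -= 1
```

A real scheduler would also handle server failures and evict stale sessions; this sketch only shows the two routing rules named in the PR summary.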
ScottCTD pushed a commit to ScottCTD/verl that referenced this issue on May 5, 2025, and GitMonkey0 pushed a commit to GitMonkey0/verl that referenced this issue on Jun 14, 2025 (both with the same commit message).
I found a fork that attempts to implement multi-turn rollouts using vLLM. I think this would be very useful for training reasoning models that can reason over multiple turns in a conversation.
https://github.com/cfpark00/verl/tree/multi_turn_rollout
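For context, the general multi-turn rollout pattern such a fork implements is a loop that alternates model replies with environment observations, accumulating the full message history for training. The sketch below is generic and hypothetical: `chat_fn` stands in for any chat-completion call (e.g. a wrapper around an OpenAI-compatible `/v1/chat/completions` endpoint), and the `env.reset()`/`env.step()` interface is assumed for illustration, not taken from verl or the linked fork.

```python
def multi_turn_rollout(chat_fn, env, max_turns=8):
    """Roll out a multi-turn conversation against an environment.

    chat_fn: callable taking an OpenAI-style message list and returning the
        assistant's reply string (e.g. via /v1/chat/completions).
    env: hypothetical object with reset() -> first user message, and
        step(reply) -> (next user message, done flag).
    Returns the accumulated message list, which a trainer could then score.
    """
    messages = [{"role": "user", "content": env.reset()}]
    for _ in range(max_turns):
        reply = chat_fn(messages)
        messages.append({"role": "assistant", "content": reply})
        observation, done = env.step(reply)
        if done:
            break
        messages.append({"role": "user", "content": observation})
    return messages
```

Reusing the same message list across turns is what makes the sticky-session/prefix-caching scheduling in the PR above pay off: each new turn shares a long prompt prefix with the previous one.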