Support multi-turn rollout #398

Open
casper-hansen opened this issue Feb 26, 2025 · 1 comment

@casper-hansen
Contributor

I found a fork that attempts to implement multi-turn rollouts using vLLM. I think this would be broadly useful for creating reasoning models that can reason over multiple turns in a conversation.

https://github.com/cfpark00/verl/tree/multi_turn_rollout
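
For illustration, here is a minimal sketch of what a multi-turn rollout loop might look like with vLLM's offline `LLM.chat` API. This is not code from the linked fork; the model name and the `get_env_feedback` helper are placeholders.

```python
# Illustrative multi-turn rollout loop (not the linked fork's actual code).
from typing import Optional
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # placeholder model
params = SamplingParams(temperature=0.7, max_tokens=512)

def get_env_feedback(reply: str) -> Optional[str]:
    """Placeholder: return the next user/tool message, or None to stop."""
    return None

messages = [{"role": "user", "content": "Solve: 12 * 7 + 5"}]
for _ in range(4):  # cap the number of turns
    # Generate the next assistant turn conditioned on the whole conversation.
    reply = llm.chat(messages, sampling_params=params)[0].outputs[0].text
    messages.append({"role": "assistant", "content": reply})
    # Feed environment/user feedback back in and continue, or stop.
    feedback = get_env_feedback(reply)
    if feedback is None:
        break
    messages.append({"role": "user", "content": feedback})
```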

@Jackory
Contributor

Jackory commented Mar 11, 2025

Nice implementation! I see you added additional multi-turn games. Are there any scripts we can use to run your code?

wuxibin89 added a commit that referenced this issue Apr 25, 2025
…#1138)

### Summary
Introduce vLLM AsyncLLM to support multi-turn rollout; related issues: #385 #398 #710

### Architecture


![async_llm_arch](https://github.com/user-attachments/assets/e8cd974c-0c26-4d96-9a9e-b71fd85dd32d)



**New Components**:
- AsyncLLMWorker: a standalone vLLM server instance
  - FastAPI: provides an OpenAI-compatible HTTP server
- AsyncLLM: the async LLMEngine for online serving; for more details, see
[AsyncLLM](vllm-project/vllm#9826) and
[LLMEngine](https://docs.vllm.ai/en/latest/design/arch_overview.html#llmengine)
- ExternalRayDistributedExecutor: a custom executor backend that manages the
workers in the worker group; it looks up the corresponding workers by their
Ray actor names

- AsyncLLMManager: manages a group of vLLM server instances (AsyncLLMWorker)
  - AsyncLLM lifecycle: initialization, wake_up, sleep
  - FastAPI service discovery

- ChatScheduler: schedules chat completion requests across multiple server
instances (a minimal sketch of the scheduling policy follows this list)
  - Least-requests load balancing
  - Sticky sessions with prefix caching
  - Chat completion callback: tool calling
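
The following is a minimal, illustrative sketch of the scheduling policy described above (least-requests load balancing plus sticky sessions for prefix caching); the class and method names are placeholders, not the actual ChatScheduler API.

```python
# Illustrative scheduler sketch: least-requests load balancing with sticky
# sessions, so a conversation keeps hitting the same server and can reuse
# its KV prefix cache. Names are placeholders, not verl's ChatScheduler.
class LeastRequestsScheduler:
    def __init__(self, server_addresses):
        self.in_flight = {addr: 0 for addr in server_addresses}
        self.sticky = {}  # session_id -> server address

    def pick_server(self, session_id=None):
        # Sticky session: reuse the server that already holds this prefix.
        if session_id in self.sticky:
            return self.sticky[session_id]
        # Otherwise pick the server with the fewest in-flight requests.
        addr = min(self.in_flight, key=self.in_flight.get)
        if session_id is not None:
            self.sticky[session_id] = addr
        return addr

    def start(self, addr):
        self.in_flight[addr] += 1

    def finish(self, addr):
        self.in_flight[addr] -= 1
```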

### TODO
- [x] AsyncLLM: initialization/wake_up/sleep
- [x] OpenAI API: support `/v1/chat/completions`
- [x] RayPPOTrainer integration: replace `generate_sequences` with an HTTP
call to `/v1/chat/completions` (a client-side sketch follows this list)
- [x] GSM8K e2e training
- [ ] Add documentation
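
As a hedged client-side sketch, a request to the OpenAI-compatible endpoint served by an AsyncLLMWorker could look like the following; the server address, model name, and prompt are placeholders, not the actual RayPPOTrainer code.

```python
# Placeholder client-side call to the OpenAI-compatible endpoint; in the
# trainer this replaces a direct generate_sequences call.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # placeholder server address
    json={
        "model": "Qwen/Qwen2.5-7B-Instruct",  # placeholder model name
        "messages": [{"role": "user", "content": "A GSM8K-style question..."}],
        "temperature": 1.0,
        "max_tokens": 512,
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])
```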

---------

Co-authored-by: shengguangming <[email protected]>
ScottCTD pushed a commit to ScottCTD/verl that referenced this issue May 5, 2025
GitMonkey0 pushed a commit to GitMonkey0/verl that referenced this issue Jun 14, 2025