
Support for multiturn online RL training #385


Open
UbeCc opened this issue Feb 25, 2025 · 25 comments

@UbeCc

UbeCc commented Feb 25, 2025

Currently, verl only supports single-turn RL training. As agent tuning is becoming urgent, will verl support multi-turn RL in the next few days?
Maybe I can help. Thanks!

@PeterSH6 @zhaochenyang20

@zhaochenyang20
Collaborator

Sure, welcome to join. Also, the proposal should be professional; take a look at how I write proposals for SGLang:

Research Project Proposal

  1. What is the problem, and how do you define it clearly?
  2. What is the scope of the problem? What very strong assumptions does it make, and are those assumptions reasonable?
  3. Who cares about the problem? Don't just say "Google would care"; be precise about which team at Google, or even which person, would care.
  4. What are the existing solutions, and where do they fall short?
  5. What might our solution be? (This doesn't have to be complete at proposal time.)
  6. How do we evaluate it? What results would show that our method works?
  7. What uncertainties are there in the process, and what are the risks?
  8. What is the timeline for the planned work?

Feature Proposal

  1. What does the target framework's current implementation look like, and what problems does it have?
  2. What is the proposed change, and which parts are expected to be modified?
  3. What effect will it achieve?
  4. What uncertainties are there?
  5. What is the timeline for the planned work?

@PeterSH6
Collaborator

@UbeCc Nice suggestion!
We can discuss the plan this week.
Could you connect with us through WeChat or Slack?

@zhaochenyang20
Collaborator

@UbeCc @PeterSH6 I can connect you guys. Haoran is my senior. And, good night 😂

@UbeCc
Author

UbeCc commented Feb 25, 2025

@UbeCc @PeterSH6 I can connect you guys. Haoran is my senior. And, good night 😂

Thanks Chenyang, enjoy your day!

@UbeCc
Author

UbeCc commented Feb 25, 2025

@UbeCc Nice suggestion! We can discuss the plan this week. Could you connect with us through WeChat or Slack?

Yeah, let me send my WeChat id through email

@YSLIU627
Contributor

Great idea! I could also offer some help!

@AIBionics

AIBionics commented Feb 26, 2025

I'm also working on multiturn online RL training at the moment, and I'd be glad to assist if you need any help.
Maybe we can create a WeChat group and then add everyone to the group for discussion.

@UbeCc
Author

UbeCc commented Feb 26, 2025

Yeah could you plz send your WeChat id to me through email?

Let me create a group and work together.

[email protected]

@YSLIU627
Contributor

YSLIU627 commented Feb 26, 2025 via email

@sbl1996

sbl1996 commented Feb 27, 2025

Yeah could you plz send your WeChat id to me through email?

Let me create a group and work together.

[email protected]

I've already written a multi-turn implementation and it works, but the training is not stable. Could you add me to the WeChat group? My WeChat ID is Zukala-Koth. Thanks a lot! @UbeCc

@zhaochenyang20
Collaborator

@sbl1996 Sure. I will tell him tomorrow.

@UbeCc
Author

UbeCc commented Feb 27, 2025

Yeah could you plz send your WeChat id to me through email?
Let me create a group and work together.
[email protected]

I've already written a multi-turn implementation and it works, but the training is not stable. Could you add me to the WeChat group? My WeChat ID is Zukala-Koth. Thanks a lot! @UbeCc

Done. Thank you!

@oliverz20

Hi, I’m interested in multiturn RL as well. Could you please add me to the group? My WeChat ID is sfoliver. Thanks a lot! @UbeCc

@Tshiyao

Tshiyao commented Mar 4, 2025

Hi @UbeCc, I'm also really into multi-turn RL and would love to join the group! My WeChat ID is Liu_Qihuang. Looking forward to connecting and learning more. Thanks!

@UbeCc
Author

UbeCc commented Mar 4, 2025

Got it. We are already working on it. Thanks for your support!

@Jackory
Contributor

Jackory commented Mar 7, 2025

Hi @UbeCc, I'm also interested in multi-turn RL and would love to join the group! My WeChat ID is innerpeace. Looking forward to connecting and learning more. Thanks a lot!

@hongyi-zhang

hongyi-zhang commented Mar 12, 2025

@PeterSH6 @UbeCc @zhaochenyang20
I'm interested in multi-turn RL as well. We have a real-world use case and I was going to start my own implementation before seeing this thread. Would love to contribute or discuss technical design, whichever is preferable!

@LeslieTrue

@UbeCc @PeterSH6 @zhaochenyang20
Interested in contributing! I have a related multi-turn RL implementation, but it's not that efficient. My WeChat is Tianzhe011127.

@quanwei0

@UbeCc @PeterSH6 @zhaochenyang20
I am working on multi-step RL training for agents and would like to join the wechat group! My wechat id is weiquan0128. Looking forward to connecting and learning more. Thanks!

@UbeCc
Author

UbeCc commented Mar 18, 2025

Thank you for your attention! We already have a large group of people working on this feature. We'll keep syncing here as we make progress!

@XuanRen4470

Same for me. I am also working on multi-turn RL. My WeChat is x34ren. Could you please add me to the group?

@waleko

waleko commented Apr 5, 2025

@UbeCc @PeterSH6 @zhaochenyang20 I’m excited about multi-turn RL and would be glad to join the group. My WeChat is alex-kovrigin — happy to connect and dive deeper into the topic. Thanks!

@eric-haibin-lin
Collaborator

demo: #917

@DachengLi1

Thank you @eric-haibin-lin! I am curious whether #917 is ready for use?

@zhaochenyang20
Collaborator

Thank you @eric-haibin-lin! I am curious whether #917 is ready for use?

Indeed, please ask us 😂 We have the code ready, but the validation score still isn't improving. It works in our closed-source sandbox, but it doesn't work with the open-source sandbox right now. We will open-source and merge it anyway early next week.

wuxibin89 added a commit that referenced this issue Apr 25, 2025
…#1138)

### Summary
Introduce vLLM AsyncLLM to support multi-turn rollout and #385 #398 #710

### Architecture


![async_llm_arch](https://github.com/user-attachments/assets/e8cd974c-0c26-4d96-9a9e-b71fd85dd32d)



**New Components**:
- AsyncLLMWorker: standalone vllm server instance
  - FastAPI: provides an OpenAI-compatible HTTP server
- AsyncLLM: async LLMEngine for online serving; for more details see
[AsyncLLM](vllm-project/vllm#9826) and
[LLMEngine](https://docs.vllm.ai/en/latest/design/arch_overview.html#llmengine)
- ExternalRayDistributedExecutor: custom executor backend that manages the
workers in the worker group; it grabs the corresponding workers by actor name
- AsyncLLMManager: manages a group of vllm server instances (AsyncLLMWorker)
  - AsyncLLM lifecycle: initialization, wake_up, sleep
  - FastAPI service discovery
- ChatScheduler: schedules multiple chat completion requests across
multiple server instances (see the sketch after this list)
  - Least-requests load balance
  - Sticky session with prefix caching
  - Chat completion callback: tool calling
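
To make the least-requests and sticky-session scheduling above concrete, here is a minimal sketch; the class names, fields, and addresses are illustrative assumptions rather than verl's actual ChatScheduler API.

```python
# Minimal sketch of least-requests load balancing with sticky sessions.
# Class names, fields, and addresses are illustrative, not verl's actual API.
from dataclasses import dataclass


@dataclass
class ServerInstance:
    address: str          # e.g. one AsyncLLMWorker's FastAPI endpoint
    in_flight: int = 0    # requests currently being served by this instance


class LeastRequestsScheduler:
    def __init__(self, servers: list[ServerInstance]):
        self.servers = servers
        self.sticky: dict[str, ServerInstance] = {}  # session id -> pinned server

    def acquire(self, session_id: str | None = None) -> ServerInstance:
        # Sticky session: reuse the server that already holds the prefix cache.
        if session_id is not None and session_id in self.sticky:
            server = self.sticky[session_id]
        else:
            # Least requests: pick the server with the fewest in-flight requests.
            server = min(self.servers, key=lambda s: s.in_flight)
            if session_id is not None:
                self.sticky[session_id] = server
        server.in_flight += 1
        return server

    def release(self, server: ServerInstance) -> None:
        server.in_flight -= 1


# Usage: new sessions spread across instances; later turns of the same session
# stay on the same instance so they benefit from prefix caching.
sched = LeastRequestsScheduler(
    [ServerInstance("http://10.0.0.1:8000"), ServerInstance("http://10.0.0.2:8000")]
)
s0 = sched.acquire("traj-0")        # least-loaded server, now pinned to traj-0
s1 = sched.acquire("traj-1")        # the other, less-loaded server
s0_again = sched.acquire("traj-0")  # same server as s0 (sticky session)
```

The real scheduler presumably does this bookkeeping per request and asynchronously; the sketch only shows the selection rule.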

### TODO
- [x] AsyncLLM: initialization/wake_up/sleep
- [x] OpenAI API: support `/v1/chat/completions`
- [x] RayPPOTrainer integration: replace `generate_sequences` with HTTP calls
to `/v1/chat/completions` (sketched below)
- [x] GSM8K e2e training
- [ ] Add documentation

---------

Co-authored-by: shengguangming <[email protected]>
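
To show what a multi-turn rollout over the OpenAI-compatible endpoint looks like from the client side, here is a minimal sketch of a tool-calling loop against `/v1/chat/completions`. The base URL, model name, and calculator tool are assumptions for illustration; this is not verl's actual RayPPOTrainer integration code.

```python
# Hypothetical multi-turn rollout against an OpenAI-compatible
# /v1/chat/completions server (e.g. an AsyncLLM FastAPI instance).
# URL, model name, and the calculator tool are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a simple arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

messages = [{"role": "user", "content": "What is 13 * 7 + 5?"}]

for _ in range(4):  # cap the number of assistant turns
    resp = client.chat.completions.create(
        model="Qwen/Qwen2.5-7B-Instruct",  # whatever model the server is serving
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg.model_dump(exclude_none=True))  # keep the full trajectory

    if not msg.tool_calls:          # no tool call -> final answer, rollout ends
        break
    for call in msg.tool_calls:     # execute each requested tool and feed back
        args = json.loads(call.function.arguments)
        result = str(eval(args["expression"]))  # toy "sandbox" for this example
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })

# `messages` now holds the multi-turn trajectory; a reward (e.g. a GSM8K answer
# check) would be computed on it and fed back into PPO training.
```

In the design described above, the ChatScheduler issues these requests and the chat completion callback dispatches the tool calls; the resulting multi-turn trajectory is what gets scored and used for training.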
ScottCTD pushed a commit to ScottCTD/verl that referenced this issue May 5, 2025