
Support for multiturn online RL training #385


Open
UbeCc opened this issue Feb 25, 2025 · 25 comments

@UbeCc

UbeCc commented Feb 25, 2025

Currently, verl only supports single-turn RL training. As agent tuning is becoming urgent, will verl support multi-turn RL in the next few days?
Maybe I can help. Thanks!

@PeterSH6 @zhaochenyang20

@zhaochenyang20
Collaborator

Sure, welcome to join. Also, the proposal should be professional; take a look at how I write proposals for SGLang:

Research Project Proposal

  1. What is the problem, and how do you define it clearly?
  2. What is the scope of the problem? What very strong assumptions does it make, and are those assumptions reasonable?
  3. Who cares about the problem? Don't just say "Google would care"; be precise about which team at Google, or even which person, would care.
  4. What are the existing solutions, and where do they fall short?
  5. What might our solution be? (This doesn't have to be complete at proposal time.)
  6. How do we evaluate it? What results would show that our method works?
  7. What uncertainties are there in the process, and what are the risks?
  8. What is the timeline for the planned work?

Feature Proposal

  1. What does the target framework's current implementation look like, and what problems does it have?
  2. What is the proposed change, and which parts are expected to be modified?
  3. What effect will it achieve?
  4. What uncertainties are there?
  5. What is the timeline for the planned work?

@PeterSH6
Collaborator

@UbeCc Nice suggestion!
We can discuss the plan this week.
Could you connect with us through WeChat or Slack?

@zhaochenyang20
Collaborator

@UbeCc @PeterSH6 I can connect you guys. Haoran is my senior. And, good night 😂

@UbeCc
Author

UbeCc commented Feb 25, 2025

@UbeCc @PeterSH6 I can connect you guys. Haoran is my senior. And, good night 😂

Thanks Chenyang, enjoy your day!

@UbeCc
Author

UbeCc commented Feb 25, 2025

@UbeCc Nice suggestion! We can discuss the plan this week. Could you connect with us through WeChat or Slack?

Yeah, let me send my WeChat id through email

@YSLIU627
Contributor

Great idea! I could also offer some help!

@AIBionics

AIBionics commented Feb 26, 2025

I'm also working on multiturn online RL training at the moment, and I'd be glad to assist if you need any help.
Maybe we can create a WeChat group and then add everyone to the group for discussion.

@UbeCc
Author

UbeCc commented Feb 26, 2025

Yeah could you plz send your WeChat id to me through email?

Let me create a group and work together.

[email protected]

@YSLIU627
Contributor

YSLIU627 commented Feb 26, 2025 via email

@sbl1996

sbl1996 commented Feb 27, 2025

Yeah could you plz send your WeChat id to me through email?

Let me create a group and work together.

[email protected]

I've already written a multi-turn implementation and it works, but the training is not stable. Could you add me to the WeChat group? My WeChat ID is Zukala-Koth. Thanks a lot! @UbeCc

@zhaochenyang20
Collaborator

@sbl1996 Sure. I will tell him tomorrow.

@UbeCc
Author

UbeCc commented Feb 27, 2025

Yeah could you plz send your WeChat id to me through email?
Let me create a group and work together.
[email protected]

I've already written a multi-turn implementation and it works, but the training is not stable. Could you add me to the WeChat group? My WeChat ID is Zukala-Koth. Thanks a lot! @UbeCc

Done. Thank you!

@oliverz20

Hi, I’m interested in multiturn RL as well. Could you please add me to the group? My WeChat ID is sfoliver. Thanks a lot! @UbeCc

@Tshiyao

Tshiyao commented Mar 4, 2025

Hi @UbeCc, I'm also really into multi-turn RL and would love to join the group! My WeChat ID is Liu_Qihuang. Looking forward to connecting and learning more. Thanks!

@UbeCc
Author

UbeCc commented Mar 4, 2025

Got it. We are already working on it. Thanks for your support!

@Jackory
Contributor

Jackory commented Mar 7, 2025

Hi @UbeCc, I'm also interested in multi-turn RL and would love to join the group! My WeChat ID is innerpeace. Looking forward to connecting and learning more. Thanks a lot!

@hongyi-zhang

hongyi-zhang commented Mar 12, 2025

@PeterSH6 @UbeCc @zhaochenyang20
I'm interested in multi-turn RL as well. We have a real-world use case and I was going to start my own implementation before seeing this thread. Would love to contribute or discuss technical design, whichever is preferable!

@LeslieTrue

@UbeCc @PeterSH6 @zhaochenyang20
Interested in contributing! I have a related multi-turn RL implementation, but it's not that efficient. My WeChat is Tianzhe011127.

@quanwei0

@UbeCc @PeterSH6 @zhaochenyang20
I am working on multi-step RL training for agents and would like to join the wechat group! My wechat id is weiquan0128. Looking forward to connecting and learning more. Thanks!

@UbeCc
Author

UbeCc commented Mar 18, 2025

Thank you for your attention! We already have a large group of people working on this feature. We'll keep syncing here as we make progress!

@XuanRen4470

Same for me. I am also working on multi-turn RL. My WeChat is x34ren. Could you please add me to the group?

@waleko

waleko commented Apr 5, 2025

@UbeCc @PeterSH6 @zhaochenyang20 I’m excited about multi-turn RL and would be glad to join the group. My WeChat is alex-kovrigin — happy to connect and dive deeper into the topic. Thanks!

@eric-haibin-lin
Collaborator

demo: #917

@DachengLi1

Thank you @eric-haibin-lin! I am curious whether #917 is ready for use?

@zhaochenyang20
Collaborator

Thank you @eric-haibin-lin! I am curious whether #917 is ready for use?

Indeed, please ask us 😂 We have the code ready, but the validation score still isn't improving. It works in our closed-source sandbox, but it doesn't work with the open-source sandbox right now. We will open-source and merge it anyway early next week.

wuxibin89 added a commit that referenced this issue Apr 25, 2025
…#1138)

### Summary
Introduce vLLM AsyncLLM to support multi-turn rollout and #385 #398 #710

### Architecture


![async_llm_arch](https://github.com/user-attachments/assets/e8cd974c-0c26-4d96-9a9e-b71fd85dd32d)



**New Components**:
- AsyncLLMWorker: standalone vllm server instance
  - FastAPI: provides an OpenAI-compatible HTTP server
- AsyncLLM: async LLMEngine for online serving; for more details see
[AsyncLLM](vllm-project/vllm#9826) and
[LLMEngine](https://docs.vllm.ai/en/latest/design/arch_overview.html#llmengine)
- ExternalRayDistributedExecutor: custom executor backend that manages the
workers in the worker group; it grabs the corresponding workers by actor name
- AsyncLLMManager: manages a group of vllm server instances (AsyncLLMWorker)
  - AsyncLLM lifecycle: initialization, wake_up, sleep
  - FastAPI service discovery
- ChatScheduler: schedules multiple chat completion requests across
multiple server instances (see the sketch after this list)
  - Least-requests load balance
  - Sticky session with prefix caching
  - Chat completion callback: tool calling
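
To make the least-requests and sticky-session scheduling above concrete, here is a minimal sketch; the class names, fields, and addresses are illustrative assumptions rather than verl's actual ChatScheduler API.

```python
# Minimal sketch of least-requests load balancing with sticky sessions.
# Class names, fields, and addresses are illustrative, not verl's actual API.
from dataclasses import dataclass


@dataclass
class ServerInstance:
    address: str          # e.g. one AsyncLLMWorker's FastAPI endpoint
    in_flight: int = 0    # requests currently being served by this instance


class LeastRequestsScheduler:
    def __init__(self, servers: list[ServerInstance]):
        self.servers = servers
        self.sticky: dict[str, ServerInstance] = {}  # session id -> pinned server

    def acquire(self, session_id: str | None = None) -> ServerInstance:
        # Sticky session: reuse the server that already holds the prefix cache.
        if session_id is not None and session_id in self.sticky:
            server = self.sticky[session_id]
        else:
            # Least requests: pick the server with the fewest in-flight requests.
            server = min(self.servers, key=lambda s: s.in_flight)
            if session_id is not None:
                self.sticky[session_id] = server
        server.in_flight += 1
        return server

    def release(self, server: ServerInstance) -> None:
        server.in_flight -= 1


# Usage: new sessions spread across instances; later turns of the same session
# stay on the same instance so they benefit from prefix caching.
sched = LeastRequestsScheduler(
    [ServerInstance("http://10.0.0.1:8000"), ServerInstance("http://10.0.0.2:8000")]
)
s0 = sched.acquire("traj-0")        # least-loaded server, now pinned to traj-0
s1 = sched.acquire("traj-1")        # the other, less-loaded server
s0_again = sched.acquire("traj-0")  # same server as s0 (sticky session)
```

The real scheduler presumably does this bookkeeping per request and asynchronously; the sketch only shows the selection rule.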

### TODO
- [x] AsyncLLM: initialization/wake_up/sleep
- [x] OpenAI API: support `/v1/chat/completions`
- [x] RayPPOTrainer integration: replace `generate_sequences` with HTTP calls
to `/v1/chat/completions` (sketched below)
- [x] GSM8K e2e training
- [ ] Add documentation

---------

Co-authored-by: shengguangming <[email protected]>
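
To show what a multi-turn rollout over the OpenAI-compatible endpoint looks like from the client side, here is a minimal sketch of a tool-calling loop against `/v1/chat/completions`. The base URL, model name, and calculator tool are assumptions for illustration; this is not verl's actual RayPPOTrainer integration code.

```python
# Hypothetical multi-turn rollout against an OpenAI-compatible
# /v1/chat/completions server (e.g. an AsyncLLM FastAPI instance).
# URL, model name, and the calculator tool are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a simple arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

messages = [{"role": "user", "content": "What is 13 * 7 + 5?"}]

for _ in range(4):  # cap the number of assistant turns
    resp = client.chat.completions.create(
        model="Qwen/Qwen2.5-7B-Instruct",  # whatever model the server is serving
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg.model_dump(exclude_none=True))  # keep the full trajectory

    if not msg.tool_calls:          # no tool call -> final answer, rollout ends
        break
    for call in msg.tool_calls:     # execute each requested tool and feed back
        args = json.loads(call.function.arguments)
        result = str(eval(args["expression"]))  # toy "sandbox" for this example
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })

# `messages` now holds the multi-turn trajectory; a reward (e.g. a GSM8K answer
# check) would be computed on it and fed back into PPO training.
```

In the design described above, the ChatScheduler issues these requests and the chat completion callback dispatches the tool calls; the resulting multi-turn trajectory is what gets scored and used for training.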
ScottCTD pushed a commit to ScottCTD/verl that referenced this issue May 5, 2025