Support for multiturn online RL training #385
Comments
Sure, welcome to join. The proposal should be professional; see how I wrote proposals for SGLang: Research Project Proposal (开题)
Feature Proposal (开题)
@UbeCc Nice suggestion!
Yeah, let me send my WeChat ID through email.
Great idea! I could also offer some help!
I'm also working on multiturn online RL training at the moment, and I'd be glad to assist if you need any help.
Yeah, could you please send your WeChat ID to me through email? Let me create a group so we can work together.
Great! My WeChat is liuzhihan0627. See you then,
Best,
Zhihan
I've already written a multi-turn implementation and it works, but the training is not stable. Could you add me to the WeChat group? My WeChat ID is Zukala-Koth. Many thanks! @UbeCc
@sbl1996 Sure. I will tell him tomorrow.
Done. Thank you!
Hi, I’m interested in multiturn RL as well. Could you please add me to the group? My WeChat ID is sfoliver. Thanks a lot! @UbeCc
Hi @UbeCc, I'm also really into multiturn RL and would love to join the group! My WeChat ID is Liu_Qihuang. Looking forward to connecting and learning more. Thanks!
Got it. We're already working on it. Thanks for your support!
Hi @UbeCc, I'm also interested in multiturn RL and would love to join the group! My WeChat ID is innerpeace. Looking forward to connecting and learning more. Thanks a lot!
@PeterSH6 @UbeCc @zhaochenyang20 |
@UbeCc @PeterSH6 @zhaochenyang20 |
@UbeCc @PeterSH6 @zhaochenyang20 |
Thank you for your attention! We already have a large group of people working on this feature. We'll keep syncing here as we make progress!
Same for me. I am also working on multi-turn RL. My WeChat is x34ren. Could you please add me to the group?
@UbeCc @PeterSH6 @zhaochenyang20 I’m excited about multi-turn RL and would be glad to join the group. My WeChat is alex-kovrigin; happy to connect and dive deeper into the topic. Thanks!
Demo: #917
Thank you @eric-haibin-lin! I am curious: is #917 ready for use?
Indeed, please ask us 😂 The code is ready, but the validation score is still not improving. It works in our closed-source sandbox, but it doesn't work in the open-source sandbox yet. We will open-source and merge it anyway early next week.
…#1138)

### Summary
Introduce vLLM AsyncLLM to support multi-turn rollout and #385 #398 #710

### Architecture


**New Components**:
- AsyncLLMWorker: standalone vLLM server instance
  - FastAPI: provides an OpenAI-compatible HTTP server
  - AsyncLLM: async LLMEngine for online serving; for more details see [AsyncLLM](vllm-project/vllm#9826), [LLMEngine](https://docs.vllm.ai/en/latest/design/arch_overview.html#llmengine)
  - ExternalRayDistributedExecutor: custom executor backend that manages workers in the worker group; it grabs the corresponding workers by actor name
- AsyncLLMManager: manages a group of vLLM server instances (AsyncLLMWorker)
  - AsyncLLM lifecycle: initialization, wake_up, sleep
  - FastAPI service discovery
- ChatScheduler: schedules multiple chat completion requests across multiple server instances
  - Least-requests load balancing
  - Sticky sessions with prefix caching
  - Chat completion callback: tool calling

### TODO
- [x] AsyncLLM: initialization/wake_up/sleep
- [x] OpenAI API: support `/v1/chat/completions`
- [x] RayPPOTrainer integration: replace `generate_sequences` with an HTTP call to `/v1/chat/completions`
- [x] GSM8K e2e training
- [ ] Add documentation

---
Co-authored-by: shengguangming <[email protected]>
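The PR summary above mentions a ChatScheduler that does least-requests load balancing with sticky sessions for prefix caching. As a rough illustration only (hypothetical class and method names, not verl's actual ChatScheduler API), the routing policy could look like this: requests carrying a known session key are pinned to the server that already holds their KV-cache prefix; all other requests go to the server with the fewest in-flight requests.

```python
class ToyChatScheduler:
    """Toy sketch of least-requests load balancing with sticky sessions.

    Hypothetical names; verl's real ChatScheduler differs. The idea:
    reuse a pinned server for a session (to hit its prefix cache),
    otherwise pick the least-loaded server.
    """

    def __init__(self, servers):
        self.servers = list(servers)             # server addresses
        self.inflight = {s: 0 for s in servers}  # in-flight request counts
        self.sessions = {}                       # session_id -> pinned server

    def pick(self, session_id=None):
        # Sticky session: route to the server that already has this prefix.
        if session_id is not None and session_id in self.sessions:
            server = self.sessions[session_id]
        else:
            # Least-requests: choose the server with fewest in-flight requests.
            server = min(self.servers, key=lambda s: self.inflight[s])
            if session_id is not None:
                self.sessions[session_id] = server
        self.inflight[server] += 1
        return server

    def done(self, server):
        # Call when a request completes, freeing load on that server.
        self.inflight[server] -= 1
```

The sticky-session part matters for multi-turn chat: each new turn of a conversation shares a long prefix with the previous turn, so routing it back to the same vLLM instance lets prefix caching skip recomputing that prefix.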
Currently, verl only supports single-turn RL training. As agent tuning is becoming urgent, will verl support multi-turn RL in the next few days?
Maybe I can help. Thanks!
@PeterSH6 @zhaochenyang20
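For context on what "multi-turn" means here versus the current single-turn generation: a multi-turn rollout interleaves model generations with environment (tool/user) feedback before computing rewards. A minimal sketch, with purely hypothetical `policy` and `env` interfaces (not verl's actual API):

```python
def multiturn_rollout(policy, env, max_turns=4):
    """Collect one multi-turn trajectory.

    Hypothetical interfaces: `env.reset()` returns the initial chat
    messages, `policy.generate(messages)` returns one assistant reply,
    and `env.step(reply)` returns (feedback, reward, done).
    """
    messages = env.reset()  # e.g. [{"role": "user", "content": ...}]
    trajectory = []
    for _ in range(max_turns):
        reply = policy.generate(messages)            # one assistant turn
        messages = messages + [{"role": "assistant", "content": reply}]
        feedback, reward, done = env.step(reply)     # tool/user observation
        trajectory.append((messages, reward))
        if done:
            break
        # Feed the environment's response back as the next user turn.
        messages = messages + [{"role": "user", "content": feedback}]
    return trajectory
```

The training instability mentioned later in this thread often comes from deciding how the per-turn rewards and the interleaved non-model tokens (tool outputs) are masked during the policy-gradient update, which single-turn pipelines never have to handle.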