When running the official GSM8K with-tool, multi-turn async rollout SGLang example without any modifications, the model crashes during training and NaN appears. #1581

Open
supermancmk opened this issue May 19, 2025 · 15 comments

Comments

@supermancmk

I pulled the latest version of verl's code, and when running the official GSM8K with-tool, multi-turn async rollout SGLang example without any modifications, training crashes: after a fixed number of steps, grad_norm and KL loss skyrocket, the training and test rewards drop sharply to 0, and NaN appears during training.
Any solution would be greatly appreciated.
Here is my wandb log.

[wandb screenshots omitted]

@630bdd

630bdd commented May 19, 2025

I'm having the same problem.

@dawson-chen
Copy link

Same bug here when training a search agent with my custom scheduler on the async vLLM implementation; the async path keeps tripping me up.

[screenshot omitted]

@chenhaiq
Collaborator

Are you using this script: examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn_4xgpu.sh?

@supermancmk
Author

supermancmk commented May 20, 2025

> Are you using this script: examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn_4xgpu.sh?

I use this script: examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh. I have tried both 4 nodes and a single node (8 GPUs per node), and training crashes the same way in both cases.

@wuxibin89
Collaborator

@dawson-chen Can you try disabling vLLM's prefix caching?

diff --git a/verl/workers/rollout/vllm_rollout/vllm_async_server.py b/verl/workers/rollout/vllm_rollout/vllm_async_server.py
index 4f8109e..3d6f612 100644
--- a/verl/workers/rollout/vllm_rollout/vllm_async_server.py
+++ b/verl/workers/rollout/vllm_rollout/vllm_async_server.py
@@ -178,7 +178,7 @@ class AsyncvLLMServer(AsyncServerBase):
             disable_log_stats=config.disable_log_stats,
             max_num_batched_tokens=max_num_batched_tokens,
             enable_chunked_prefill=config.enable_chunked_prefill,
-            enable_prefix_caching=True,
+            enable_prefix_caching=False,
             trust_remote_code=trust_remote_code,
             seed=self.vllm_dp_rank,
         )
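
For reference, the same switch can also be exercised outside verl on a standalone vLLM async engine. The snippet below is a minimal sketch assuming a vLLM 0.8.x-style API; the model path is a placeholder, and inside verl these engine args are built by AsyncvLLMServer as in the diff above.

```python
# Minimal sketch (not verl's code): build a standalone async vLLM engine with
# prefix caching disabled, to compare rollout behavior with and without the cache.
from vllm import AsyncEngineArgs, AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model="Qwen/Qwen2.5-3B-Instruct",  # placeholder; point at your local checkpoint
    enable_prefix_caching=False,       # same flag the diff above flips inside verl
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```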

@dawson-chen

> Can you try disabling vLLM's prefix caching?

Thanks @wuxibin89, I'll give it a try later. My current vLLM version is 0.8.3; should I switch to a newer version?

@SwordFaith
Collaborator

Could you help revert the format to chatml and rerun it? There might be some discrepancies between the shared WandB log and the current main settings. Your assistance would be greatly appreciated.

@supermancmk
Author

> Could you help revert the format to chatml and rerun it? There might be some discrepancies between the shared WandB log and the current main settings.

Sorry, I'm not quite sure how to do that. I used the chatml format for training.

@SwordFaith
Collaborator

SwordFaith commented May 22, 2025

After the reproduction effort by @zyzshishui, we noticed advantage/max hitting 0 with the current script on main, which may cause training instability. Training also seems more stable after raising train_batch_size and ppo_mini_batch_size from 256 to 512, which makes it less likely that an entire batch is solved perfectly and ends up with zero advantages (see the sketch after the wandb link below). Can you check whether that works for you?

advantage/max 0:

[screenshot omitted]

New run with batch size 512 and rollout.n=8 (wandb):
https://wandb.ai/zhaochenyang20/gsm8k_async_rl/runs/2biev775?nw=nwuserzhaochenyang20
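
To make the advantage/max = 0 observation concrete, here is a minimal sketch of group-normalized (GRPO-style) outcome advantages; the function name and shapes are illustrative rather than verl's actual API.

```python
# Minimal sketch (illustrative, not verl's exact implementation) of GRPO-style
# group-normalized advantages computed from per-rollout outcome rewards.
import torch

def group_normalized_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, n_rollouts) scalar reward for each rollout."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# If every rollout for a prompt gets the same reward (e.g. the whole group solves
# the GSM8K question, all rewards 1.0), std is 0 and that group's advantages are
# all 0, so it contributes no learning signal.
print(group_normalized_advantages(torch.ones(2, 8)))  # prints all zeros
```

A larger train/mini batch (or more rollouts per prompt) makes it less likely that an entire update is dominated by such zero-advantage groups, which is the stabilizing effect described above.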

@supermancmk
Author

supermancmk commented May 22, 2025

Thank you very much for your reply. I re-pulled the latest version of the verl code, set the train batch size and ppo_mini_batch_size to 512 and rollout.n to 16, and left everything else unchanged. However, at around step 100 the model started to collapse and the reward dropped from 90% to 10%, with NaN appearing. Below are my environment, training command, and wandb link and logs.

  1. Below is my install environment:
conda create -n verl python==3.10 -y
conda activate verl
cd /root/verl_0522
pip install torch torchvision
pip install flash-attn --no-build-isolation
pip install -e .[vllm]
pip install -e .[sglang]
pip install math_verify json5
pip install -U "ray[default]"
  2. Below is my command:
set -x

ulimit -n 65535

PROJECT_DIR="$(pwd)"
CONFIG_PATH="$PROJECT_DIR/examples/sglang_multiturn/config"
HOME_DIR=/root/verl_0522
python -m verl.trainer.main_ppo \
    --config-path="$CONFIG_PATH" \
    --config-name='gsm8k_multiturn_grpo' \
    algorithm.adv_estimator=grpo \
    data.train_batch_size=512 \
    data.max_prompt_length=1024 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.return_raw_chat=True \
    actor_rollout_ref.model.path=/root/Qwen2.5-3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=512 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=32 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=32 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=sglang_async \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.rollout.n=16 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=32 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='gsm8k_async_rl_debug_verl' \
    trainer.experiment_name='qwen2.5-3b_function_rm-gsm8k-async-sgl-multi-w-tool-verify-n16-4nodes' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=4 \
    trainer.save_freq=-1 \
    trainer.test_freq=20 \
    data.train_files=$HOME_DIR/data/gsm8k_verl_sgl_multi_turn_preprocessed/train.parquet \
    data.val_files=$HOME_DIR/data/gsm8k_verl_sgl_multi_turn_preprocessed/test.parquet \
    actor_rollout_ref.rollout.multi_turn.tool_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" \
    trainer.total_epochs=15 \
    trainer.val_before_train=True $@
  3. Below is my wandb log:

wandb link: https://wandb.ai/luohaipeng12/gsm8k_async_rl_debug_verl?nw=nwuserluohaipeng12

[wandb screenshots omitted]
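
As a side note on the kl_loss_type=low_var_kl setting in the command above: the reported spike in KL loss and grad_norm is consistent with the policy drifting far from the reference model. Below is a minimal sketch of a k3-style low-variance KL estimate; names are illustrative, and verl's actual loss code (including any clamping) is the source of truth.

```python
# Minimal sketch of a k3-style low-variance KL estimate, the kind of quantity
# selected by kl_loss_type=low_var_kl (illustrative; not copied from verl).
import torch

def low_var_kl(logprob: torch.Tensor, ref_logprob: torch.Tensor) -> torch.Tensor:
    # k3 estimator: exp(r) - r - 1 with r = log p_ref - log p_policy; always >= 0.
    log_ratio = ref_logprob - logprob
    return torch.exp(log_ratio) - log_ratio - 1.0

# Once rollouts degenerate (e.g. long repetitive outputs), the log ratio can grow
# large and exp(log_ratio) blows up, which is one way the KL loss and grad_norm can
# spike together right before the run hits NaN.
```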

@dawson-chen

Hi @wuxibin89, following your suggestions, I conducted 3 controlled experiments to investigate the training crashes.

Environment Setup

Experimental Results

I ran three test configurations:

  • Green line: Using vLLM prefix cache
  • Red line: Without vLLM prefix cache
  • Orange line: Without vLLM prefix cache + max response length extended from 10k to 20k tokens (continued training from step 120 of the Red line)
[comparison training curves omitted]

The results show that disabling vLLM's prefix cache does delay the crashes, but doesn't prevent them entirely. All crashes exhibit the same behavior pattern: the model suddenly begins generating repetitive output.

[screenshot omitted]

I've already added a repetition penalty to the vLLM request parameters:

extra_body={'repetition_penalty': 1.05}

However, this hasn't resolved the issue.
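
For context, a minimal sketch of how such a request-level parameter is typically forwarded to an OpenAI-compatible vLLM server with the openai Python client; the endpoint, model name, and prompt are placeholders, not taken from this setup.

```python
# Minimal sketch (assumes an OpenAI-compatible vLLM server on localhost:8000 and
# the openai>=1.0 Python client; endpoint and model name are placeholders).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[{"role": "user", "content": "Compute 12 * 7."}],
    extra_body={"repetition_penalty": 1.05},  # forwarded to vLLM's sampling params
)
print(resp.choices[0].message.content)
```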

@lebronjamesking

I think the verl sglang multi-turn tool calling is working, by the way: https://github.com/volcengine/verl/blob/54b2677/examples/sglang_multiturn/README.md

@supermancmk
Author

May I ask whether your training runs normally? How many steps did you train for, and can you share your training log? My runs usually collapse late in training, typically around steps 100 to 200, but they look normal in the early stage.
Thanks

@yuleiqin

yuleiqin commented May 29, 2025

Did you try 32 GPUs (4 nodes x 8 GPUs per node) for the 32B Qwen2.5 model? It always failed for me at the beginning, but for the 7B and 3B models everything went smoothly for at least 100 steps.

Sglang: v0.4.6-post5
verl: 0.3.1-dev
@supermancmk

@yuleiqin

https://api.wandb.ai/links/yuleiqin-tencent/tk23kwpp

This is my training curve @supermancmk
