When running the official GSM8K with-tool, multi-turn async rollout SGLang example without any modifications, the model crashes and NaN appears. #1581
Comments
I'm having the same problem.
Are you using this script: examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn_4xgpu.sh?
@dawson-chen Can you try disabling vLLM's prefix caching?
diff --git a/verl/workers/rollout/vllm_rollout/vllm_async_server.py b/verl/workers/rollout/vllm_rollout/vllm_async_server.py
index 4f8109e..3d6f612 100644
--- a/verl/workers/rollout/vllm_rollout/vllm_async_server.py
+++ b/verl/workers/rollout/vllm_rollout/vllm_async_server.py
@@ -178,7 +178,7 @@ class AsyncvLLMServer(AsyncServerBase):
disable_log_stats=config.disable_log_stats,
max_num_batched_tokens=max_num_batched_tokens,
enable_chunked_prefill=config.enable_chunked_prefill,
- enable_prefix_caching=True,
+ enable_prefix_caching=False,
trust_remote_code=trust_remote_code,
seed=self.vllm_dp_rank,
)
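(For a quick sanity check outside of verl, the same flag can be passed to a standalone vLLM engine. A minimal sketch, assuming a recent vLLM; the model path and prompt are placeholders, not taken from this thread:)

```python
# Minimal standalone sketch (not verl code): build a vLLM engine with prefix
# caching disabled and sample a response, to compare generation quality with
# and without the cache. Model path and prompt are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/root/Qwen2.5-3B-Instruct",   # placeholder local checkpoint
    enable_prefix_caching=False,          # the flag flipped in the diff above
    tensor_parallel_size=2,
)
params = SamplingParams(temperature=1.0, max_tokens=256)
outputs = llm.generate(["Natalia sold clips to 48 of her friends in April ..."], params)
print(outputs[0].outputs[0].text)
```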
Thanks @wuxibin89, I'll give it a try later. My current vLLM version is 0.8.3; should I switch to a newer version?
Could you help revert the format to
After the reproduction effort by @zyzshishui, we noticed advantage/max hitting 0 with the current script on main, which may cause instability in training. It also seems more stable after raising train bsz & ppo_mini bsz from 256 to 512, so that the whole batch is not solved all at once. Can you check if that works for you?
advantage/max 0: [screenshot]
new bsz 512, rollout n 8 wandb: [screenshot]
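For intuition on why advantage/max sticking at 0 matters, here is a minimal sketch of group-relative (GRPO-style) advantage normalization; it is illustrative only, not verl's actual implementation. When every rollout in a group gets the same reward (e.g. the whole batch is solved), the normalized advantages are all zero and the policy gets no learning signal; without the epsilon term the division would even produce NaN.

```python
import numpy as np

def group_relative_advantage(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Toy GRPO-style advantage: normalize rewards within one prompt's rollout group."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Mixed group (some rollouts correct, some wrong): non-zero advantages, useful signal.
print(group_relative_advantage(np.array([1.0, 0.0, 1.0, 0.0])))   # approx [ 1. -1.  1. -1.]

# Saturated group (all rollouts correct): advantages collapse to 0, so advantage/max == 0.
print(group_relative_advantage(np.array([1.0, 1.0, 1.0, 1.0])))   # [0. 0. 0. 0.]
```

A larger train batch with more rollouts per prompt makes it less likely that entire groups saturate, which is consistent with the 256 -> 512 suggestion above.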
Thank you very much for your reply. I re-pulled the latest verl code, set bsz & ppo_mini bsz to 512 and rollout.n to 16, and left everything else unchanged. However, when training reached 100 steps the model started to collapse: the reward dropped from 90% to 10% in one go and NaN appeared. Below are my environment, training command, wandb link, and logs.
conda create -n verl python==3.10 -y
conda activate verl
cd /root/verl_0522
pip install torch torchvision
pip install flash-attn --no-build-isolation
pip install -e .[vllm]
pip install -e .[sglang]
pip install math_verify json5
pip install -U "ray[default]"
set -x
ulimit -n 65535
PROJECT_DIR="$(pwd)"
CONFIG_PATH="$PROJECT_DIR/examples/sglang_multiturn/config"
HOME_DIR=/root/verl_0522
python -m verl.trainer.main_ppo \
--config-path="$CONFIG_PATH" \
--config-name='gsm8k_multiturn_grpo' \
algorithm.adv_estimator=grpo \
data.train_batch_size=512 \
data.max_prompt_length=1024 \
data.max_response_length=1024 \
data.filter_overlong_prompts=True \
data.truncation='error' \
data.return_raw_chat=True \
actor_rollout_ref.model.path=/root/Qwen2.5-3B-Instruct \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.model.use_remove_padding=True \
actor_rollout_ref.actor.ppo_mini_batch_size=512 \
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=32 \
actor_rollout_ref.actor.use_kl_loss=True \
actor_rollout_ref.actor.kl_loss_coef=0.001 \
actor_rollout_ref.actor.kl_loss_type=low_var_kl \
actor_rollout_ref.actor.entropy_coeff=0 \
actor_rollout_ref.model.enable_gradient_checkpointing=True \
actor_rollout_ref.actor.fsdp_config.param_offload=False \
actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=32 \
actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
actor_rollout_ref.rollout.name=sglang_async \
actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
actor_rollout_ref.rollout.n=16 \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=32 \
actor_rollout_ref.ref.fsdp_config.param_offload=True \
algorithm.use_kl_in_reward=False \
trainer.critic_warmup=0 \
trainer.logger=['console','wandb'] \
trainer.project_name='gsm8k_async_rl_debug_verl' \
trainer.experiment_name='qwen2.5-3b_function_rm-gsm8k-async-sgl-multi-w-tool-verify-n16-4nodes' \
trainer.n_gpus_per_node=8 \
trainer.nnodes=4 \
trainer.save_freq=-1 \
trainer.test_freq=20 \
data.train_files=$HOME_DIR/data/gsm8k_verl_sgl_multi_turn_preprocessed/train.parquet \
data.val_files=$HOME_DIR/data/gsm8k_verl_sgl_multi_turn_preprocessed/test.parquet \
actor_rollout_ref.rollout.multi_turn.tool_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" \
trainer.total_epochs=15 \
trainer.val_before_train=True $@
wandb link: https://wandb.ai/luohaipeng12/gsm8k_async_rl_debug_verl?nw=nwuserluohaipeng12
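As a rough sanity check on the scale of this run, the accounting below is just arithmetic derived from the command above (assumption: every quantity counts full responses; verl's exact mini-batch semantics may differ):

```python
# Rough rollout accounting for the command above; purely illustrative arithmetic.
train_batch_size = 512          # data.train_batch_size (prompts per PPO step)
rollout_n = 16                  # actor_rollout_ref.rollout.n (responses per prompt)
n_gpus = 4 * 8                  # trainer.nnodes * trainer.n_gpus_per_node

responses_per_step = train_batch_size * rollout_n
print(responses_per_step)             # 8192 responses generated and scored per step
print(responses_per_step // n_gpus)   # 256 responses handled per GPU on average
```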
Hi @wuxibin89, following your suggestions, I conducted 3 controlled experiments to investigate the training crashes.
Environment Setup
Experimental Results
I ran three test configurations:
[screenshot of the three runs]
The results show that disabling the vLLM prefix cache does delay the crashes, but doesn't prevent them entirely. All crashes exhibit the same behavior pattern: the model suddenly begins generating repetitive output.
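Since every crash starts with repetitive output, one could watch for it directly in the logged rollouts. The helper below is hypothetical (not part of verl); it flags degenerate generations by the fraction of duplicated word n-grams:

```python
def repetition_ratio(text: str, n: int = 4) -> float:
    """Fraction of duplicated word n-grams: ~0 for normal text, near 1 for loops."""
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams)

# Flag rollouts that look like repetition collapse before the reward curve drops.
samples = [
    "Natalia sold 48 clips in April and then sold half as many clips in May.",
    "The answer is the answer is the answer is the answer is the answer is",
]
for s in samples:
    r = repetition_ratio(s)
    print(round(r, 2), "<-- suspicious" if r > 0.5 else "")
```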
I think the verl sglang multi-turn tool calling is working, by the way: https://github.com/volcengine/verl/blob/54b2677/examples/sglang_multiturn/README.md
May I ask whether your training runs normally? How many steps did you train, and can you share your training log? My runs usually collapse near the end of training, at about 100 to 200 steps, but they are normal in the early stage.
Did you try 32 GPUs (4 nodes x 8 GPUs per node) for the 32B Qwen2.5 model? It always failed for me at the very beginning, but for the 7B and 3B models everything went smoothly for at least 100 steps. SGLang: v0.4.6-post5
This is my training curve, @supermancmk: https://api.wandb.ai/links/yuleiqin-tencent/tk23kwpp
I pulled the latest version of the verl code. When running the official GSM8K with-tool, multi-turn async rollout SGLang example without any modifications, training crashes: after a fixed number of steps, grad_norm and KL loss skyrocket, the training and test rewards drop dramatically to 0, and NaN appears during training.
Any solution would be greatly appreciated.
Here is my wandb log.