Your hardware and system info
CUDA 12.4, torch 2.4, Python 3.10, vLLM 0.7.3, swift 3.3.0.dev0, trl 0.16.0.dev0
It's not the same issue. Essentially, in a multi-node, multi-GPU environment during the GRPO training stage, the model's output completions look normal, but the overall loss keeps oscillating around 0, and both the KL and grad_norm are constantly 0.
PancakeAwesome changed the title to "The loss of multi-node, multi-GPU swift Colocate GRPO training oscillates around 0" on Apr 8, 2025
Describe the bug
What the bug is, and how to reproduce, better with screenshots:
In a two-node, 16×A100 environment, training with swift GRPO in colocate mode, all completion results within each node are identical, the reward is always 1, and the KL and loss are always 0.
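For context on why the loss and KL can sit at 0 when every completion in a group gets the same reward: GRPO normalizes rewards within each group of num_generations samples, so identical rewards give zero advantages, the policy-gradient term vanishes, and only the β-scaled KL term is left (here β=0.001, and the KL is near 0 while the policy has barely moved). A minimal sketch of that arithmetic, using the standard GRPO formula rather than the actual swift/trl code:

import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    # Standard GRPO advantage: normalize rewards within each group of num_generations samples.
    # rewards has shape (num_prompts, num_generations).
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# One prompt, num_generations=8, every completion scored 1.0 by the reward funcs:
rewards = torch.ones(1, 8)
print(group_relative_advantages(rewards))  # all zeros -> zero policy loss, grad_norm ~ 0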
Your hardware and system info
CUDA 12.4, torch 2.4, Python 3.10, vLLM 0.7.3, swift 3.3.0.dev0, trl 0.16.0.dev0
NNODES=${WORLD_SIZE:-1}
NODE_RANK=${RANK:-0}
MASTER_ADDR=${MASTER_ADDR:-127.0.0.1}
MASTER_PORT=${MASTER_PORT:-$RANDOM_PORT}
NPROC_PER_NODE=8
swift rlhf \
    --rlhf_type grpo \
    --model DeepSeek-R1-Distill-Qwen-32B/ \
    --train_type full \
    --dataset train.jsonl \
    --torch_dtype bfloat16 \
    --num_train_epochs 999 \
    --max_length 2048 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 5e-7 \
    --save_total_limit 2 \
    --logging_steps 1 \
    --eval_steps 20 \
    --save_steps 20 \
    --output_dir /deepseek_distill_qwen_32b_grpo_reward_w_vllm_k8s \
    --gradient_accumulation_steps 2 \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --max_completion_length 2048 \
    --reward_funcs accuracy format \
    --num_generations 8 \
    --use_vllm true \
    --vllm_gpu_memory_utilization 0.3 \
    --sleep_level 1 \
    --deepspeed zero3_offload \
    --num_infer_workers 8 \
    --tensor_parallel_size 8 \
    --temperature 1.0 \
    --beta 0.001 \
    --max_grad_norm 1.0 \
    --temperature 0.6 \
    --top_p 0.9 \
    --top_k 50 \
    --repetition_penalty 1.03 \
    --move_model_batches 6 \
    --offload_optimizer true \
    --offload_model true \
    --async_generate false \
    --gc_collect_after_offload true \
    --model_type deepseek_r1_distill \
    --log_completions true \
    --report_to tensorboard
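Note: --temperature is passed twice in this command (1.0 and later 0.6); assuming standard argparse behavior, the second value (0.6) is the one that takes effect for sampling.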
Additional context
Add any other context about the problem here:
The completion results are not the same across different machine nodes; they are only identical within each node.
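One way to narrow this down (an illustrative check, not part of swift's API; it assumes a torch.distributed process group is already initialized, e.g. from a custom reward function or callback) is to all_gather the completions each rank just sampled and count how many are unique. If the unique count matches the number of nodes rather than the number of ranks, all processes within a node are likely sampling with the same seed/sampling parameters:

import torch.distributed as dist

def report_completion_uniqueness(completions):
    # `completions` is assumed to be the list of strings this rank just sampled.
    gathered = [None] * dist.get_world_size()
    dist.all_gather_object(gathered, completions)
    if dist.get_rank() == 0:
        flat = [c for per_rank in gathered for c in per_rank]
        print(f"ranks={dist.get_world_size()} total={len(flat)} unique={len(set(flat))}")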