
Loss oscillates around 0 in multi-node, multi-GPU Swift GRPO colocate training #3780

Open
PancakeAwesome opened this issue Apr 7, 2025 · 3 comments

Comments

PancakeAwesome commented Apr 7, 2025

Describe the bug
What the bug is, and how to reproduce, preferably with screenshots.
[Screenshots of the training logs attached]

On a 2-node, 16x A100 setup, when training swift GRPO in colocate mode, all completion results within each node are identical, the reward is always 1, and the KL and loss are always 0.

Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here.
CUDA 12.4, torch 2.4, Python 3.10, vLLM 0.7.3, swift 3.3.0.dev0, trl 0.16.0.dev0

NNODES=${WORLD_SIZE:-1}
NODE_RANK=${RANK:-0}
MASTER_ADDR=${MASTER_ADDR:-127.0.0.1}
MASTER_PORT=${MASTER_PORT:-$RANDOM_PORT}
NPROC_PER_NODE=8

swift rlhf \
    --rlhf_type grpo \
    --model DeepSeek-R1-Distill-Qwen-32B/ \
    --train_type full \
    --dataset train.jsonl \
    --torch_dtype bfloat16 \
    --num_train_epochs 999 \
    --max_length 2048 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 5e-7 \
    --save_total_limit 2 \
    --logging_steps 1 \
    --eval_steps 20 \
    --save_steps 20 \
    --output_dir /deepseek_distill_qwen_32b_grpo_reward_w_vllm_k8s \
    --gradient_accumulation_steps 2 \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --max_completion_length 2048 \
    --reward_funcs accuracy format \
    --num_generations 8 \
    --use_vllm true \
    --vllm_gpu_memory_utilization 0.3 \
    --sleep_level 1 \
    --deepspeed zero3_offload \
    --num_infer_workers 8 \
    --tensor_parallel_size 8 \
    --temperature 1.0 \
    --beta 0.001 \
    --max_grad_norm 1.0 \
    --temperature 0.6 \
    --top_p 0.9 \
    --top_k 50 \
    --repetition_penalty 1.03 \
    --move_model_batches 6 \
    --offload_optimizer true \
    --offload_model true \
    --async_generate false \
    --gc_collect_after_offload true \
    --model_type deepseek_r1_distill \
    --log_completions true \
    --report_to tensorboard

Additional context
Add any other context about the problem here.
The completion results differ between nodes.

PancakeAwesome (Author) commented:

@tastelikefeet

hjh0119 (Collaborator) commented Apr 7, 2025

Duplicate of #3745?

PancakeAwesome (Author) commented:

> Duplicate of #3745?

It's not the same issue. Essentially, in a multi-node, multi-GPU environment during the GRPO training stage, the model's output completions look normal, but the overall loss keeps oscillating around 0, and both the KL and the grad norm are constantly 0.
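For what it's worth, this symptom matches how GRPO normalizes rewards within each group of num_generations completions: if every completion in a group receives the same reward (e.g. the accuracy reward is always 1), all advantages in that group are zero and the policy term of the loss vanishes, and with beta as small as 0.001 the remaining KL term is tiny, so loss and grad norm sit near 0. A minimal sketch of that group normalization (not the actual ms-swift/trl code, just the GRPO formula):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages: (r - mean) / (std + eps), per prompt group.

    rewards: tensor of shape (num_prompts, num_generations)
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# If every completion in a group gets the same reward (e.g. accuracy reward == 1),
# the group std is 0 and every advantage is 0, so the policy-gradient part of the
# GRPO loss contributes nothing -> loss and grad norm hover around 0.
rewards = torch.ones(2, 8)        # 2 prompts x num_generations=8, reward always 1
print(grpo_advantages(rewards))   # prints all zeros
```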

@PancakeAwesome changed the title from "Multi-node multi-GPU swift colocate GRPO training: loss oscillates around 0" to "Loss oscillates around 0 in multi-node, multi-GPU Swift GRPO colocate training" Apr 8, 2025