Describe the bug
What the bug is, and how to reproduce, better with screenshots
Command line:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
WANDB_API_KEY=XXX \
swift rlhf \
    --rlhf_type grpo \
    --model /nfs/largemodel/wangjuan/outputs/Qwen2.5-7B-110K-sft5-0403/v2-20250403-092425/checkpoint-7370-merged \
    --train_type lora \
    --dataset '/nfs/largemodel/wangjuan/data/law_chinese1/DISC-Law-SFT-Pair-QA-released_alpaca.json' \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --max_length 1024 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --eval_steps 1000 \
    --save_steps 1000 \
    --learning_rate 1e-6 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --output_dir XXX \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --max_completion_length 1024 \
    --reward_funcs format repetition \
    --num_generations 4 \
    --system '用户和助手之间的每一段对话。用户提出一个问题,助手解决它。助手都要先在脑海中思考推理过程,然后再向用户提供答案。推理过程和答案分别被包裹在<think>和</think>以及<answer>和</answer>标签内,即<think>推理过程在此</think><answer>答案在此</answer>' \
    --use_vllm true \
    --vllm_gpu_memory_utilization 0.5 \
    --vllm_max_model_len 2048 \
    --deepspeed zero3 \
    --temperature 1.0 \
    --top_p 1.0 \
    --top_k 80 \
    --log_completions true \
    --num_infer_workers 8 \
    --tensor_parallel_size 4 \
    --async_generate false \
    --move_model_batches 16 \
    --offload_optimizer true \
    --offload_model true \
    --gc_collect_after_offload true \
    --report_to 'wandb' \
    --sleep_level 1
```
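For context on the `--reward_funcs format` setting: the system prompt above requires every completion to follow a `<think>…</think><answer>…</answer>` structure, and a format reward presumably scores whether a completion matches that template. Below is a minimal hypothetical sketch of such a check (this is an illustration only, not ms-swift's actual `format` reward implementation; the function name and scoring values are assumptions):

```python
import re

# Hypothetical format check mirroring the tag structure demanded by the
# system prompt: a <think>...</think> block followed by <answer>...</answer>.
# NOT ms-swift's real implementation; for illustration only.
PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)

def format_reward(completion: str) -> float:
    """Return 1.0 if the completion matches the required tag layout, else 0.0."""
    return 1.0 if PATTERN.match(completion.strip()) else 0.0
```

A completion like `<think>推理过程在此</think><answer>答案在此</answer>` would score 1.0 under this sketch, while untagged text would score 0.0.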
Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here
CUDA 11.5, 8×H100 80GB
Additional context
Add any other context about the problem here