Description
When I run the training script, the following error appears:
- Computing policy gradient: 254/256
Parameter device: cuda:0
Traceback (most recent call last):
File "/data2/ghxy/project/GRPO-Zero/train.py", line 191, in <module>
main(args.config)
File "/data2/ghxy/project/GRPO-Zero/train.py", line 118, in main
results = update_policy(
^^^^^^^^^^^^^^
File "/data2/ghxy/project/GRPO-Zero/grpo.py", line 217, in update_policy
optimizer.step()
File "/home/ghxy/.conda/envs/grpo-test-env/lib/python3.11/site-packages/torch/optim/optimizer.py", line 493, in wrapper
out = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/ghxy/.conda/envs/grpo-test-env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/data2/ghxy/project/GRPO-Zero/optimizer.py", line 74, in step
state["exp_avg"] = torch.zeros_like(
^^^^^^^^^^^^^^^^^
File "/home/ghxy/.conda/envs/grpo-test-env/lib/python3.11/site-packages/torch/utils/_device.py", line 104, in torch_function
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: invalid argument
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
However, when I set pin_memory=False in MemoryEfficientAdamW, training runs normally:
Step 1, mean_reward: 0.12, train success_rate: 0.04, grad_norm: 0.80, duration: 95.91, num_finished_episodes: 247, mean_response_len: 421.95, entropy: 1.09
Step 2, mean_reward: 0.17, train success_rate: 0.07, grad_norm: 0.64, duration: 88.32, num_finished_episodes: 255, mean_response_len: 339.95, entropy: 1.08
Step 3, mean_reward: 0.21, train success_rate: 0.11, grad_norm: 0.56, duration: 74.16, num_finished_episodes: 256, mean_response_len: 210.63, entropy: 0.90
Step 4, mean_reward: 0.25, train success_rate: 0.15, grad_norm: 0.80, duration: 80.51, num_finished_episodes: 255, mean_response_len: 209.89, entropy: 0.79
Step 5, mean_reward: 0.28, train success_rate: 0.18, grad_norm: 0.74, duration: 79.44, num_finished_episodes: 255, mean_response_len: 193.94, entropy: 0.68
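The pin_memory=False workaround above could also be applied as an automatic fallback rather than a manual switch. Below is a minimal sketch, assuming the optimizer state is kept on the CPU. `alloc_cpu_state` is a hypothetical helper, not part of GRPO-Zero, and I have not confirmed that the error in the traceback is recoverable this way on the affected machine.

```python
import torch


def alloc_cpu_state(param: torch.Tensor, pin_memory: bool = True) -> torch.Tensor:
    """Return a zero-filled CPU buffer with the shape and dtype of `param`.

    Hypothetical helper: tries page-locked (pinned) host memory first and
    falls back to ordinary pageable memory if that allocation fails.
    """
    if pin_memory and torch.cuda.is_available():
        try:
            # torch.empty accepts pin_memory; zero_() then fills the buffer in place.
            return torch.empty(
                param.shape, dtype=param.dtype, device="cpu", pin_memory=True
            ).zero_()
        except RuntimeError as err:
            # Pinned allocation can fail, e.g. when the host cannot lock that much memory.
            print(f"pinned allocation failed ({err}); using pageable memory")
    # Fallback (or pin_memory=False): plain pageable CPU memory.
    return torch.zeros(param.shape, dtype=param.dtype, device="cpu")


if __name__ == "__main__":
    p = torch.ones(4, 8, dtype=torch.float32)
    buf = alloc_cpu_state(p)
    print(buf.shape, buf.dtype, buf.device)
```

Pageable memory generally makes host-to-device copies slower than pinned memory, so the fallback trades some speed for robustness.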
This is my environment:
os:
Linux zjd-4090 6.11.0-19-generic #19~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Feb 17 11:51:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
gpu:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.16              Driver Version: 570.86.16      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0 Off |                  Off |
| 51%   65C    P0            255W /  450W |   21441MiB /  24564MiB |     66%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
cpu 0:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 183
model name : Intel(R) Core(TM) i9-14900K
stepping : 1
microcode : 0x12c
cpu MHz : 5700.000
cache size : 36864 KB
physical id : 0