Can pin_memory cause errors during training?

When I run the training script, the following error appears:
* Computing policy gradient: 254/256
Parameter device: cuda:0
Traceback (most recent call last):
  File "/data2/ghxy/project/GRPO-Zero/train.py", line 191, in <module>
    main(args.config)
  File "/data2/ghxy/project/GRPO-Zero/train.py", line 118, in main
    results = update_policy(
              ^^^^^^^^^^^^^^
  File "/data2/ghxy/project/GRPO-Zero/grpo.py", line 217, in update_policy
    optimizer.step()
  File "/home/ghxy/.conda/envs/grpo-test-env/lib/python3.11/site-packages/torch/optim/optimizer.py", line 493, in wrapper
    out = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ghxy/.conda/envs/grpo-test-env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data2/ghxy/project/GRPO-Zero/optimizer.py", line 74, in step
    state["exp_avg"] = torch.zeros_like(
                       ^^^^^^^^^^^^^^^^^
  File "/home/ghxy/.conda/envs/grpo-test-env/lib/python3.11/site-packages/torch/utils/_device.py", line 104, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: invalid argument
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

However, when I set pin_memory=False in MemoryEfficientAdamW, training runs normally:
Step 1, mean_reward: 0.12, train success_rate: 0.04, grad_norm: 0.80, duration: 95.91, num_finished_episodes: 247, mean_response_len: 421.95, entropy: 1.09
Step 2, mean_reward: 0.17, train success_rate: 0.07, grad_norm: 0.64, duration: 88.32, num_finished_episodes: 255, mean_response_len: 339.95, entropy: 1.08
Step 3, mean_reward: 0.21, train success_rate: 0.11, grad_norm: 0.56, duration: 74.16, num_finished_episodes: 256, mean_response_len: 210.63, entropy: 0.90
Step 4, mean_reward: 0.25, train success_rate: 0.15, grad_norm: 0.80, duration: 80.51, num_finished_episodes: 255, mean_response_len: 209.89, entropy: 0.79
Step 5, mean_reward: 0.28, train success_rate: 0.18, grad_norm: 0.74, duration: 79.44, num_finished_episodes: 255, mean_response_len: 193.94, entropy: 0.68
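
A possible workaround, short of turning pinning off everywhere, is to try the pinned allocation first and retry without pinning only if it fails. The sketch below is not the repository's code: `alloc_with_pin_fallback` is a hypothetical helper, and `alloc` stands for any callable that accepts a `pin_memory` keyword. It also assumes the CUDA context is still usable after the failed pinned allocation, which I have not verified.

```python
import warnings
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def alloc_with_pin_fallback(
    alloc: Callable[..., T], pin_memory: bool = True
) -> Tuple[T, bool]:
    """Hypothetical helper: call alloc(pin_memory=...) and retry unpinned on failure.

    Returns (buffer, pinned), where `pinned` reports whether the pinned
    attempt succeeded.
    """
    if pin_memory:
        try:
            return alloc(pin_memory=True), True
        except RuntimeError as exc:
            # The traceback above surfaces the CUDA failure as a RuntimeError,
            # so that is the only exception type caught here.
            warnings.warn(
                f"pinned allocation failed ({exc}); retrying with pin_memory=False"
            )
    # Either pinning was not requested or the pinned attempt failed.
    return alloc(pin_memory=False), False
```

With PyTorch, `alloc` could be something like `functools.partial(torch.zeros, p.shape, dtype=p.dtype, device="cpu")`, since `torch.zeros` accepts a `pin_memory` keyword. The returned flag makes it easy to log which optimizer states ended up unpinned.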

This is my environment:
OS:
Linux zjd-4090 6.11.0-19-generic #19~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Feb 17 11:51:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

gpu:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.16              Driver Version: 570.86.16      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0 Off |                  Off |
| 51%   65C    P0            255W /  450W |   21441MiB /  24564MiB |     66%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

cpu 0:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 183
model name	: Intel(R) Core(TM) i9-14900K
stepping	: 1
microcode	: 0x12c
cpu MHz		: 5700.000
cache size	: 36864 KB
physical id	: 0

May pin_memory cause some error for training? #5
