
Can pin_memory cause errors during training? #5

Open
@fan365

Description


When I run the training script, the following error appears:

    Computing policy gradient: 254/256
    Parameter device: cuda:0
    Traceback (most recent call last):
      File "/data2/ghxy/project/GRPO-Zero/train.py", line 191, in <module>
        main(args.config)
      File "/data2/ghxy/project/GRPO-Zero/train.py", line 118, in main
        results = update_policy(
                  ^^^^^^^^^^^^^^
      File "/data2/ghxy/project/GRPO-Zero/grpo.py", line 217, in update_policy
        optimizer.step()
      File "/home/ghxy/.conda/envs/grpo-test-env/lib/python3.11/site-packages/torch/optim/optimizer.py", line 493, in wrapper
        out = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
      File "/home/ghxy/.conda/envs/grpo-test-env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "/data2/ghxy/project/GRPO-Zero/optimizer.py", line 74, in step
        state["exp_avg"] = torch.zeros_like(
                           ^^^^^^^^^^^^^^^^^
      File "/home/ghxy/.conda/envs/grpo-test-env/lib/python3.11/site-packages/torch/utils/_device.py", line 104, in __torch_function__
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
    RuntimeError: CUDA error: invalid argument
    Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

But when I set pin_memory=False in MemoryEfficientAdamW, training runs normally:
Step 1, mean_reward: 0.12, train success_rate: 0.04, grad_norm: 0.80, duration: 95.91, num_finished_episodes: 247, mean_response_len: 421.95, entropy: 1.09
Step 2, mean_reward: 0.17, train success_rate: 0.07, grad_norm: 0.64, duration: 88.32, num_finished_episodes: 255, mean_response_len: 339.95, entropy: 1.08
Step 3, mean_reward: 0.21, train success_rate: 0.11, grad_norm: 0.56, duration: 74.16, num_finished_episodes: 256, mean_response_len: 210.63, entropy: 0.90
Step 4, mean_reward: 0.25, train success_rate: 0.15, grad_norm: 0.80, duration: 80.51, num_finished_episodes: 255, mean_response_len: 209.89, entropy: 0.79
Step 5, mean_reward: 0.28, train success_rate: 0.18, grad_norm: 0.74, duration: 79.44, num_finished_episodes: 255, mean_response_len: 193.94, entropy: 0.68
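A minimal sketch of a fallback that avoids hard-coding the pin_memory=False workaround, assuming the optimizer allocates CPU-side state buffers (like `exp_avg`) per parameter, as the `torch.zeros_like` call in the traceback suggests. The helper `alloc_cpu_state` is hypothetical and not part of GRPO-Zero or MemoryEfficientAdamW; it tries `Tensor.pin_memory()` and, assuming the pinning failure surfaces as a catchable `RuntimeError`, falls back to ordinary pageable memory.

```python
import torch


def alloc_cpu_state(param: torch.Tensor, pin_memory: bool = True) -> torch.Tensor:
    """Hypothetical helper: zero-filled CPU buffer shaped like `param`.

    If pinning is requested and CUDA is available, try to page-lock the
    buffer; if that raises, fall back to a regular pageable CPU tensor
    instead of aborting the optimizer step.
    """
    buf = torch.zeros(param.shape, dtype=param.dtype, device="cpu")
    if pin_memory and torch.cuda.is_available():
        try:
            # Tensor.pin_memory() returns a page-locked copy of the tensor.
            return buf.pin_memory()
        except RuntimeError as err:
            # Assumption: the CUDA "invalid argument" failure is recoverable here.
            print(f"pinning failed ({err}); using pageable memory")
    return buf


if __name__ == "__main__":
    p = torch.randn(4, 3)
    state = {"exp_avg": alloc_cpu_state(p), "exp_avg_sq": alloc_cpu_state(p)}
    print(state["exp_avg"].shape, state["exp_avg"].device)
```

Pageable host memory generally gives slower host-to-device copies than pinned memory, so the fallback may trade some step time for not crashing.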

This is my environment:
OS:
Linux zjd-4090 6.11.0-19-generic #19~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Feb 17 11:51:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

GPU:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.16              Driver Version: 570.86.16      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0 Off |                  Off |
| 51%   65C    P0            255W /  450W |   21441MiB /  24564MiB |     66%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

CPU 0:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 183
model name : Intel(R) Core(TM) i9-14900K
stepping : 1
microcode : 0x12c
cpu MHz : 5700.000
cache size : 36864 KB
physical id : 0
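The report above covers OS, GPU, and CPU but not the PyTorch build or the CUDA runtime it was compiled against. PyTorch ships `python -m torch.utils.collect_env` for a full report; a smaller sketch using standard `torch` attributes (the function name `env_report` is illustrative, not an existing API):

```python
import platform

import torch


def env_report() -> dict:
    """Collect version info commonly requested with CUDA error reports."""
    info = {
        "python": platform.python_version(),
        "torch": torch.__version__,
        "torch_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        # Only query device details when a CUDA device is actually usable.
        info["gpu"] = torch.cuda.get_device_name(0)
        info["cudnn"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for key, value in env_report().items():
        print(f"{key}: {value}")
```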
