Change training step to a scalar tensor so it works with CUDA graphs #842

jasooney23 · 2025-04-08T02:45:53Z

I was experimenting with the custom aggregator in the Turbulent Channel example and wanted to enable CUDA graphs for faster execution. However, `step` is currently passed as a plain Python `int` from `Trainer._cuda_graph_training_step`. As a result, the value of `step` at capture time is baked into the CUDA graph, and every replay runs with that same value.

For example, if my aggregator's `forward` takes `step` as an argument and the graph is captured at `step = 20`, the aggregator keeps executing with `step = 20` on every later training step.
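To make the failure mode concrete, here is a toy, pure-Python model of the argument-binding behavior only. It is not the CUDA graph API: a real graph replays recorded GPU kernels without re-running any Python. The effect on arguments is analogous, though: a Python `int` is frozen at its capture-time value, while a tensor is captured by its fixed memory location, so in-place updates to its contents are seen on replay. `ToyGraph`, `ScalarBuffer`, and the weighting functions are hypothetical names used only for illustration.

```python
# Toy model of "capture once, replay many" argument binding.
# NOT the real CUDA graph API; it only mimics how arguments are bound.


class ScalarBuffer:
    """Hypothetical stand-in for a 0-dim tensor: a fixed object with mutable contents."""

    def __init__(self, value):
        self.value = value

    def fill_(self, value):
        # In-place update: the object identity stays the same.
        self.value = value
        return self


class ToyGraph:
    def __init__(self):
        self._ops = []

    def capture(self, fn, *args):
        # Arguments are bound at capture time. An int is immutable,
        # so its capture-time value is effectively frozen.
        self._ops.append((fn, args))

    def replay(self):
        return [fn(*args) for fn, args in self._ops]


def weight_from_int(step):
    # Hypothetical step-dependent weight: ramps from 0 to 1 over 100 steps.
    return min(1.0, step / 100)


def weight_from_buffer(step_buf):
    # Same weight, but the step is read from the buffer at replay time.
    return min(1.0, step_buf.value / 100)


# Bug: step captured as a plain int at step 20.
g_int = ToyGraph()
g_int.capture(weight_from_int, 20)

# Fix: step captured as a reference to a mutable buffer.
step_buf = ScalarBuffer(20)
g_buf = ToyGraph()
g_buf.capture(weight_from_buffer, step_buf)

if __name__ == "__main__":
    for step in (20, 50, 200):
        step_buf.fill_(step)  # update in place before each replay
        print(step, g_int.replay()[0], g_buf.replay()[0])
    # prints:
    # 20 0.2 0.2
    # 50 0.2 0.5
    # 200 0.2 1.0
```

The int-bound graph stays at the weight for step 20, while the buffer-bound graph follows the current step.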

My simple fix is to pass `step` as a scalar tensor instead, so its value can be updated in place before each replay. I'm not sure whether I should submit the change myself or let someone bundle it into a bigger revision. (Sorry, it's my first time participating in open source!)
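A minimal sketch of that idea in plain PyTorch, outside of any trainer: keep one 0-dim `int64` tensor on the GPU as a static input, capture the graph once, and `fill_` the tensor with the current step before every `replay()`. The `aggregate` function and its ramp schedule are hypothetical stand-ins for a custom aggregator's `forward`; this is not the library's own code, and it needs a CUDA device to run.

```python
import torch


def aggregate(losses: torch.Tensor, step: torch.Tensor) -> torch.Tensor:
    # Hypothetical step-dependent aggregator: weight ramps from 0 to 1 over 100 steps.
    weight = torch.clamp(step.float() / 100.0, max=1.0)
    return weight * losses.sum()


def run_demo(steps=(20, 50, 200)):
    device = torch.device("cuda")
    losses = torch.ones(3, device=device)
    # Static input: the graph records this tensor's memory, not its current value.
    static_step = torch.zeros((), dtype=torch.int64, device=device)

    # Warm up on a side stream before capture, as the PyTorch CUDA graph docs recommend.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(3):
            aggregate(losses, static_step)
    torch.cuda.current_stream().wait_stream(side)

    # Capture once.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_out = aggregate(losses, static_step)

    results = {}
    for step in steps:
        static_step.fill_(step)  # in-place update, read by the next replay
        graph.replay()
        results[step] = static_out.item()
    return results


if __name__ == "__main__" and torch.cuda.is_available():
    for step, value in run_demo().items():
        print(step, value)
```

Each replay reuses the same captured kernels but reads the refilled step tensor, so the weight changes per step. Passing a Python `int` instead would turn the capture-time value into a constant inside the recorded kernels.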

Thanks 😎