🐛 Bug

I have a large training script for which I encountered CUDA OOM issues. I was able to narrow the problem down to my particular usage of `gpytorch`: using the `ExactGP` module leads to memory leaks. The whole training script is too complicated to post here, but I think I managed to reduce the bug to a minimal example.

It seems that even just initializing the `ExactGP` module leaks memory, without doing any backpropagation or calling the module's `forward` function.
To reproduce
I found that the following code snippet already leads to a memory leak. Please let me know if I am using the library incorrectly in any way!
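A minimal sketch along these lines (assuming a standard `ExactGP` subclass with an RBF kernel, a trivial `loss` function that only instantiates the model, and freshly allocated CUDA training tensors each iteration; the exact sizes and names are placeholders):

```python
import torch
import gpytorch


class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


def loss(train_x, train_y):
    # Only *initializing* the GP -- no forward pass, no backward pass.
    likelihood = gpytorch.likelihoods.GaussianLikelihood().to(train_x.device)
    model = ExactGPModel(train_x, train_y, likelihood).to(train_x.device)
    # model = BoxModule(train_x, train_y)  # control: memory is freed with this
    return train_y.mean()  # dummy scalar so the loop resembles a training step


device = torch.device("cuda")
for step in range(100):
    # Fresh training data each iteration; these tensors should become
    # collectible as soon as the model goes out of scope in loss().
    train_x = torch.randn(1000, 4, device=device)
    train_y = torch.randn(1000, device=device)
    loss(train_x, train_y)
    print(f"step {step}: {torch.cuda.memory_allocated() / 2**20:.1f} MiB allocated")
```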
Memory profile

Using PyTorch's memory visualization tool, I can observe that the memory for the GP input tensors is not freed after each iteration. In this minimal example the build-up is negligible, but in my actual use case it leads to CUDA OOM errors.
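For reference, the memory profile can be recorded with PyTorch's memory-history hooks (underscore-prefixed, so technically private API) and inspected in the browser-based visualizer; a sketch, wrapped around the loop above:

```python
import torch

# Start recording allocator events before running the reproduction loop.
torch.cuda.memory._record_memory_history(max_entries=100_000)

# ... run the loop from the snippet above ...

# Dump a snapshot to drag-and-drop into https://pytorch.org/memory_viz
torch.cuda.memory._dump_snapshot("gp_oom_repro.pickle")
```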
Expected Behavior
If we comment out the usage of `ExactGP` in the `loss` function and instead initialize the `BoxModule`, the memory is freed after each iteration, as we would expect.
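For context, `BoxModule` is just a control that holds the same tensors without any gpytorch machinery; its exact definition is not essential, but it is something along these lines:

```python
import torch


class BoxModule(torch.nn.Module):
    """Control module: stores the same training tensors as the GP,
    but without any gpytorch machinery. No leak is observed here."""

    def __init__(self, train_x, train_y):
        super().__init__()
        self.train_x = train_x
        self.train_y = train_y
```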
System information
Please complete the following information:
Python version: 3.12
gpytorch.__version__: 1.13
torch.__version__: 2.6.0
torch.version.cuda: 12.4