[Bug] GPytorch Kernel Partitioning Increasing memory usage #2352
Comments
@Felice27 we're going to deprecate multi-GPU / kernel partitioning. You should instead try the KeOps integration (https://docs.gpytorch.ai/en/stable/examples/02_Scalable_Exact_GPs/KeOps_GP_Regression.html). KeOps essentially does our kernel partitioning (on a single GPU), but it is far better than our own code in the Simple_MultiGPU_GP_Regression notebook.
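For reference, a minimal sketch of what the suggested KeOps-based model might look like, following the linked KeOps_GP_Regression example (the `ExactGPModel` class name and the `train_x`/`train_y` tensors here are placeholders, not code from this issue):

```python
import torch
import gpytorch

# Sketch of an exact GP that uses the KeOps RBF kernel in place of the standard
# RBF kernel. KeOps evaluates the kernel matrix lazily in tiles on the GPU,
# which plays the same role as GPyTorch's old kernel partitioning.
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.keops.RBFKernel()
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

# Placeholder data; in practice train_x / train_y would be the full dataset.
train_x = torch.randn(10_000, 3).cuda()
train_y = torch.randn(10_000).cuda()
likelihood = gpytorch.likelihoods.GaussianLikelihood().cuda()
model = ExactGPModel(train_x, train_y, likelihood).cuda()
```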
Thank you for the assistance! I'll start updating my code, but I just have two questions:
@Felice27 at the moment KeOps isn't compatible with multi-GPU regression, but it shouldn't be too difficult to accomplish. Once cornellius-gp/linear_operator#62 is merged into the LinearOperator repo, writing a MultiGPU KeOps kernel should be fairly straightforward. I'm stretched very thin at the moment, so if you'd be up for writing a KeOps MultiGPU kernel and putting up a PR, that'd be great. (Wait until cornellius-gp/linear_operator#62 is merged in, though.)
Alright, thank you for all the assistance! Once that PR is merged, I'll look into accomplishing that.
Is there any reason that the time per iteration increases significantly as the program continues to run? When I run the training for 50 iterations, each iteration takes about 25 seconds until roughly the 20th iteration, at which point the time per iteration grows to hundreds of seconds. I don't think there's an obvious memory leak, as adding in a
@Felice27 the large-scale GPs use conjugate gradients under the hood (Cholesky won't fit into memory). CG is an iterative algorithm, and the number of iterations required to reach convergence depends on the conditioning of the kernel matrix. Changes to the GP hyperparameters change the conditioning of the kernel matrix, which may cause CG to require more iterations before convergence.
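A minimal sketch of how CG behaviour can be bounded via GPyTorch's settings context managers (the iteration cap and tolerance values below are illustrative assumptions, and `model`, `mll`, `train_x`, `train_y` are assumed to come from the usual training loop):

```python
import gpytorch

# Capping CG iterations bounds the per-iteration cost, at the price of a less
# exact solve when the kernel matrix becomes poorly conditioned.
with gpytorch.settings.max_cg_iterations(1000), gpytorch.settings.cg_tolerance(0.01):
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
```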
Closing because checkpointing is now deprecated (as of v1.11).
Also @Felice27, the cornellius-gp/linear_operator#62 PR is now merged (as of v1.11).
🐛 Bug
I am attempting to fit an exact GP regression on a dataset of ~1 million points; my current code works with 1/10th of the full dataset. When following the steps in the example 02 notebook "Simple_MultiGPU_GP_Regression", I encountered an issue where the memory usage increased after introducing a kernel partition.
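For context, the partitioned training step from that notebook looks roughly like the sketch below (based on the pre-v1.11 `gpytorch.beta_features.checkpoint_kernel` API; `model`, `mll`, `optimizer`, `train_x`, and `train_y` are placeholders rather than the actual code from this report):

```python
import gpytorch

def train_step(checkpoint_size):
    # checkpoint_size (the kernel partition size) controls how many rows of the
    # kernel matrix are formed at a time; 0 means no partitioning.
    with gpytorch.beta_features.checkpoint_kernel(checkpoint_size):
        optimizer.zero_grad()
        output = model(train_x)
        loss = -mll(output, train_y)
        loss.backward()
        optimizer.step()
    return loss
```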
To reproduce
**Error message**
Expected Behavior
I expect the memory allocation to decrease by roughly a factor of 2 each time the kernel partition size is halved, but the attempted memory allocation actually increases when a kernel partition is introduced and then remains constant for all nonzero partition sizes. Is there something I'm missing in order to get the memory usage to decrease properly?
System information
Additional context
Question also posted to https://stackoverflow.com/questions/76335780/why-is-gpytorch-kernel-partition-size-not-reducing-cuda-memory-usage