ridiculously long training times for GLOW on gpu #72
Hi, thanks for the details. Would you have a small script that reproduces this issue? While memory efficiency is one of the main focuses of this implementation, the performance should not be this bad, so we'd like to check this out.
Here is some example code (adapted from the NetworkGlow.jl example). I have been running on an NVIDIA A100 GPU. I also noticed that the speed of the backward pass depends much more strongly on the batch size than I would expect. For this piece of code, I get the output below. Thank you!!
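A minimal sketch of such a benchmark, based on the package's NetworkGlow example (the sizes, Flux's `gpu` helper, and `CUDA.@sync` timing are assumptions here, not the original script):

```julia
# Sketch of a GPU timing test for NetworkGlow (not the poster's exact code).
using InvertibleNetworks, Flux, CUDA

# CIFAR10-sized input: 32x32 pixels, 3 channels
nx, ny, n_in = 32, 32, 3
batchsize = 256

# Configuration described in the issue: 3 scales, 32 flow steps, 256 hidden channels
L, K, n_hidden = 3, 32, 256

# Build the Glow network and move the network and data to the GPU
G = NetworkGlow(n_in, n_hidden, L, K) |> gpu
X = randn(Float32, nx, ny, n_in, batchsize) |> gpu

# Forward pass: latent variable and log-determinant
Y, logdet = G.forward(X)

# Gradient of the Glow objective 0.5*||Y||^2/batchsize - logdet with respect to Y
ΔY = Y ./ batchsize

# Backward pass: the step reported to take minutes per call
@time CUDA.@sync G.backward(ΔY, Y);
```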
Thank you, looking into it.
The main computational bottleneck you are experiencing comes from computing the gradient for the 1x1 convolutions. In practice, these are usually skipped by keeping those weights fixed (@rafaelorozco please add any detail). So in your case, the issue should go away if you initialize your network with the 1x1 convolutions frozen, as sketched below.
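A minimal sketch of the suggested initialization, assuming a `freeze_conv` keyword on the NetworkGlow constructor as implied by this thread; check the constructor's docstring in your installed version in case the option is named differently:

```julia
# Hypothetical sketch: build Glow with the 1x1 convolution weights frozen so
# that their (expensive) gradients are skipped in the backward pass.
# The freeze_conv keyword is an assumption based on this thread; verify it
# against the NetworkGlow docstring in your version of InvertibleNetworks.jl.
G = NetworkGlow(n_in, n_hidden, L, K; freeze_conv=true) |> gpu
```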
Thank you for looking into it! The time taken for a backward pass has now gone down significantly: it takes around 12 seconds on the same machine, which I guess is expected.
I tried to use the NetworkGlow structure/example to replicate GLOW on CIFAR10. I used 3 scales with 32 flow steps each, and 256 hidden channels per convolution. I found that each backward pass took a few minutes. While I agree that the model is huge, at roughly 50 million parameters, the same model in Python takes that long for a whole epoch (with a batch size of 256). I wonder if this behaviour is expected, since InvertibleNetworks.jl focuses more on memory efficiency than speed.
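As a rough sanity check of the ~50 million parameter figure, the parameter sizes can be summed; a sketch assuming the package's `get_params` accessor, which returns the network's trainable Parameter objects:

```julia
# Rough parameter count for the network defined above (sketch; assumes
# get_params returns Parameter objects with a .data field).
n_params = sum(length(p.data) for p in get_params(G))
println("Number of parameters: ", n_params)
```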