Add notebook with DCGAN on MNIST #587
Conversation
Check out this pull request on ReviewNB. You'll be able to see the Jupyter notebook diff and discuss changes. Powered by ReviewNB.
Hey Benjamin, unless I am misunderstanding something, it seems you are training the generator and discriminator at the same time. Normally you are supposed to alternate between training the Generator and the Discriminator, which I don't see happening since ALL the losses are added together unconditionally.
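For reference, the usual alternating scheme looks roughly like the sketch below. The module and optimizer names (`generator`, `discriminator`, `opt_g`, `opt_d`) are hypothetical stand-ins, not the notebook's code; it only illustrates that the two networks are updated on separate losses.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the DCGAN modules (flattened 28x28 MNIST images).
generator = nn.Sequential(nn.Linear(100, 784), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
criterion = nn.BCELoss()

def train_on_batch(real):
    batch_size = real.size(0)
    label_real = torch.ones(batch_size, 1)
    label_fake = torch.zeros(batch_size, 1)
    noise = torch.randn(batch_size, 100)
    fake = generator(noise)

    # 1) Discriminator step: the fake images are detached, so no gradient
    #    reaches the generator here.
    opt_d.zero_grad()
    loss_d = (criterion(discriminator(real), label_real)
              + criterion(discriminator(fake.detach()), label_fake))
    loss_d.backward()
    opt_d.step()

    # 2) Generator step: backprop goes through the discriminator, but only
    #    opt_g.step() is called, so the discriminator weights stay untouched.
    opt_g.zero_grad()
    loss_g = criterion(discriminator(fake), label_real)
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```

Calling `train_on_batch(torch.rand(64, 784))` runs one discriminator update followed by one generator update; the two losses are never summed into a single backward pass.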
@cgarciae That's one of the adjustments I spoke of :) It's true, but at least for the given dataset, it seems to still work. Not training in lockstep, which I mentioned above, is not so easily achieved with skorch; you would need to override the fit loop.
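Purely as an illustration of what such an override could look like, the sketch below subclasses `NeuralNet` and performs the two updates inside `train_step`. It assumes a skorch version where `train_step` receives the whole batch and returns a dict with `'loss'` and `'y_pred'` (hook signatures differ between releases), and it presumes a wrapper module exposing `generator` and `discriminator` submodules plus two separately created optimizers (`optimizer_g_`, `optimizer_d_`); none of these names come from the PR.

```python
import torch
from skorch import NeuralNet


class GANNet(NeuralNet):
    """Sketch only: alternate discriminator/generator updates per batch.

    Assumes ``self.module_`` has ``generator`` and ``discriminator``
    submodules and an ``nz`` attribute for the latent size, and that
    ``self.optimizer_g_`` / ``self.optimizer_d_`` were created in an
    overridden ``initialize_optimizer`` (not shown).
    """

    def train_step(self, batch, **fit_params):
        Xi, _ = batch
        label_real = torch.ones(len(Xi), 1)
        label_fake = torch.zeros(len(Xi), 1)
        noise = torch.randn(len(Xi), self.module_.nz)
        fake = self.module_.generator(noise)

        # Discriminator step on real and detached fake samples.
        self.optimizer_d_.zero_grad()
        loss_d = (
            self.criterion_(self.module_.discriminator(Xi), label_real)
            + self.criterion_(self.module_.discriminator(fake.detach()), label_fake)
        )
        loss_d.backward()
        self.optimizer_d_.step()

        # Generator step: backprop goes through the discriminator, but only
        # the generator's optimizer is stepped.
        self.optimizer_g_.zero_grad()
        loss_g = self.criterion_(self.module_.discriminator(fake), label_real)
        loss_g.backward()
        self.optimizer_g_.step()

        return {'loss': loss_d + loss_g, 'y_pred': fake}
```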
@BenjaminBossan are you sure this is correct:

```python
output_fake = self.discriminator(fake)
...
loss_generator = self.criterion(output_fake, label_real)
```

It seems to me that you are also backpropagating through the discriminator to make output_fake and label_real close. This does not happen in the pytorch example due to

```python
set_requires_grad(discriminator, False)
output_fake = self.discriminator(fake)
set_requires_grad(discriminator, True)
```

where:

```python
def set_requires_grad(module, val):
    """Set all the gradients of a given module to a certain value."""
    for p in module.parameters():
        p.requires_grad = val
```

Maybe I missed something in your code, but backpropagating through the discriminator for the generator loss does not seem correct. In addition, what @cgarciae mentioned seems very crucial; I don't think that the minimax game converges with joint optimization (but this becomes more hairy with skorch). This notebook is very useful to get a better idea of how to use skorch, but I'm wondering whether putting out something like this will not just mislead people into thinking they are running a DCGAN when the main optimization components are actually quite different (despite it working somehow in this case :)).
Yes, you're right, this part doesn't easily translate into skorch. Maybe I should have documented where exactly I deviated from the original.
I'll try this out once I have time. Maybe I'll have another go at this over the weekend and try to stick closer to the original.
I wondered how important this really is. At least after a few dozen batches, it's hard for me to imagine that there will be a big difference. But since I've never really worked with GANs, I can only speak from my superficial understanding.
My main objective with this draft PR was to get some feedback from more knowledgeable folks, which has worked so far :) I'll be sure not to merge this if it's incorrect or misleading. In the end, I was rather astonished that the results still look quite reasonable, which makes me wonder how important the implementation details are for this example. For other datasets and GANs, I'm sure they make a bigger difference.
Closed in favor of #593
@zachbellay I created a notebook that implements DCGAN. It builds upon this example. The notebook can be inspected here.
I have made some slight adjustments to better fit skorch but overall found that I had to change surprisingly little.
However, I would strongly assume that other GANs need many more adjustments, e.g. using different optimizers for generator and discriminator, or not training them in lockstep. For such cases, it's probably best to use pure pytorch.