
Add notebook with DCGAN on MNIST #587


Closed

Conversation

BenjaminBossan
Collaborator

@BenjaminBossan commented Feb 8, 2020

@zachbellay I created a notebook that implements DCGAN. It builds upon this example. The notebook can be inspected here.

I have made some slight adjustments to better fit skorch, but overall I found that I had to change surprisingly little.

However, I would strongly assume that other GANs need many more adjustments, e.g. using different optimizers for the generator and the discriminator, or not training them in lockstep. For such cases, it's probably best to use pure PyTorch.

@BenjaminBossan self-assigned this Feb 8, 2020
@review-notebook-app

Check out this pull request on ReviewNB.

You'll be able to see the Jupyter notebook diff and discuss changes. Powered by ReviewNB.

@cgarciae

cgarciae commented Feb 8, 2020

Hey Benjamin, unless I am misunderstanding something, it seems you are training the generator and discriminator at the same time. Normally you are supposed to alternate between training the generator and the discriminator, which I don't see happening here, since ALL the losses are added together unconditionally.
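
For reference, the alternating scheme used in the original PyTorch DCGAN tutorial looks roughly like the sketch below. The names (generator, discriminator, optimizer_g, optimizer_d, dataloader, latent_dim) are illustrative and assumed to be defined elsewhere; this is not the notebook's code.

import torch
import torch.nn as nn

criterion = nn.BCELoss()

for real, _ in dataloader:
    batch_size = real.size(0)
    label_real = torch.full((batch_size,), 1.0)
    label_fake = torch.full((batch_size,), 0.0)

    # (1) Discriminator step: real batch + fake batch; detach() keeps the
    #     generator out of this backward pass. The discriminator is assumed
    #     to output one probability per sample.
    discriminator.zero_grad()
    loss_d_real = criterion(discriminator(real), label_real)
    fake = generator(torch.randn(batch_size, latent_dim, 1, 1))
    loss_d_fake = criterion(discriminator(fake.detach()), label_fake)
    (loss_d_real + loss_d_fake).backward()
    optimizer_d.step()

    # (2) Generator step: try to fool the discriminator; only the generator's
    #     optimizer takes a step.
    generator.zero_grad()
    loss_g = criterion(discriminator(fake), label_real)
    loss_g.backward()
    optimizer_g.step()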

@BenjaminBossan
Collaborator Author

@cgarciae That's one of the adjustments I spoke of :)

It's true, but at least for the given dataset, it seems to still work. Not training in lockstep, which I mentioned above, is not so easily achieved with skorch; you would need to override the fit loop.

@YannDubs
Contributor

YannDubs commented Feb 10, 2020

@BenjaminBossan are you sure this is correct:

output_fake = self.discriminator(fake)
...
loss_generator = self.criterion(output_fake, label_real)

It seems to me that you are also backpropagating through the discriminator to push output_fake towards label_real. This does not happen in the PyTorch example due to the zero_grad at the beginning of both optimization steps. Before seeing this PR, I made a quick suggestion (#295 (comment)) of wrapping the first line as such:

set_requires_grad(self.discriminator, False)
output_fake = self.discriminator(fake)
set_requires_grad(self.discriminator, True)

Where :

def set_requires_grad(module, val):
    """Enable or disable requires_grad for all parameters of a module."""
    for p in module.parameters():
        p.requires_grad = val

Maybe I missed something in your code, but backpropagating through the discriminator for the generator loss does not seem correct. In addition, what @cgarciae raised seems very crucial: I don't think the minimax game converges with joint optimization (but this becomes more hairy with skorch).

This notebook is very useful for getting a better idea of how to use skorch, but I'm wondering if putting out something like this won't just mislead people into thinking they are running a DCGAN when the main optimization components are actually quite different (despite it somehow working in this case :)).

@BenjaminBossan
Collaborator Author

This does not happen in the PyTorch example due to the zero_grad at the beginning of both optimization steps.

Yes, you're right, this part doesn't easily translate into skorch. Maybe I should have documented where exactly I deviated from the original.
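
For context, the generator update in the original PyTorch example looks roughly like the sketch below, reusing the variable names from the snippet quoted above plus netD, netG and optimizerG from the tutorial; all of them are assumed to be defined earlier in the training loop.

# Generator update in the original example:
netG.zero_grad()
output_fake = netD(fake)                      # no detach: backward() also fills netD's .grad buffers
loss_generator = criterion(output_fake, label_real)
loss_generator.backward()
optimizerG.step()                             # only netG's parameters are updated

# The stray gradients left in netD are wiped by netD.zero_grad() at the start
# of the next discriminator step, so netD never takes a step based on them.
# With a single optimizer over both modules and one combined loss, that
# separation is lost, which is the issue raised above.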

Before seeing this PR, I made a quick suggestion (#295 (comment)) of wrapping the first line as such:

I'll try this out once I have time. Maybe I'll have another go at this over the weekend and try to stick closer to the original.

I don't think that the minimax game converges with joint optimization

I wondered how important this really is. At least after a few dozen batches, it's hard for me to imagine that there will be a big difference. But since I've never really worked with GANs, I can only speak from my superficial understanding.

I'm wondering if putting out something like this won't just mislead people into thinking they are running a DCGAN when the main optimization components are actually quite different

My main objective with this draft PR was to get some feedback from more knowledgeable folks, which has worked so far :) I'll be sure not to merge this if it's incorrect or misleading.

In the end, I was rather astonished that the results still look quite reasonable, which makes me wonder how important the implementation details are for this example. For other datasets and GANs, I'm sure they make a bigger difference.

@BenjaminBossan
Collaborator Author

Closed in favor of #593
