Discriminator is unable to adapt to high loss. #2921

VirtualNonsense · 2025-03-18T09:47:13Z

I'm currently trying to implement a pix2pix cGAN to translate images into sketches (repository).
I tried setting up the network according to this Keras tutorial, but I've hit a brick wall: I can't get the training to work properly.

My issue seems to be that the discriminator is not able to adjust when I penalize it for the generator's success. The loss just keeps ramping up until the discriminator classifies everything as a valid picture.

Is there a way to retrieve tensors from gradients?
I suspect this might be due to a local minimum that I have to find my way out of, but I would like clearer insight into what's going on.

Answered by VirtualNonsense


laggui · 2025-03-19T12:26:21Z
Maintainer

Sorry for the delayed response!

> Is there a way to retrieve tensors from gradients?

With GradientsParams, you can retrieve the gradients for a given ParamId (associated with a parameter tensor) using grads.get(id).

2 replies

laggui Mar 19, 2025
Maintainer

Just saw your issue in #2924, so looks like you at least figured this part out 😅

How are you using the GradientsParams? You could implement a module mapper similar to how the optimizers are implemented.

In map_float you could inspect the gradients.

VirtualNonsense Mar 19, 2025
Author

Thank you for your answer!

I just replied to your response in that issue!

> How are you using the GradientsParams? You could implement a module mapper similar to how the optimizers are implemented.
> In map_float you could inspect the gradients.

I will have a look!

VirtualNonsense · 2025-03-19T14:13:38Z
Author

I think I might have resolved at least part of my issue:
When training two model graphs in parallel, one has to be quite particular about when to detach a tensor from the model graph.

In my naive approach I detached the tensor while calculating the loss. If I understand it correctly, this causes the tensor to be disconnected from the preceding operations, so the optimizer is not able to "reach" those operations in the next optimization step to change the corresponding weights.
Here is what I did to fix the issue.

1 reply

laggui Mar 19, 2025
Maintainer

Ahh ok, glad you figured it out.

That's correct, tensor.detach() will detach the tensor from the autodiff graph so previous operations are not considered for autodiff.

Gotta be careful about that 😄
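
For readers who want to see the effect of detaching in isolation, here is a minimal sketch of a toy scalar reverse-mode autodiff tape in plain Rust. It is not Burn's API and not the fix linked above: `Tape`, `Var`, `grads_for`, and the one-weight "generator" (`wg`) and "discriminator" (`wd`) are all hypothetical names invented for illustration. It only shows that once the generator output is detached, the backward pass never reaches the generator weight.

```rust
// Toy scalar reverse-mode autodiff. Hypothetical illustration only, NOT Burn's API.

/// Handle to a node on the tape.
#[derive(Clone, Copy)]
struct Var(usize);

struct Node {
    value: f64,
    /// (parent index, local derivative d(self)/d(parent))
    parents: Vec<(usize, f64)>,
}

struct Tape {
    nodes: Vec<Node>,
}

impl Tape {
    fn new() -> Self {
        Tape { nodes: Vec::new() }
    }

    fn push(&mut self, value: f64, parents: Vec<(usize, f64)>) -> Var {
        self.nodes.push(Node { value, parents });
        Var(self.nodes.len() - 1)
    }

    /// A leaf has no parents: a weight, an input, or a detached value.
    fn leaf(&mut self, value: f64) -> Var {
        self.push(value, Vec::new())
    }

    fn value(&self, v: Var) -> f64 {
        self.nodes[v.0].value
    }

    fn mul(&mut self, a: Var, b: Var) -> Var {
        let (x, y) = (self.value(a), self.value(b));
        self.push(x * y, vec![(a.0, y), (b.0, x)])
    }

    /// Same value, but recorded as a fresh leaf: the link to earlier ops is cut.
    fn detach(&mut self, a: Var) -> Var {
        let x = self.value(a);
        self.leaf(x)
    }

    /// Gradients of `out` with respect to every node, indexed like `nodes`.
    fn backward(&self, out: Var) -> Vec<f64> {
        let mut grads = vec![0.0; self.nodes.len()];
        grads[out.0] = 1.0;
        // Nodes are appended in topological order, so walk them in reverse.
        for i in (0..=out.0).rev() {
            let g = grads[i];
            for &(p, d) in &self.nodes[i].parents {
                grads[p] += g * d;
            }
        }
        grads
    }
}

/// Returns (d loss / d wg, d loss / d wd) for loss = wd * (wg * x).
fn grads_for(detach_fake: bool) -> (f64, f64) {
    let mut tape = Tape::new();
    let wg = tape.leaf(2.0); // "generator" weight
    let x = tape.leaf(3.0); // input
    let wd = tape.leaf(0.5); // "discriminator" weight

    let fake = tape.mul(wg, x); // generator output
    let fake = if detach_fake { tape.detach(fake) } else { fake };
    let loss = tape.mul(wd, fake); // discriminator score used as the loss

    let grads = tape.backward(loss);
    (grads[wg.0], grads[wd.0])
}
```

In this toy, `grads_for(false)` gives the generator weight a non-zero gradient, while `grads_for(true)` leaves it at zero and the discriminator weight's gradient is unchanged. That is the desired behaviour for a discriminator update, but it silently blocks learning if the same detached tensor is used in the loss meant to update the generator.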
