A callback for modifying the loss before the optimizer obtains it. #295
In general, I see no obstacle to passing Xi and yi to the on_grad_computed callback. For your particular problem, however, I believe that overriding the training step on the net itself is the better solution. One of our goals with skorch was to make it easy to subclass NeuralNet.
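For illustration, here is a rough sketch of that subclassing approach for the adversarial example (epsilon and the FGSM-style sign step are illustrative choices, and the exact train_step signature can differ between skorch versions):
import torch
from skorch import NeuralNet

class AdversarialNet(NeuralNet):
    # hypothetical subclass; epsilon controls the gradient-dependent perturbation
    def __init__(self, *args, epsilon=0.01, **kwargs):
        super().__init__(*args, **kwargs)
        self.epsilon = epsilon

    def train_step(self, Xi, yi, **fit_params):
        self.module_.train()
        self.optimizer_.zero_grad()

        # first pass: ordinary forward/backward to obtain the gradient w.r.t. the inputs
        Xi = torch.as_tensor(Xi).float().requires_grad_(True)
        loss = self.get_loss(self.infer(Xi, **fit_params), yi, X=Xi, training=True)
        loss.backward()

        # second pass: train on samples perturbed by the sign of the input gradient
        Xi_adv = (Xi + self.epsilon * Xi.grad.sign()).detach()
        self.optimizer_.zero_grad()
        y_pred = self.infer(Xi_adv, **fit_params)
        loss_adv = self.get_loss(y_pred, yi, X=Xi_adv, training=True)
        loss_adv.backward()
        self.optimizer_.step()
        return {'loss': loss_adv, 'y_pred': y_pred}
|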
The question is whether things like adversarial training can be modularised and reused across different architectures. It would be nice to have a pluggable virtual adversarial training callback, but I'm not sure if this is feasible. In general I'm with Benjamin on this: you should resort to using the object framework of Python to make such fundamental changes to a net. |
I feel like this kind of thing -- data augmentation or regularization -- shouldn't really need a whole new NeuralNet subclass.
Will do. |
There are many ways to augment data or apply regularization. Therefore, there will never be one class that can master them all. We have taken care of many things; for example, weight decay/L1/L2 regularization are already handled quite well. Training-time feature augmentation can often be handled by the Dataset. Your particular example is different because it requires the gradient for the augmentation. Additionally, GANs typically also require overriding methods in skorch. But there too, there are so many different implementations that it's hard to cover them all. I could, however, imagine that a callback gets access to the batch data.
This is already possible without too much hassle. For your example above, you need to introduce a new argument on the on_grad_computed call and pass the current batch along with it.
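To make that concrete, here is a sketch of a callback relying on such batch arguments being forwarded to on_grad_computed (the X and y parameters are the proposed extension, not necessarily the current signature):
from skorch.callbacks import Callback

class BatchAwareGradCallback(Callback):
    # hypothetical callback: assumes the net forwards the current batch (X, y)
    # when it notifies on_grad_computed
    def on_grad_computed(self, net, named_parameters, X=None, y=None, **kwargs):
        # with the batch and the freshly computed gradients at hand, one could
        # e.g. record a gradient norm per batch or build perturbed inputs here
        total_norm = sum(p.grad.norm().item() for _, p in named_parameters if p.grad is not None)
        net.history.record_batch('grad_norm', total_norm)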
|
Agreed... but there should be one class that can do most of it. I use Skorch primarily for making my work reproducible and recordable. Without Skorch, saving a model in a way that it will retrain the same requires saving a whole bunch of things: the NN class itself, the instantiation code for the NN, the optimizer and the criterion, the training loop code, the DataLoader parameters, the datasets, the random seeds, and probably a few others that I'm forgetting right now. The benefit that Skorch gives me is that I only need to save the instantiation of the NeuralNet. I'm not opposed to a different class for something like a GAN. In this particular case, though, I think that adding the batch data to the on_grad_computed callback would be enough. |
I could as well, and a GAN seems to be a very valid reason to create a new NeuralNet subclass. |
I believe we largely agree on what should and what shouldn't be done. The only missing piece in the puzzle is what use cases are general enough to require a built-in solution. Unfortunately, this kind of data is hard to come by.
Do you want to take this? |
Yea, I will. I'll try and get that done by the end of the day. |
Is there a set of tests somewhere that I should run? |
If you want to help developing, run:
git clone https://github.com/dnouri/skorch.git
cd skorch
# create and activate a virtual environment
pip install -r requirements.txt
# install pytorch version for your system (see below)
pip install -r requirements-dev.txt
python setup.py develop
py.test # unit tests
pylint skorch # static code checks
(This comes from the README.) |
Oops. @taketwo: Thanks for that, I should have read that before asking. |
It mostly did. However, there isn't a good way to run a training step from inside a callback (especially from on_grad_computed). But maybe those should be in a different issue. |
@benjamin-work In your comment above you mention GANs. Is there a recommended way in skorch to train a GAN where the generator and the discriminator are separate modules? |
@zachbellay Without knowing anything about your specific case, would it be possible to have the discriminator and the generator be submodules of the same overarching module? Then the module could have those two components as attributes that you can use depending on which of them needs to be trained. In general, we should try to provide a template for GANs in skorch, but since I never use them personally, it's hard for me to do that. Maybe if there's a good pointer to existing pytorch code that could be ported to skorch, we could work on that. |
@BenjaminBossan It would be possible to have them in the same overarching module, although I'm less certain of how to appropriately integrate the two into the single Skorch module. My use case is basically a beefed up version of DCGAN. Here is a good Pytorch implementation that is very similar to my use case. Thanks again for your help! |
Thank you @zachbellay, I'll have a look at this as soon as I've got some time on my hands and see if it's possible to port to skorch without too much gymnastics. |
@benjamin-work that would be very useful for me also. I had some GAN-type training to do, and always used some tricks to make it work in skorch. Essentially the problem comes from the fact that for GANs you need 2 models, 2 losses, 2 optimizers, and 2 alternating optimization steps (the minimax game does not converge under joint optimization). Concerning the models and losses, this is not an issue, as one can combine them in a single module / loss (although this will not save both losses to the history). Optimization is more tricky; skorch basically does the following very naturally: compute output, compute loss, optimization step. Here we need 2x this loop, where the second training step (for the discriminator) depends on the output of the first. So even if we assume that we use the same optimizer / scheduling for both models (I don't think people do, but anything else would require many more changes, and parameter groups can already do quite a lot), you would still have to choose one of the two propositions outlined below.
Taking from the link posted by @zachbellay, here's essentially the minimum to achieve:
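(A sketch following the standard DCGAN recipe; the function name, the latent_dim default, and the label shapes are illustrative rather than taken from the link.)
import torch

def gan_step(generator, discriminator, opt_g, opt_d, real, criterion, latent_dim=100):
    # labels assume the discriminator outputs one logit/probability per sample
    batch_size = real.size(0)
    ones = torch.ones(batch_size, 1, device=real.device)
    zeros = torch.zeros(batch_size, 1, device=real.device)

    # 1) discriminator step: real samples labelled 1, fake samples labelled 0
    opt_d.zero_grad()
    fake = generator(torch.randn(batch_size, latent_dim, device=real.device))
    loss_d = criterion(discriminator(real), ones) + criterion(discriminator(fake.detach()), zeros)
    loss_d.backward()
    opt_d.step()

    # 2) generator step: try to make the (just updated) discriminator output 1 on fakes
    opt_g.zero_grad()
    loss_g = criterion(discriminator(fake), ones)
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()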
Here are the proposition outlines (not working code). Proposition 1:
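(An illustrative outline, not the original code: a single module plus a single combined loss, so skorch's standard loop applies. Note that with one joint optimizer the generator loss also sends gradients into the discriminator, which is part of the gradient caveat discussed just below.)
import torch
import torch.nn as nn

class JointGAN(nn.Module):
    # one module so the standard output-loss-optim loop covers both networks
    def __init__(self, generator, discriminator, latent_dim=100):
        super().__init__()
        self.generator = generator
        self.discriminator = discriminator
        self.latent_dim = latent_dim

    def forward(self, real):
        z = torch.randn(real.size(0), self.latent_dim, device=real.device)
        fake = self.generator(z)
        # detached branch: used for the discriminator loss so it does not update the generator
        # attached branch: used for the generator loss so gradients reach the generator
        return self.discriminator(real), self.discriminator(fake.detach()), self.discriminator(fake)

class JointGANLoss(nn.Module):
    # combines both losses into a single scalar so one backward/step covers both
    def __init__(self):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, y_pred, _y_true):
        d_real, d_fake_det, d_fake = y_pred
        loss_d = self.bce(d_real, torch.ones_like(d_real)) + self.bce(d_fake_det, torch.zeros_like(d_fake_det))
        loss_g = self.bce(d_fake, torch.ones_like(d_fake))
        return loss_d + loss_g
A NeuralNet wrapping JointGAN with criterion=JointGANLoss could then be trained by the ordinary fit loop, at the cost of the flexibility issues described next.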
This is very easy to do in skorch; please double check that the gradients are actually correct (what is backpropagated where), but I think they are. The major issue here is flexibility. Indeed, the theory says (if I'm not mistaken) to 1) do multiple optimization steps of the discriminator, and 2) update the discriminator using the latest generator. Neither of those is done in the given link (I'm not sure about SOTA GANs), but both would be basically impossible in a single output-loss-optim loop. Proposition 2:
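(An illustrative outline of the flag idea; a NeuralNet subclass would flip the mode on both objects and switch between two optimizers inside its training step, which is where the step_accumulator changes mentioned below come in.)
import torch
import torch.nn as nn

class ModalGAN(nn.Module):
    # the mode flag selects which half of the minimax game the forward pass serves
    def __init__(self, generator, discriminator, latent_dim=100):
        super().__init__()
        self.generator = generator
        self.discriminator = discriminator
        self.latent_dim = latent_dim
        self.mode = 'discriminator'

    def forward(self, real):
        z = torch.randn(real.size(0), self.latent_dim, device=real.device)
        if self.mode == 'discriminator':
            # fakes are detached here so only the discriminator receives gradients
            return self.discriminator(real), self.discriminator(self.generator(z).detach())
        return self.discriminator(self.generator(z))

class ModalGANLoss(nn.Module):
    # mirrors the module's flag so the matching loss is computed in each mode
    def __init__(self):
        super().__init__()
        self.mode = 'discriminator'
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, y_pred, _y_true):
        if self.mode == 'discriminator':
            d_real, d_fake = y_pred
            return self.bce(d_real, torch.ones_like(d_real)) + self.bce(d_fake, torch.zeros_like(d_fake))
        return self.bce(y_pred, torch.ones_like(y_pred))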
Here the method is theoretically sound and flexible, but not as clean (basically it uses some flags to say whether the loss or model should be in discriminator or generator mode). Note that the step_accumulator also has to be modified. Let me know what you think, and @zachbellay, whether that would work for GANs, as I don't have any experience with those. |
@YannDubs Thank you for the proposal and clean explanation.
And I wouldn't be surprised if there were applications where even that is not enough. At the end of the day, I wonder how much sense it makes to "contort" skorch to make it work this way. On the one hand, I'm flattered that people try to use it for even more unconventional cases; on the other, it might just not be the best tool (at the moment). Basically, right now the user would need to implement their own train_step. The main challenges I foresee are the logging/callbacks and the handling of the two optimizers. As I said in the other thread, when I have time, I'll try my hand at the topic again. Any kind of feedback, hints, or existing repos with concrete implementations is appreciated. |
There is a class of things that don't appear to be possible within the current callback framework of Skorch.
The first one that comes to mind is adversarial training. In order to do this efficiently, you would want to run all samples through the network once, create new samples by adding to the old samples some noise that depends on the gradient, and then run those through the network again:
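(A sketch of that procedure in plain PyTorch; epsilon and the sign-of-gradient perturbation are illustrative choices.)
import torch

def adversarial_step(module, criterion, optimizer, X, y, epsilon=0.01):
    # first pass: compute the loss and the gradient with respect to the inputs
    X = X.clone().requires_grad_(True)
    criterion(module(X), y).backward()

    # create new samples by adding gradient-dependent noise to the old samples
    X_adv = (X + epsilon * X.grad.sign()).detach()

    # second pass: run the perturbed samples through the network and update the weights
    optimizer.zero_grad()
    loss_adv = criterion(module(X_adv), y)
    loss_adv.backward()
    optimizer.step()
    return loss_adv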
Unfortunately, this isn't actually possible using the current framework (without creating a new NeuralNet class). This could probably be fixed if Xi and yi were passed to the on_grad_computed callback, but they aren't currently.