
Add option to apply torch.softmax or disable torch.log internally #637


Closed
ToddMorrill opened this issue May 15, 2020 · 2 comments · Fixed by #662

Comments

@ToddMorrill

From what I've gathered, there are basically 3 options for losses for classification tasks when using Skorch.

  1. Have your network output probabilities from a torch.softmax layer and then use nn.NLLLoss, with Skorch applying torch.log internally. This has been shown to be numerically unstable (Overview, Details).
  • No solution: based on my understanding of the links above, I would advise against doing this.
  2. Have your network output log-probabilities from nn.LogSoftmax and then use nn.NLLLoss, which is numerically stable. Currently Skorch applies torch.log to the output of the network when nn.NLLLoss is specified, so the log is effectively applied twice and the model doesn't train.
  • Solution: have an option to disable the internal application of torch.log before nn.NLLLoss.
  3. Have your network output raw logits (i.e. don't apply any sort of softmax) and then use nn.CrossEntropyLoss, which is numerically stable. Skorch handles training, but model.predict_proba then returns raw logits instead of probabilities. I can apply the softmax function myself, but that breaks the semantics of the sklearn API.
  • Solution: have an option to apply torch.softmax internally before model.predict_proba.
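To make the stability concern in option 1 concrete, here is a minimal pure-Python sketch (standard library only, not torch or Skorch code). The helper names `softmax`, `log_of_softmax`, and `log_softmax` are illustrative. It compares taking the log of a softmax output with computing log-softmax directly via the log-sum-exp trick, using an extreme pair of logits chosen to force underflow.

```python
import math

def softmax(logits):
    # Shift by the max so exp() cannot overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_of_softmax(logits):
    # Option 1: probabilities first, log afterwards.
    # math.log(0.0) raises, so map an underflowed 0.0 to -inf
    # (the value torch.log would return for a zero input).
    return [math.log(p) if p > 0.0 else float("-inf") for p in softmax(logits)]

def log_softmax(logits):
    # Option 2: compute log-probabilities directly with log-sum-exp.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

logits = [0.0, 1000.0]          # deliberately extreme example
naive = log_of_softmax(logits)  # first probability underflows to 0.0
stable = log_softmax(logits)    # stays finite

print(naive)
print(stable)
```

With these inputs the smaller probability underflows to zero before the log is taken, so the naive path loses the information that the direct computation keeps.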
@BenjaminBossan
Collaborator

Hi Todd,

Your assessment of 1 is correct, though just from my personal experience, I haven't found it ever to matter (maybe I was just lucky).

Your assessment of 2 is a bit incomplete. If the module returns log softmax, the result from predict_proba wouldn't be probabilities, so you'd need to change that as well.
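As a rough illustration of that point (plain Python, not Skorch code, with a hypothetical `log_softmax` helper): log-softmax outputs are all non-positive and do not sum to 1, so they have to be exponentiated before they can be reported as probabilities. This also shows why taking the log of them a second time cannot work.

```python
import math

def log_softmax(logits):
    # Log-probabilities via the log-sum-exp trick.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

log_probs = log_softmax([2.0, 1.0, 0.1])

# Every entry is <= 0, so these are not valid probabilities,
# and a second log would be undefined for them.
all_non_positive = all(lp <= 0.0 for lp in log_probs)

# Exponentiating recovers a proper probability distribution.
probs = [math.exp(lp) for lp in log_probs]

print(log_probs)
print(probs, sum(probs))
```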

Your assessment of 3 is correct.

Regarding solutions, instead of having complicated options of what transformation to apply when, we wanted to go with a solution to make it easy to implement your own transformation (leaving the defaults as they are). This is the PR. Unfortunately, it's been asleep for some time, hopefully it will be finished eventually.

@ToddMorrill
Author

Alright, I'll stay tuned. Thanks for the update.

BenjaminBossan added a commit that referenced this issue Aug 30, 2020
This release of skorch contains a few minor improvements and some nice additions. As always, we fixed a few bugs and improved the documentation. Our [learning rate scheduler](https://skorch.readthedocs.io/en/latest/callbacks.html#skorch.callbacks.LRScheduler) now optionally logs learning rate changes to the history; moreover, it now allows the user to choose whether an update step should be made after each batch or each epoch.

If you always longed for a metric that would just use whatever is defined by your criterion, look no further than [`loss_scoring`](https://skorch.readthedocs.io/en/latest/scoring.html#skorch.scoring.loss_scoring). Also, skorch now allows you to easily change the kind of nonlinearity to apply to the module's output when `predict` and `predict_proba` are called, by passing the `predict_nonlinearity` argument.

Besides these changes, we improved the customization potential of skorch. First of all, the `criterion` is now set to `train` or `valid`, depending on the phase -- this is useful if the criterion should act differently during training and validation. Next we made it easier to add custom modules, optimizers, and criteria to your neural net; this should facilitate implementing architectures like GANs. Consult the [docs](https://skorch.readthedocs.io/en/latest/user/neuralnet.html#subclassing-neuralnet) for more on this. Conveniently, [`net.save_params`](https://skorch.readthedocs.io/en/latest/net.html#skorch.net.NeuralNet.save_params) can now persist arbitrary attributes, including those custom modules.

As always, these improvements wouldn't have been possible without the community. Please keep asking questions, raising issues, and proposing new features. We are especially grateful to those community members, old and new, who contributed via PRs:

```
Aaron Berk
guybuk
kqf
Michał Słapek
Scott Sievert
Yann Dubois
Zhao Meng
```

Here is the full list of all changes:

### Added

- Added the `event_name` argument for `LRScheduler` for optional recording of LR changes inside `net.history`. NOTE: Supported only in Pytorch>=1.4
- Make it easier to add custom modules or optimizers to a neural net class by automatically registering them where necessary and by making them available to set_params
- Added the `step_every` argument for `LRScheduler` to set whether the scheduler step should be taken on every epoch or on every batch.
- Added the `scoring` module with `loss_scoring` function, which computes the net's loss (using `get_loss`) on provided input data.
- Added a parameter `predict_nonlinearity` to `NeuralNet` which allows users to control the nonlinearity to be applied to the module output when calling `predict` and `predict_proba` (#637, #661)
- Added the possibility to save the criterion with `save_params` and with checkpoint callbacks
- Added the possibility to save custom modules with `save_params` and with checkpoint callbacks

### Changed

- Removed support for schedulers with a `batch_step()` method in `LRScheduler`.
- Raise `FutureWarning` in `CVSplit` when `random_state` is not used. Will raise an exception in a future release (#620)
- The behavior of method `net.get_params` changed to make it more consistent with sklearn: it will no longer return "learned" attributes like `module_`; therefore, functions like `sklearn.base.clone`, when called with a fitted net, will no longer return a fitted net but instead an uninitialized net; if you want a copy of a fitted net, use `copy.deepcopy` instead; `net.get_params` is used under the hood by many sklearn functions and classes, such as `GridSearchCV`, whose behavior may thus be affected by the change. (#521, #527)
- Raise `FutureWarning` when using `CyclicLR` scheduler, because the default behavior has changed from taking a step every batch to taking a step every epoch. (#626)
- Set train/validation on criterion if it's a PyTorch module (#621)
- Don't pass `y=None` to `NeuralNet.train_split` to enable the direct use of split functions without positional `y` in their signatures. This is useful when working with unsupervised data (#605).
- `to_numpy` is now able to unpack dicts and lists/tuples (#657, #658)
- When using `CrossEntropyLoss`, softmax is now automatically applied to the output when calling `predict` or `predict_proba`

### Fixed

- Fixed a bug where `CyclicLR` scheduler would update during both training and validation rather than just during training.
- Fixed a bug introduced by moving the `optimizer.zero_grad()` call outside of the train step function, making it incompatible with LBFGS and other optimizers that call the train step several times per batch (#636)
- Fixed pickling of the `ProgressBar` callback (#656)