
[RFC] Add argument to specify output nonlinearity #661

Closed
BenjaminBossan opened this issue Jun 24, 2020 · 2 comments · Fixed by #662

Comments

@BenjaminBossan
Collaborator

Currently, the nonlinearity applied to the module output before returning it from predict_proba is not changeable.

For NeuralNetClassifier, no nonlinearity is applied, which means that if the module doesn't return probabilities (which might be desired, e.g. for numerical stability), predict_proba also doesn't return probabilities.

For NeuralNetBinaryClassifier, sigmoid is applied if the loss is an instance of torch.nn.BCEWithLogitsLoss. Otherwise, it's untouched.

For NeuralNet and NeuralNetRegressor, no nonlinearity is applied, which is typically what is desired (we still provide predict_proba, even though it's the same as predict, since it can be useful for users to have both methods).

The proposal is to add an argument to NeuralNet called output_nonlinearity (or similar) that is applied to the module's result. This makes it very easy for the user to change the output nonlinearity.

Where is the right place to apply it? I would argue at the batch level inside forward_iter, before calling to_device (see the sketch after the list below). There are three main reasons:

  1. Use GPU acceleration if so desired
  2. Use pytorch functions instead of numpy -- it's more likely that pytorch provides an appropriate function
  3. It not only affects predict_proba, but also forward and forward_iter, which should be preferable for most users
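
As a minimal sketch of what that could look like, assuming the proposed `output_nonlinearity` has already been resolved to a callable (or `None`): the method and helper names below loosely follow the existing skorch API, but details such as batch unpacking are omitted, so treat this as an illustration rather than the final implementation.

```python
from skorch import NeuralNet
from skorch.utils import to_device


class SketchNet(NeuralNet):
    # Illustrative sketch only; assumes the resolved nonlinearity is stored
    # as `self._output_nonlinearity` (a callable or None).
    def forward_iter(self, X, training=False, device='cpu'):
        dataset = self.get_dataset(X)
        iterator = self.get_iterator(dataset, training=training)
        for Xi in iterator:  # the real method also unpacks (Xi, yi) from each batch
            yp = self.evaluation_step(Xi, training=training)  # raw module output, possibly on GPU
            if self._output_nonlinearity is not None:
                yp = self._output_nonlinearity(yp)  # applied per batch, before moving devices
            yield to_device(yp, device=device)
```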

Regarding the implementation, it should be backwards compatible, i.e. the solution should do exactly the same as is being done at the moment. To achieve this, the default output_nonlinearity should be 'auto'. In that case, a special function should be called that infers the nonlinearity from the net instance and reproduces the existing behavior. We could think about extending the inferred nonlinearities compared to what is done right now, e.g. to apply softmax if the loss is CrossEntropyLoss.
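
For illustration, the 'auto' resolution could look roughly like the following; the function name is a placeholder, and the CrossEntropyLoss branch is the possible extension mentioned above, not current behavior.

```python
from functools import partial

import torch


def _infer_output_nonlinearity(net):
    # Hypothetical helper; today only the BCEWithLogitsLoss/sigmoid case exists
    # (and only for NeuralNetBinaryClassifier).
    criterion = net.criterion_
    if isinstance(criterion, torch.nn.BCEWithLogitsLoss):
        return torch.sigmoid
    if isinstance(criterion, torch.nn.CrossEntropyLoss):
        return partial(torch.softmax, dim=-1)
    return None  # leave the output untouched
```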

(An interesting design decision is whether the auto function "dispatches" only on the criterion or also on the net. E.g., if the criterion is BCEWithLogitsLoss, is sigmoid always applied, or only in the case of NeuralNetBinaryClassifier? Regardless of the answer, I would like the auto function to take the whole net as argument, not only the criterion, to infer the nonlinearity, even if we decide to only use the criterion for now, in case we need/want to extend it further.)

Other options for output_nonlinearity should be callables, like torch.softmax, or None, in which case the output is untouched.
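
To illustrate how usage could look if the proposal were implemented (the argument name follows the proposal; the feature as merged ended up being called `predict_nonlinearity`, see the release notes below; `torch.softmax` is wrapped in `partial` because it needs a `dim` argument):

```python
from functools import partial

import torch
from torch import nn
from skorch import NeuralNetClassifier


class MyModule(nn.Module):
    """Toy placeholder module returning raw logits."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(20, 2)

    def forward(self, X):
        return self.linear(X)


# Sketch of possible usage with the proposed argument name.
net_softmax = NeuralNetClassifier(
    MyModule,
    output_nonlinearity=partial(torch.softmax, dim=-1),  # any callable
)
net_none = NeuralNetClassifier(
    MyModule,
    output_nonlinearity=None,  # module output passed through untouched
)
```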

This proposal supersedes #572 and #580 (which seem to be abandoned anyway).

Note that this change might break existing code, where methods like predict_proba are overridden and make the assumption that the outputs from forward_iter are untouched. Those users should set output_nonlinearity=None to make their code work again. I think this is an acceptable tradeoff (document this in the CHANGES.md).

@thomasjpfan
Member

I agree with adding output_nonlinearity='auto', because it is more explicit. Would the other accepted values for output_nonlinearity be pytorch functions or any callable?

An interesting design decision would be if the auto function "dispatches" only on the criterion or also on the net

Since we currently dispatch based on the criterion, we can continue doing that for now. I am assuming that the auto function will be a private function that takes the whole net.

@BenjaminBossan
Collaborator Author

Would the other accepted values for output_nonlinearity be pytorch functions or any callable?

I think we could accept any callable, it would be hard to check (or even define) what a "pytorch function" would be.

Since we currently dispatch based on the criterion, we can continue doing that for now.

I agree, we can always move to more fine grained dispatching later.

I am assuming that the auto function will be a private function that takes the whole net.

Yes, I think it should take the net as input.

As to how the auto function should be implemented, I'm still not quite sure. E.g. it could make use of a "dispatching table" (which at the moment wouldn't differ by net class, only by criterion), with the possibility for the user to extend it by simply modifying it. But maybe that's overengineered for this problem :D
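
A rough sketch of that idea, with all names purely illustrative (this is not an actual skorch API):

```python
from functools import partial

import torch

# Illustrative "dispatching table", keyed only on the criterion class for now;
# users could change the behavior by adding or overriding entries.
OUTPUT_NONLINEARITIES = {
    torch.nn.BCEWithLogitsLoss: torch.sigmoid,
    torch.nn.CrossEntropyLoss: partial(torch.softmax, dim=-1),
}


def infer_output_nonlinearity(net, table=OUTPUT_NONLINEARITIES):
    for criterion_cls, nonlinearity in table.items():
        if isinstance(net.criterion_, criterion_cls):
            return nonlinearity
    return None  # default: leave the output untouched

# a user could extend it, e.g.: OUTPUT_NONLINEARITIES[MyCustomLoss] = my_nonlinearity
```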

BenjaminBossan added a commit that referenced this issue Aug 30, 2020
This release of skorch contains a few minor improvements and some nice additions. As always, we fixed a few bugs and improved the documentation. Our [learning rate scheduler](https://skorch.readthedocs.io/en/latest/callbacks.html#skorch.callbacks.LRScheduler) now optionally logs learning rate changes to the history; moreover, it now allows the user to choose whether an update step should be made after each batch or each epoch.

If you have always longed for a metric that just uses whatever is defined by your criterion, look no further than [`loss_scoring`](https://skorch.readthedocs.io/en/latest/scoring.html#skorch.scoring.loss_scoring). Also, skorch now allows you to easily change the kind of nonlinearity applied to the module's output when `predict` and `predict_proba` are called, by passing the `predict_nonlinearity` argument.
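
A minimal usage sketch of the new argument (the module is a placeholder; with the default `'auto'`, softmax is applied when the criterion is `CrossEntropyLoss`, and `None` disables any nonlinearity):

```python
import torch
from torch import nn
from skorch import NeuralNetClassifier


class ClassifierModule(nn.Module):
    """Toy placeholder module returning raw logits."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(20, 2)

    def forward(self, X):
        return self.linear(X)


net = NeuralNetClassifier(
    ClassifierModule,
    criterion=nn.CrossEntropyLoss,
    predict_nonlinearity='auto',  # default: softmax is applied in predict_proba
)
```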

Besides these changes, we improved the customization potential of skorch. First of all, the `criterion` is now set to `train` or `valid` mode, depending on the phase -- this is useful if the criterion should act differently during training and validation. Next, we made it easier to add custom modules, optimizers, and criteria to your neural net; this should facilitate implementing architectures like GANs. Consult the [docs](https://skorch.readthedocs.io/en/latest/user/neuralnet.html#subclassing-neuralnet) for more on this. Conveniently, [`net.save_params`](https://skorch.readthedocs.io/en/latest/net.html#skorch.net.NeuralNet.save_params) can now persist arbitrary attributes, including those custom modules.
As always, these improvements wouldn't have been possible without the community. Please keep asking questions, raising issues, and proposing new features. We are especially grateful to those community members, old and new, who contributed via PRs:

```
Aaron Berk
guybuk
kqf
Michał Słapek
Scott Sievert
Yann Dubois
Zhao Meng
```

Here is the full list of all changes:

### Added

- Added the `event_name` argument for `LRScheduler` for optional recording of LR changes inside `net.history`. NOTE: Supported only in PyTorch>=1.4
- Make it easier to add custom modules or optimizers to a neural net class by automatically registering them where necessary and by making them available to `set_params`
- Added the `step_every` argument for `LRScheduler` to set whether the scheduler step should be taken on every epoch or on every batch.
- Added the `scoring` module with `loss_scoring` function, which computes the net's loss (using `get_loss`) on provided input data.
- Added a parameter `predict_nonlinearity` to `NeuralNet` which allows users to control the nonlinearity to be applied to the module output when calling `predict` and `predict_proba` (#637, #661)
- Added the possibility to save the criterion with `save_params` and with checkpoint callbacks
- Added the possibility to save custom modules with `save_params` and with checkpoint callbacks

### Changed

- Removed support for schedulers with a `batch_step()` method in `LRScheduler`.
- Raise `FutureWarning` in `CVSplit` when `random_state` is not used. Will raise an exception in a future release (#620)
- The behavior of method `net.get_params` changed to make it more consistent with sklearn: it will no longer return "learned" attributes like `module_`; therefore, functions like `sklearn.base.clone`, when called with a fitted net, will no longer return a fitted net but instead an uninitialized net; if you want a copy of a fitted net, use `copy.deepcopy` instead; `net.get_params` is used under the hood by many sklearn functions and classes, such as `GridSearchCV`, whose behavior may thus be affected by the change. (#521, #527)
- Raise `FutureWarning` when using `CyclicLR` scheduler, because the default behavior has changed from taking a step every batch to taking a step every epoch. (#626)
- Set train/validation on criterion if it's a PyTorch module (#621)
- Don't pass `y=None` to `NeuralNet.train_split` to enable the direct use of split functions without positional `y` in their signatures. This is useful when working with unsupervised data (#605).
- `to_numpy` is now able to unpack dicts and lists/tuples (#657, #658)
- When using `CrossEntropyLoss`, softmax is now automatically applied to the output when calling `predict` or `predict_proba`

### Fixed

- Fixed a bug where `CyclicLR` scheduler would update during both training and validation rather than just during training.
- Fixed a bug introduced by moving the `optimizer.zero_grad()` call outside of the train step function, making it incompatible with LBFGS and other optimizers that call the train step several times per batch (#636)
- Fixed pickling of the `ProgressBar` callback (#656)