-
Notifications
You must be signed in to change notification settings - Fork 397
history: Fix regressions in __getitem__ #776
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
history: Fix regressions in __getitem__ #776
Conversation
I was able to reproduce the CI failure locally on master (1673a89) with wandb 0.10.30 installed, but not with 0.10.29. Edit: Did a git-bisect, the commit that broke CI is wandb/wandb@9da2bea. I would open a wandb issue but I don't know much about hooks or wandb. Minimal pytest output:
|
Yes, this has nothing to do with your changes, #772 should have fixed the error.
Thank you for this very thorough PR, especially for adding a bunch of tests. I will take a closer look, since this touches a couple of skorch internals. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you a lot for this PR. You put a lot of work into this and made good changes.
It's almost good to go. There are a few test cases that I need some help understanding, could you please look at my comment?
Also, could you please rebase on/merge with master, so that the CI can run successfully? Finally, please add an entry to the CHANGES.md that a few bugs in history were fixed.
* Index epochs first so IndexError is always raised for bad epoch indices, and so jagged batch counts don't raise a false IndexError. * Only filter epochs with non-matching keys once, and do it after indexing epochs to avoid returning values from the wrong epoch(s). * Never compare arbitrary values to empty lists or tuples - instead, generate _none if no batches matched the keys to exclude the epoch. * Don't raise KeyError if all batches are requested but there are no epochs - an empty list is probably more useful.
Add a few tests to verify the current behavior of History. test_history_key_in_other_epoch, test_history_no_epochs_index, and test_history_jagged_batches passed prior to commit c50f3bb (PR skorch-dev#312).
0f92f90
to
cf4f387
Compare
This is a truly meticulous PR, thank you so much for this. In addition to fixing these bugs/regressions, it even makes Really well done. |
We are happy to announce the new skorch 0.11 release: Two basic but very useful features have been added to our collection of callbacks. First, by setting `load_best=True` on the [`Checkpoint` callback](https://skorch.readthedocs.io/en/latest/callbacks.html#skorch.callbacks.Checkpoint), the snapshot of the network with the best score will be loaded automatically when training ends. Second, we added a callback [`InputShapeSetter`](https://skorch.readthedocs.io/en/latest/callbacks.html#skorch.callbacks.InputShapeSetter) that automatically adjusts your input layer to have the size of your input data (useful e.g. when that size is not known beforehand). When it comes to integrations, the [`MlflowLogger`](https://skorch.readthedocs.io/en/latest/callbacks.html#skorch.callbacks.MlflowLogger) now allows to automatically log to [MLflow](https://mlflow.org/). Thanks to a contributor, some regressions in `net.history` have been fixed and it even runs faster now. On top of that, skorch now offers a new module, `skorch.probabilistic`. It contains new classes to work with **Gaussian Processes** using the familiar skorch API. This is made possible by the fantastic [GPyTorch](https://github.com/cornellius-gp/gpytorch) library, which skorch uses for this. So if you want to get started with Gaussian Processes in skorch, check out the [documentation](https://skorch.readthedocs.io/en/latest/user/probabilistic.html) and this [notebook](https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Gaussian_Processes.ipynb). Since we're still learning, it's possible that we will change the API in the future, so please be aware of that. Morever, we introduced some changes to make skorch more customizable. First of all, we changed the signature of some methods so that they no longer assume the dataset to always return exactly 2 values. This way, it's easier to work with custom datasets that return e.g. 3 values. Normal users should not notice any difference, but if you often create custom nets, take a look at the [migration guide](https://skorch.readthedocs.io/en/latest/user/FAQ.html#migration-from-0-10-to-0-11). And finally, we made a change to how custom modules, criteria, and optimizers are handled. They are now "first class citizens" in skorch land, which means: If you add a second module to your custom net, it is treated exactly the same as the normal module. E.g., skorch takes care of moving it to CUDA if needed and of switching it to train or eval mode. This way, customizing your networks architectures with skorch is easier than ever. Check the [docs](https://skorch.readthedocs.io/en/latest/user/customization.html#initialization-and-custom-modules) for more details. Since these are some big changes, it's possible that you encounter issues. If that's the case, please check our [issue](https://github.com/skorch-dev/skorch/issues) page or create a new one. As always, this release was made possible by outside contributors. Many thanks to: - Autumnii - Cebtenzzre - Charles Cabergs - Immanuel Bayer - Jake Gardner - Matthias Pfenninger - Prabhat Kumar Sahu Find below the list of all changes: Added - Added `load_best` attribute to `Checkpoint` callback to automatically load state of the best result at the end of training - Added a `get_all_learnable_params` method to retrieve the named parameters of all PyTorch modules defined on the net, including of criteria if applicable - Added `MlflowLogger` callback for logging to Mlflow (#769) - Added `InputShapeSetter` callback for automatically setting the input dimension of the PyTorch module - Added a new module to support Gaussian Processes through [GPyTorch](https://gpytorch.ai/). To learn more about it, read the [GP documentation](https://skorch.readthedocs.io/en/latest/user/probabilistic.html) or take a look at the [GP notebook](https://nbviewer.jupyter.org/github/skorch-dev/skorch/blob/master/notebooks/Gaussian_Processes.ipynb). This feature is experimental, i.e. the API could be changed in the future in a backwards incompatible way (#782) Changed - Changed the signature of `validation_step`, `train_step_single`, `train_step`, `evaluation_step`, `on_batch_begin`, and `on_batch_end` such that instead of receiving `X` and `y`, they receive the whole batch; this makes it easier to deal with datasets that don't strictly return an `(X, y)` tuple, which is true for quite a few PyTorch datasets; please refer to the [migration guide](https://skorch.readthedocs.io/en/latest/user/FAQ.html#migration-from-0-10-to-0-11) if you encounter problems (#699) - Checking of arguments to `NeuralNet` is now during `.initialize()`, not during `__init__`, to avoid raising false positives for yet unknown module or optimizer attributes - Modules, criteria, and optimizers that are added to a net by the user are now first class: skorch takes care of setting train/eval mode, moving to the indicated device, and updating all learnable parameters during training (check the [docs](https://skorch.readthedocs.io/en/latest/user/customization.html#initialization-and-custom-modules) for more details, #751) - `CVSplit` is renamed to `ValidSplit` to avoid confusion (#752) Fixed - Fixed a few bugs in the `net.history` implementation (#776) - Fixed a bug in `TrainEndCheckpoint` that prevented it from being unpickled (#773)
Index epochs first so IndexError is always raised for bad epoch
indices, and so jagged batch counts don't raise a false IndexError.
Only filter epochs with non-matching keys once, and do it after
indexing epochs to avoid returning values from the wrong epoch(s).
Never compare arbitrary values to empty lists or tuples - instead,
generate _none if no batches matched the keys to exclude the epoch.
Don't raise KeyError if all batches are requested but there are no
epochs - an empty list is probably more useful.
There are 8 new tests - 13 if you count all the parameterized variants.
Command used to run the new regression tests:
Tests included in test_history_newtests.py: test_history_key_in_other_epoch, test_history_no_epochs_index, test_history_jagged_batches
Test Results
Selected tests before commit c50f3bb:
Selected tests on commit c50f3bb (PR #312):
Selected tests on master (commit 1d37ef3):
Selected tests on this PR: