Skip to content

history: Fix regressions in __getitem__ #776

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 26, 2021

Conversation

cebtenzzre
Copy link
Contributor

@cebtenzzre cebtenzzre commented May 21, 2021

  • Index epochs first so IndexError is always raised for bad epoch
    indices, and so jagged batch counts don't raise a false IndexError.

  • Only filter epochs with non-matching keys once, and do it after
    indexing epochs to avoid returning values from the wrong epoch(s).

  • Never compare arbitrary values to empty lists or tuples - instead,
    generate _none if no batches matched the keys to exclude the epoch.

  • Don't raise KeyError if all batches are requested but there are no
    epochs - an empty list is probably more useful.


There are 8 new tests - 13 if you count all the parameterized variants.

Command used to run the new regression tests:

pytest --no-header --disable-warnings -c /dev/null skorch/tests/test_history_newtests.py

Tests included in test_history_newtests.py: test_history_key_in_other_epoch, test_history_no_epochs_index, test_history_jagged_batches

Test Results

Selected tests before commit c50f3bb:

============================= test session starts ==============================
collected 3 items                                                              

skorch/tests/test_history_newtests.py ...                                [100%]

======================= 3 passed, 3220 warnings in 0.01s =======================

Selected tests on commit c50f3bb (PR #312):

============================= test session starts ==============================
collected 3 items                                                              

skorch/tests/test_history_newtests.py FFF                                [100%]

=================================== FAILURES ===================================
_________________ TestHistory.test_history_key_in_other_epoch __________________

self = <skorch.tests.test_history_newtests.TestHistory object at 0x7f95b3572460>

    def test_history_key_in_other_epoch(self):
        h = History()
        for has_valid in (True, False):
            h.new_epoch()
            h.new_batch()
            h.record_batch('train_loss', 1)
            if has_valid:
                h.new_batch()
                h.record_batch('valid_loss', 2)
    
        with pytest.raises(KeyError):
            # pylint: disable=pointless-statement
>           h[-1, 'batches', -1, 'valid_loss']
E           Failed: DID NOT RAISE <class 'KeyError'>

skorch/tests/test_history_newtests.py:43: Failed
___________________ TestHistory.test_history_no_epochs_index ___________________

self = <skorch.tests.test_history_newtests.TestHistory object at 0x7f95b354e580>

    def test_history_no_epochs_index(self):
        h = History()
        with pytest.raises(IndexError):
            # pylint: disable=pointless-statement
>           h[-1, 'batches']

skorch/tests/test_history_newtests.py:49: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = [], i = (-1, 'batches')

    def __getitem__(self, i):
        # This implementation resolves indexing backwards,
        # i.e. starting from the batches, then progressing to the
        # epochs.
        if isinstance(i, (int, slice)):
            i = (i,)
    
        # i_e: index epoch, k_e: key epoch
        # i_b: index batch, k_b: key batch
        i_e, k_e, i_b, k_b = _unpack_index(i)
        keyerror_msg = "Key '{}' was not found in history."
    
        if i_b is not None and k_e != 'batches':
            raise KeyError("History indexing beyond the 2nd level is "
                           "only possible if key 'batches' is used, "
                           "found key '{}'.".format(k_e))
    
        items = self.to_list()
    
        # extract indices of batches
        # handles: history[..., k_e, i_b]
        if i_b is not None:
            items = [row[k_e][i_b] for row in items]
    
        # extract keys of batches
        # handles: history[..., k_e, i_b][k_b]
        if k_b is not None:
            items = [
                _filter_none([_getitem(b, k_b) for b in batches])
                if isinstance(batches, (list, tuple))
                else _getitem(batches, k_b)
                for batches in items
            ]
            # get rid of empty batches
            items = [b for b in items if b not in (_none, [], ())]
            if not _filter_none(items):
                # all rows contained _none or were empty
                raise KeyError(keyerror_msg.format(k_b))
    
        # extract epoch-level values, but only if not already done
        # handles: history[..., k_e]
        if (k_e is not None) and (i_b is None):
            items = [_getitem(batches, k_e)
                     for batches in items]
            if not _filter_none(items):
>               raise KeyError(keyerror_msg.format(k_e))
E               KeyError: "Key 'batches' was not found in history."

skorch/history.py:210: KeyError
___________________ TestHistory.test_history_jagged_batches ____________________

self = <skorch.tests.test_history_newtests.TestHistory object at 0x7f95b34f3af0>

    def test_history_jagged_batches(self):
        h = History()
        for num_batch in (1, 2):
            h.new_epoch()
            for _ in range(num_batch):
                h.new_batch()
        # Make sure we can access this batch
>       assert h[-1, 'batches', 1] == {}

skorch/tests/test_history_newtests.py:58: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
skorch/history.py:187: in __getitem__
    items = [row[k_e][i_b] for row in items]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

.0 = <list_iterator object at 0x7f95b34f3820>

>   items = [row[k_e][i_b] for row in items]
E   IndexError: list index out of range

skorch/history.py:187: IndexError
=========================== short test summary info ============================
FAILED skorch/tests/test_history_newtests.py::TestHistory::test_history_key_in_other_epoch
FAILED skorch/tests/test_history_newtests.py::TestHistory::test_history_no_epochs_index
FAILED skorch/tests/test_history_newtests.py::TestHistory::test_history_jagged_batches
======================= 3 failed, 3223 warnings in 0.12s =======================

Selected tests on master (commit 1d37ef3):

============================= test session starts ==============================
collected 3 items                                                              

skorch/tests/test_history_newtests.py FFF                                [100%]

=================================== FAILURES ===================================
[snip]

Selected tests on this PR:

============================= test session starts ==============================
collected 3 items                                                              

skorch/tests/test_history_newtests.py ...                                [100%]

======================= 3 passed, 3219 warnings in 0.01s =======================

@cebtenzzre
Copy link
Contributor Author

cebtenzzre commented May 22, 2021

I was able to reproduce the CI failure locally on master (1673a89) with wandb 0.10.30 installed, but not with 0.10.29.

Edit: Did a git-bisect, the commit that broke CI is wandb/wandb@9da2bea. I would open a wandb issue but I don't know much about hooks or wandb. Minimal pytest output:

================================================= test session starts ==================================================
platform linux -- Python 3.9.4, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/cebtenzzre/src/forks/skorch/skorch/tests/callbacks, configfile: ../../../../../../../../dev/null
plugins: flaky-3.7.0, cov-2.12.0
collected 55 items                                                                                                     

skorch/tests/callbacks/test_logging.py sssssssssssss..F........................Fssssssssssssss                   [100%]

======================================================= FAILURES =======================================================
_______________________________________ TestWandb.test_fit_with_real_experiment ________________________________________
Traceback (most recent call last):
  File "/home/cebtenzzre/src/forks/skorch/skorch/tests/callbacks/test_logging.py", line 422, in test_fit_with_real_experiment
    net.fit(*data)
  File "/home/cebtenzzre/src/forks/skorch/skorch/classifier.py", line 142, in fit
    return super(NeuralNetClassifier, self).fit(X, y, **fit_params)
  File "/home/cebtenzzre/src/forks/skorch/skorch/net.py", line 953, in fit
    self.partial_fit(X, y, **fit_params)
  File "/home/cebtenzzre/src/forks/skorch/skorch/net.py", line 912, in partial_fit
    self.fit_loop(X, y, **fit_params)
  File "/home/cebtenzzre/src/forks/skorch/skorch/net.py", line 825, in fit_loop
    self.run_single_epoch(dataset_train, training=True, prefix="train",
  File "/home/cebtenzzre/src/forks/skorch/skorch/net.py", line 860, in run_single_epoch
    step = step_fn(batch, **fit_params)
  File "/home/cebtenzzre/src/forks/skorch/skorch/net.py", line 745, in train_step
    self.optimizer_.step(step_fn)
  File "/home/cebtenzzre/.local/lib/python3.9/site-packages/torch/optim/optimizer.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "/home/cebtenzzre/.local/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/cebtenzzre/.local/lib/python3.9/site-packages/torch/optim/sgd.py", line 87, in step
    loss = closure()
  File "/home/cebtenzzre/src/forks/skorch/skorch/net.py", line 735, in step_fn
    step = self.train_step_single(batch, **fit_params)
  File "/home/cebtenzzre/src/forks/skorch/skorch/net.py", line 679, in train_step_single
    y_pred = self.infer(Xi, **fit_params)
  File "/home/cebtenzzre/src/forks/skorch/skorch/net.py", line 1097, in infer
    return self.module_(x, **fit_params)
  File "/home/cebtenzzre/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cebtenzzre/src/forks/skorch/skorch/toy.py", line 88, in forward
    X = self.sequential(X)
  File "/home/cebtenzzre/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cebtenzzre/.local/lib/python3.9/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/home/cebtenzzre/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 890, in _call_impl
    for hook in itertools.chain(
RuntimeError: OrderedDict mutated during iteration
------------------------------------------------ Captured stderr setup -------------------------------------------------
2021-05-21 23:36:03.323066: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.summary API due to missing TensorBoard installation.
wandb: W&B syncing is set to `offline` in this directory.  Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
_____________________________________________ TestProgressBar.test_pickle ______________________________________________
Traceback (most recent call last):
  File "/home/cebtenzzre/src/forks/skorch/skorch/tests/callbacks/test_logging.py", line 703, in test_pickle
    dump = pickle.dumps(net)
  File "/home/cebtenzzre/src/forks/skorch/skorch/net.py", line 1739, in __getstate__
    torch.save(cuda_attrs, f)
  File "/home/cebtenzzre/.local/lib/python3.9/site-packages/torch/serialization.py", line 372, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/home/cebtenzzre/.local/lib/python3.9/site-packages/torch/serialization.py", line 476, in _save
    pickler.dump(obj)
AttributeError: Can't pickle local object 'TorchGraph.create_forward_hook.<locals>.after_forward_hook'
------------------------------------------------- Captured stdout call -------------------------------------------------
  epoch    train_loss     dur
-------  ------------  ------
      1        0.2821  0.0011
      2        0.2602  0.0010
------------------------------------------------- Captured stderr call -------------------------------------------------
                                     
=============================================== short test summary info ================================================
FAILED skorch/tests/callbacks/test_logging.py::TestWandb::test_fit_with_real_experiment - RuntimeError: OrderedDict m...
FAILED skorch/tests/callbacks/test_logging.py::TestProgressBar::test_pickle - AttributeError: Can't pickle local obje...
=============================== 2 failed, 26 passed, 27 skipped, 3233 warnings in 1.89s ================================

@BenjaminBossan
Copy link
Collaborator

I was able to reproduce the CI failure locally on master (1673a89) with wandb 0.10.30 installed, but not with 0.10.29.

Yes, this has nothing to do with your changes, #772 should have fixed the error.

These changes are aimed at partially restoring the behavior of History
prior to commit c50f3bb (PR #312).

Thank you for this very thorough PR, especially for adding a bunch of tests. I will take a closer look, since this touches a couple of skorch internals.

Copy link
Collaborator

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you a lot for this PR. You put a lot of work into this and made good changes.

It's almost good to go. There are a few test cases that I need some help understanding, could you please look at my comment?

Also, could you please rebase on/merge with master, so that the CI can run successfully? Finally, please add an entry to the CHANGES.md that a few bugs in history were fixed.

* Index epochs first so IndexError is always raised for bad epoch
  indices, and so jagged batch counts don't raise a false IndexError.

* Only filter epochs with non-matching keys once, and do it after
  indexing epochs to avoid returning values from the wrong epoch(s).

* Never compare arbitrary values to empty lists or tuples - instead,
  generate _none if no batches matched the keys to exclude the epoch.

* Don't raise KeyError if all batches are requested but there are no
  epochs - an empty list is probably more useful.
Add a few tests to verify the current behavior of History.
test_history_key_in_other_epoch, test_history_no_epochs_index, and
test_history_jagged_batches passed prior to commit
c50f3bb (PR skorch-dev#312).
@cebtenzzre cebtenzzre force-pushed the fix-history-getitem branch from 0f92f90 to cf4f387 Compare May 25, 2021 21:57
@BenjaminBossan
Copy link
Collaborator

This is a truly meticulous PR, thank you so much for this. In addition to fixing these bugs/regressions, it even makes history a bit faster (at least when I tested with examples/benchmarks/history.py) 👍.

Really well done.

@BenjaminBossan BenjaminBossan merged commit 54796f1 into skorch-dev:master May 26, 2021
BenjaminBossan added a commit that referenced this pull request Oct 31, 2021
We are happy to announce the new skorch 0.11 release:

Two basic but very useful features have been added to our collection of callbacks. First, by setting `load_best=True` on the  [`Checkpoint` callback](https://skorch.readthedocs.io/en/latest/callbacks.html#skorch.callbacks.Checkpoint), the snapshot of the network with the best score will be loaded automatically when training ends. Second, we added a callback [`InputShapeSetter`](https://skorch.readthedocs.io/en/latest/callbacks.html#skorch.callbacks.InputShapeSetter) that automatically adjusts your input layer to have the size of your input data (useful e.g. when that size is not known beforehand).

When it comes to integrations, the [`MlflowLogger`](https://skorch.readthedocs.io/en/latest/callbacks.html#skorch.callbacks.MlflowLogger) now allows to automatically log to [MLflow](https://mlflow.org/). Thanks to a contributor, some regressions in `net.history` have been fixed and it even runs faster now.

On top of that, skorch now offers a new module, `skorch.probabilistic`. It contains new classes to work with **Gaussian Processes** using the familiar skorch API. This is made possible by the fantastic [GPyTorch](https://github.com/cornellius-gp/gpytorch) library, which skorch uses for this. So if you want to get started with Gaussian Processes in skorch, check out the [documentation](https://skorch.readthedocs.io/en/latest/user/probabilistic.html) and this [notebook](https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Gaussian_Processes.ipynb). Since we're still learning, it's possible that we will change the API in the future, so please be aware of that.

Morever, we introduced some changes to make skorch more customizable. First of all, we changed the signature of some methods so that they no longer assume the dataset to always return exactly 2 values. This way, it's easier to work with custom datasets that return e.g. 3 values. Normal users should not notice any difference, but if you often create custom nets, take a look at the [migration guide](https://skorch.readthedocs.io/en/latest/user/FAQ.html#migration-from-0-10-to-0-11).

And finally, we made a change to how custom modules, criteria, and optimizers are handled. They are now "first class citizens" in skorch land, which means: If you add a second module to your custom net, it is treated exactly the same as the normal module. E.g., skorch takes care of moving it to CUDA if needed and of switching it to train or eval mode. This way, customizing your networks architectures with skorch is easier than ever. Check the [docs](https://skorch.readthedocs.io/en/latest/user/customization.html#initialization-and-custom-modules) for more details.

Since these are some big changes, it's possible that you encounter issues. If that's the case, please check our [issue](https://github.com/skorch-dev/skorch/issues) page or create a new one.

As always, this release was made possible by outside contributors. Many thanks to:

- Autumnii
- Cebtenzzre
- Charles Cabergs
- Immanuel Bayer
- Jake Gardner
- Matthias Pfenninger
- Prabhat Kumar Sahu 

Find below the list of all changes:

Added

- Added `load_best` attribute to `Checkpoint` callback to automatically load state of the best result at the end of training
- Added a `get_all_learnable_params` method to retrieve the named parameters of all PyTorch modules defined on the net, including of criteria if applicable
- Added `MlflowLogger` callback for logging to Mlflow (#769)
- Added `InputShapeSetter` callback for automatically setting the input dimension of the PyTorch module
- Added a new module to support Gaussian Processes through [GPyTorch](https://gpytorch.ai/). To learn more about it, read the [GP documentation](https://skorch.readthedocs.io/en/latest/user/probabilistic.html) or take a look at the [GP notebook](https://nbviewer.jupyter.org/github/skorch-dev/skorch/blob/master/notebooks/Gaussian_Processes.ipynb). This feature is experimental, i.e. the API could be changed in the future in a backwards incompatible way (#782)

Changed

- Changed the signature of `validation_step`, `train_step_single`, `train_step`, `evaluation_step`, `on_batch_begin`, and `on_batch_end` such that instead of receiving `X` and `y`, they receive the whole batch; this makes it easier to deal with datasets that don't strictly return an `(X, y)` tuple, which is true for quite a few PyTorch datasets; please refer to the [migration guide](https://skorch.readthedocs.io/en/latest/user/FAQ.html#migration-from-0-10-to-0-11) if you encounter problems (#699)
- Checking of arguments to `NeuralNet` is now during `.initialize()`, not during `__init__`, to avoid raising false positives for yet unknown module or optimizer attributes
- Modules, criteria, and optimizers that are added to a net by the user are now first class: skorch takes care of setting train/eval mode, moving to the indicated device, and updating all learnable parameters during training (check the [docs](https://skorch.readthedocs.io/en/latest/user/customization.html#initialization-and-custom-modules) for more details, #751)
- `CVSplit` is renamed to `ValidSplit` to avoid confusion (#752)

Fixed

- Fixed a few bugs in the `net.history` implementation (#776)
- Fixed a bug in `TrainEndCheckpoint` that prevented it from being unpickled (#773)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants