-
Notifications
You must be signed in to change notification settings - Fork 398
ENH: Adds TeeGenerator for named_parameters #379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Adds TeeGenerator for named_parameters #379
Conversation
I have some comments on this:
|
skorch/utils.py
Outdated
|
||
|
||
class LazyGenerator: | ||
"""Stores a generator and calls list on the generator on the first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Stores a generator and calls list on the generator on the first | |
"""Stores a function that produces a generator and calls list on the generator on the first |
Well, both actually. The function is evaluated lazily and the generator is cached (and/or multiplexed).
You need to always copy the generator: it, *_ = tee(self.gen_, 1) will fail if the generator is evaluated more than twice. Better: from itertools import tee
class LazyGenerator_:
def __init__(self, gen):
self.gen = gen()
def __iter__(self):
self.gen, it = tee(self.gen)
yield from it (this way you don't need the callback at all) |
What do you mean by that?
Yes, this also works. I wanted to avoid calling the generator during |
Sorry, should have written an example: g = BetterName(lambda: [1,2,3])
g1 = list(g) # == [1,2,3]
g2 = list(g) # == [1,2,3]
g3 = list(g) # == [] <- bad
I think it is no requirement. While there might be an overhead by simply creating the generator yielded by |
Interesting, I didn't see this. If initialization of the generator during
|
Here are the results of the microbenchmark comparing the from itertools import tee
class ListLazyGenerator:
def __init__(self, gen):
self.gen = gen
self.gen_ = None
def __iter__(self):
if self.gen_ is None:
self.gen_ = list(self.gen())
yield from self.gen_
class TeeLazyGenerator:
def __init__(self, gen):
self.gen = gen
self.gen_ = None
def __iter__(self):
if self.gen_ is None:
self.gen_ = self.gen()
self.gen_, it = tee(self.gen_)
yield from it
class TeeCallInInitLazyGenerator:
def __init__(self, gen):
self.gen = gen()
def __iter__(self):
self.gen, it = tee(self.gen)
yield from it Small Moduleimport torchvision
m = torchvision.models.squeezenet1_0()
# No generator
%timeit list(m.named_parameters())
# 214 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
list_lazy = ListLazyGenerator(m.named_parameters)
%timeit list(list_lazy)
# 3.04 µs ± 61.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
tee_lazy = TeeLazyGenerator(m.named_parameters)
%timeit list(tee_lazy)
# 3.51 µs ± 87.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
tee_call_in_init = TeeCallInInitLazyGenerator(m.named_parameters)
%timeit list(tee_call_in_init)
# 3.29 µs ± 47 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) Big Moduleimport torchvision
m = torchvision.models.resnet152()
# No generator
%timeit list(m.named_parameters())
# 1.67 ms ± 75.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
list_lazy = ListLazyGenerator(m.named_parameters)
%timeit list(list_lazy)
# 22.5 µs ± 405 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
tee_lazy = TeeLazyGenerator(m.named_parameters)
%timeit list(tee_lazy)
# 24.8 µs ± 1.36 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
tee_call_in_init = TeeCallInInitLazyGenerator(m.named_parameters)
%timeit list(tee_call_in_init)
# 23.8 µs ± 253 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) I think |
Thanks for the measurements. I believe the time differences between the 3 implementations is negligible. The only downside of
This overhead is relevant when |
This PR was updated to use the tee version of the generator, and renames the class to |
|
||
assert mock.call_count == 0 | ||
|
||
first_return = list(lazy_gen) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test would be a little clearer if the call count would also be tested immediately after the first call (i.e. here).
skorch/utils.py
Outdated
|
||
|
||
class TeeGenerator: | ||
"""Stores a function that produces a generator and calls list on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring is no longer up-to-date. Also, could you add 2 newlines after the first sentence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of small comments.
CHANGES.md
Outdated
@@ -12,6 +12,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 | |||
- [Basic usage notebook][1810251445] now runs on Google Colab | |||
- [Advanced usage notebook][1810261633] now runs on Google Colab | |||
- Better user-facing messages when module or optimizer are re-initialized | |||
- The `on_grad_computed` callback function will yield an iterable for | |||
`named_parameters` only when it is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`named_parameters` only when it is used. | |
`named_parameters` only when it is used to reduce the run-time overhead of the call (#379) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am starting to see how you like the CHANGELOG to be worded. 😅
skorch/utils.py
Outdated
|
||
def __iter__(self): | ||
if self.gen_ is None: | ||
self.gen_ = self.gen() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the TeeGenerator should use a generator, not a callable. The overhead is negligible but it allows for easy passing of partially consumed generators (e.g. when using models with two parameter sets) and is more intuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
skorch/utils.py
Outdated
|
||
class TeeGenerator: | ||
"""Stores a generator and calls ``tee`` on it to create new generators | ||
when ``TeeGenerator`` is iterated over. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when ``TeeGenerator`` is iterated over. | |
when ``TeeGenerator`` is iterated over to let you iterate over the given generator more than once. |
@thomasjpfan Would you mind looking at the open comments? |
@thomasjpfan Some small TODOs left:
|
|
Ah okay, I was looking at the outdated state shown by github. With the given change in the test, does the function name still make sense, though? |
The test was renamed to |
Thanks. |
Fixes #378