Added Neptune logging #586
Conversation
@jakubczakon Thanks for the addition, I will have a thorough look at it soon. What would be the easiest way for me to test this? Is the example in the docstring working, or do I need to create a neptune account?
That's a fair point @BenjaminBossan. The example in the docstrings (link to experiment) works. I think a decent solution would be to use the public token (anonymous mode) so as not to force anyone to register.
Yes, I think everything that reduces the friction to try it out would be helpful.
Done, now anyone can run the example without registering and see their experiments in neptune.
Nice, I will give this a spin soon and come back with detailed feedback. From a first glance, it looks very good though.
Excellent addition, thank you very much Jakub. Nice to see how little is actually needed to make the integration work.
There are a few minor comments, please have a look at them. I think the biggest question that came up for me was whether we really need batch-level logging. At least for the given example, it's very noisy; maybe epoch level is enough?
Also, please add a sentence to the CHANGES.md.
skorch/callbacks/logging.py
Outdated
your net's history to Neptune.

The best way to log additional information is to log directly to the
experiment object or subclass the `on_*`` methods.
- experiment object or subclass the `on_*`` methods.
+ experiment object or subclass the ``on_*`` methods.
skorch/callbacks/logging.py
Outdated
keys_ignored : str or list of str (default=None)
  Key or list of keys that should not be logged to
  Neptune. Note that in addition to the keys provided by the
  user.
- user.
+ user, keys such as those starting with 'event_' or ending on
+ '_best' are ignored by default.
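The default filtering described in the suggestion above can be sketched as follows. This is an illustrative stand-in, not the actual skorch implementation; the helper name `filter_keys` is made up for this example.

```python
# Sketch of the default key filtering: user-supplied keys are ignored
# in addition to keys starting with 'event_' or ending in '_best'.
def filter_keys(history_keys, keys_ignored=()):
    keys_ignored = set(keys_ignored)
    return [
        k for k in history_keys
        if k not in keys_ignored
        and not k.startswith('event_')
        and not k.endswith('_best')
    ]

keys = ['train_loss', 'valid_loss', 'valid_loss_best', 'event_lr', 'dur']
# 'valid_loss_best' and 'event_lr' are dropped by default,
# 'dur' because the user asked for it
assert filter_keys(keys, keys_ignored=['dur']) == ['train_loss', 'valid_loss']
```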
skorch/callbacks/logging.py
Outdated
Note
----
Install psutil to monitor resource consumption
Kinda redundant with the comment on line 76.
>>> # Create a neptune experiment object
... # We are using api token for an anonymous user.
... # For your projects use the token associated with your neptune.ai account
>>> neptune.init(api_token='eyJhcGlfYWRkcmVzcyI6Imh0dHBzOi8vdWkubmVwdHVuZS5tbCIsImFwaV9rZXkiOiJiNzA2YmM4Zi03NmY5LTRjMmUtOTM5ZC00YmEwMzZmOTMyZTQifQ==',
Maybe you could add the install instructions for neptune, as well as `import neptune`, to the code example.
self.keys_ignored_.add('batches')
return self

def on_batch_end(self, net, **kwargs):
I wonder if we really need batch-level logging. Maybe logging at epoch level is sufficient? At least, I think it would make sense to allow turning off batch-level logging through a parameter.
I often find having batch-level logging valuable but I agree there should be an option to turn it off.
Added it.
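The opt-out discussed here could look roughly like the following. This is a toy sketch: the class and the parameter name `log_on_batch_end` are illustrative stand-ins, not necessarily the actual skorch API.

```python
from unittest.mock import Mock

class MiniNeptuneLogger:
    """Toy callback sketch with an opt-out for batch-level logging.

    ``log_on_batch_end`` is an assumed parameter name for illustration.
    """
    def __init__(self, experiment, log_on_batch_end=True):
        self.experiment = experiment
        self.log_on_batch_end = log_on_batch_end

    def on_batch_end(self, loss):
        # Only log per-batch metrics when the user opted in
        if self.log_on_batch_end:
            self.experiment.log_metric('train_batch_loss', loss)

    def on_epoch_end(self, loss):
        # Epoch-level metrics are always logged
        self.experiment.log_metric('train_loss', loss)

quiet = MiniNeptuneLogger(Mock(), log_on_batch_end=False)
quiet.on_batch_end(0.5)
quiet.on_epoch_end(0.4)
# Only the epoch-level call went through
assert quiet.experiment.log_metric.call_count == 1
```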
def test_fit_with_dict_input(
    self,
    net_cls,
    classifier_module,
argument not used (probably copied from tensorboard test that also doesn't need it). Tbh, I think this whole test can be removed here.
dropped it
I see that I messed up the formatting in some places with my local settings.
I think the optional batch-level logging is a good compromise. Would you be so kind as to add a test for that? At the moment, this functionality is uncovered. You could, e.g., pass a mock object as the Experiment and count how often it was called or what it was called with (maybe reduce the batch size to trigger multiple batch-level calls).
Regarding the formatting changes: Some of them are good, some unnecessary but not harmful. I don't care either way.
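The mock-counting approach suggested above can be sketched with a stand-in training loop. The epoch/batch counts and metric names here are hypothetical; in the real test the mock would be passed to the callback and driven by `net.fit`.

```python
from unittest.mock import Mock

# Hypothetical stand-in for the Neptune Experiment object; in the real
# test one would pass such a mock to the logger instead of a live
# experiment and let skorch's fit loop drive it.
experiment = Mock()

n_epochs, n_batches = 3, 5
for epoch in range(n_epochs):
    for batch in range(n_batches):
        # batch-level logging: one metric per batch
        experiment.log_metric('train_batch_loss', 0.1)
    # epoch-level logging: two metrics per epoch
    experiment.log_metric('train_loss', 0.1)
    experiment.log_metric('valid_loss', 0.1)

# 3 * 5 batch calls + 3 * 2 epoch calls = 21
assert experiment.log_metric.call_count == 21
```

Asserting on `call_count` (or on `call_args_list`) is exactly the kind of check that pins down how many log calls a given configuration produces.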
I've added this test @BenjaminBossan.
@jakubczakon I tried to understand the numbers. For this, I turned off the internal validation, as it makes the whole calculation more difficult (by adding …). Now we have the following situation: …
That is 110 in sum, which is exactly what I find (please add this explanation to the comment for future reference). I think it would also make sense to copy the same test, but this time with … Why you get different results locally, I don't know (for me, it's the same as in the CI). My first guess would be some kind of version mismatch.
…nts, added test for when batch log is off
Thank you @BenjaminBossan! I've added the suggestions and it fixed the issue.
Thanks @jakubczakon, this looks very good to me now. I will wait a few days to see if @ottonemo or @thomasjpfan have any comments on this, otherwise I will merge it.
Awesome, thanks!
self.keys_ignored = keys_ignored

def initialize(self):
    self.first_batch_ = True
Is `first_batch_` used?
I think this is for consistency with the TensorBoard callback. It is convenient to have so that you can, e.g., log an image of the network graph exactly once. You may not be able to use `on_train_begin` for this because that one gets the input `X`, not the one that is returned by the data loader.
Yeah, I simply copied it from TensorBoard (to be honest, I haven't thought about it much). Also, if I were to use it properly I should set `self.first_batch_ = False` in `on_batch_end`, which is missing.
I've added `self.first_batch_ = False` to `on_batch_end`, but I can easily drop it from both as they are not used (to my understanding). What do you think?
The main reason I wanted to have it for TensorBoard was to be able to trace and add a graph of the network to TensorBoard. I think that option doesn't exist for neptune, does it? However, I think consistency is also nice, so I would leave it there.
Let's document the attribute in the docstring and add a quick test for `first_batch_`?
Yes, good idea.
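A quick test for the `first_batch_` attribute could look like the sketch below, using a minimal stand-in class rather than the real callback (the class name `MiniLogger` is made up for illustration).

```python
class MiniLogger:
    """Stand-in for the callback's first_batch_ bookkeeping.

    first_batch_ mirrors the TensorBoard callback: True until the first
    batch has been processed, letting on_batch_end run one-time setup
    (e.g. logging a network graph) exactly once.
    """
    def initialize(self):
        self.first_batch_ = True
        return self

    def on_batch_end(self, **kwargs):
        # Flip the flag after the first batch has been seen
        self.first_batch_ = False

logger = MiniLogger().initialize()
assert logger.first_batch_ is True
logger.on_batch_end()
assert logger.first_batch_ is False
```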
classifier_module,
callbacks=[npt],
max_epochs=3,
).fit(*data)
Can we assert how many times `log_metric` should be called in this case?
Sure can do.
self.keys_ignored = keys_ignored

def initialize(self):
    self.first_batch_ = True
Let's document the attribute in the docstring and add a quick test for `first_batch_`?
Done @thomasjpfan
Awesome!
Thanks for the great work and for taking the time to address the comments.
Added Neptune logger that:
- logs on `on_batch_end`
- logs on `on_epoch_end`
- lets you log anything Neptune allows via `neptune_logger.experiment.log_whatever_neptune_allows` if used with `close_after_train=False`
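The `close_after_train=False` behavior summarized above can be sketched with a small stand-in (the class name `SketchLogger` and the mocked Experiment methods are illustrative, not the actual skorch or neptune API):

```python
from unittest.mock import Mock

class SketchLogger:
    """Sketch of close_after_train semantics (names are illustrative)."""
    def __init__(self, experiment, close_after_train=True):
        self.experiment = experiment
        self.close_after_train = close_after_train

    def on_train_end(self, **kwargs):
        # By default the experiment is stopped when training ends;
        # with close_after_train=False it stays open for further logging.
        if self.close_after_train:
            self.experiment.stop()

logger = SketchLogger(Mock(), close_after_train=False)
logger.on_train_end()
# The experiment stays open, so extra information can still be logged
logger.experiment.log_text('summary', 'best valid acc: 0.91')
assert not logger.experiment.stop.called
```

Keeping the experiment open is what enables post-fit calls on `neptune_logger.experiment`; the user is then responsible for stopping the experiment themselves.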