Added Neptune logging #586

Merged: 15 commits merged into skorch-dev:master on Feb 16, 2020

Conversation

@jakubczakon (Contributor)

Added Neptune logger that:

  • logs metrics on_batch_end
  • logs metrics on_epoch_end
  • logs any additional information directly via neptune_logger.experiment (whatever Neptune allows) if used with close_after_train=False; see the sketch below
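
A minimal usage sketch (the anonymous api_token is the public one from the docstring example; the project name, module, and data are made-up placeholders):

import numpy as np
import torch.nn as nn
import neptune
from skorch import NeuralNetClassifier
from skorch.callbacks import NeptuneLogger

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(20, 2)

    def forward(self, X):
        return nn.functional.softmax(self.dense(X), dim=-1)

X = np.random.randn(100, 20).astype('float32')
y = (X[:, 0] > 0).astype('int64')

neptune.init(
    api_token='eyJhcGlfYWRkcmVzcyI6Imh0dHBzOi8vdWkubmVwdHVuZS5tbCIsImFwaV9rZXkiOiJiNzA2YmM4Zi03NmY5LTRjMmUtOTM5ZC00YmEwMzZmOTMyZTQifQ==',
    project_qualified_name='shared/skorch-example',  # hypothetical project
)
experiment = neptune.create_experiment(name='skorch-basic-example')

neptune_logger = NeptuneLogger(experiment, close_after_train=False)
net = NeuralNetClassifier(MyModule, max_epochs=5, callbacks=[neptune_logger])
net.fit(X, y)

# close_after_train=False keeps the experiment open, so anything Neptune
# allows can still be logged afterwards:
neptune_logger.experiment.log_metric('test_accuracy', 0.97)
neptune_logger.experiment.stop()  # close the experiment manually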

@jakubczakon jakubczakon reopened this Feb 3, 2020
@BenjaminBossan (Collaborator)

@jakubczakon Thanks for the addition, I will have a thorough look at it soon.

What would be the easiest way for me to test this? Is the example in the docstring working or do I need to create a neptune account?

@jakubczakon (Contributor Author)

That's a fair point, @BenjaminBossan.

The example in the docstrings (link to experiment) works.
You can copy and run the code, but you would need to create a (free) account.

I think a decent solution to that would be to use the public token (anonymous mode), so as not to force anyone to register.
What do you think about that?

@BenjaminBossan (Collaborator)

> I think a decent solution to that would be to use the public token (anonymous mode), so as not to force anyone to register.
> What do you think about that?

Yes, I think everything that reduces the friction to try it out would be helpful.

@jakubczakon (Contributor Author)

Done, now anyone can run the example without registering and see their experiments in Neptune.

@BenjaminBossan (Collaborator)

Nice, I will give this a spin soon and come back with detailed feedback. At first glance, though, it looks very good.

@BenjaminBossan (Collaborator) left a comment

Excellent addition, thank you very much Jakub. Nice to see how little is actually needed to make the integration work.

There are a few minor comments, please have a look at them. I think the biggest question that came up for me was whether we really need batch-level logging. At least for the given example, it's very noisy; maybe epoch level is enough?

Also, please add a sentence to the CHANGES.md.

your net's history to Neptune.

The best way to log additional information is to log directly to the
experiment object or subclass the `on_*`` methods.
Collaborator:

Suggested change
experiment object or subclass the `on_*`` methods.
experiment object or subclass the ``on_*`` methods.
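
For illustration, a sketch of the subclassing route the docstring mentions; it assumes the callback stores the experiment as self.experiment (matching neptune_logger.experiment in the PR description), and the logged text is made up:

from skorch.callbacks import NeptuneLogger

class MyNeptuneLogger(NeptuneLogger):
    """Hypothetical subclass logging one extra value per epoch."""

    def on_epoch_end(self, net, **kwargs):
        super().on_epoch_end(net, **kwargs)
        # any method the neptune Experiment object offers can be used here
        self.experiment.log_text(
            'epoch-note', 'finished epoch {}'.format(len(net.history)))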

keys_ignored : str or list of str (default=None)
Key or list of keys that should not be logged to
Neptune. Note that in addition to the keys provided by the
user.
Collaborator:

Suggested change
user.
user, keys such as those starting with 'event_' or ending on
'_best' are ignored by default.

Note
----

Install psutil to monitor resource consumption
Collaborator:

Kinda redundant with the comment on line 76.

>>> # Create a neptune experiment object
... # We are using api token for an anonymous user.
... # For your projects use the token associated with your neptune.ai account
>>> neptune.init(api_token='eyJhcGlfYWRkcmVzcyI6Imh0dHBzOi8vdWkubmVwdHVuZS5tbCIsImFwaV9rZXkiOiJiNzA2YmM4Zi03NmY5LTRjMmUtOTM5ZC00YmEwMzZmOTMyZTQifQ==',
Collaborator:

Maybe you could add the install instruction of neptune, as well as import neptune to the code example.
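
Something along these lines, in the example's own doctest style (neptune-client being the PyPI package name for the client at the time):

>>> # Install the client first: pip install neptune-client
>>> import neptune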

self.keys_ignored_.add('batches')
return self

def on_batch_end(self, net, **kwargs):
Collaborator:

I wonder if we really need batch-level logging. Maybe logging at epoch level is sufficient? At least, I think it would make sense to allow turning batch-level logging off through a parameter.

@jakubczakon (Contributor Author), Feb 7, 2020:

I often find batch-level logging valuable, but I agree there should be an option to turn it off.
Added it.
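
A sketch of the two modes with the new flag (the name log_on_batch_end appears later in this thread; default values are not asserted here):

from skorch.callbacks import NeptuneLogger

# `experiment` as created via neptune.create_experiment(...)

# batch-level and epoch-level logging
logger = NeptuneLogger(experiment, log_on_batch_end=True)

# epoch-level logging only
logger = NeptuneLogger(experiment, log_on_batch_end=False)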

def test_fit_with_dict_input(
self,
net_cls,
classifier_module,
Collaborator:

argument not used (probably copied from the tensorboard test, which also doesn't need it). Tbh, I think this whole test can be removed here.

Contributor Author:

Dropped it.

@jakubczakon (Contributor Author)

I see that I messed up the formatting in some places with my local settings.
That said, it passes pylint (with warnings).
Should I fix those, or are they not important to you?

@BenjaminBossan (Collaborator) left a comment

I think the optional batch-level logging is a good compromise. Would you be so kind as to add a test for that? At the moment, this functionality is uncovered. You could, e.g., pass a mock object as Experiment and count how often it was called or what it was called with (maybe reduce the batch size to trigger multiple batch-level calls).
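
A sketch of such a mock-based test (fixture names follow the test snippets quoted above; the exact parameters are illustrative):

from unittest.mock import Mock

def test_log_metric_call_count(self, net_cls, classifier_module, data):
    # a Mock stands in for the neptune Experiment and records every call
    mock_experiment = Mock()
    npt = NeptuneLogger(mock_experiment, log_on_batch_end=True)
    net = net_cls(
        classifier_module,
        callbacks=[npt],
        max_epochs=5,
        batch_size=4,  # small batches trigger multiple batch-level calls
    )
    net.fit(*data)
    assert mock_experiment.log_metric.call_count > 0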

Regarding the formatting changes: Some of them are good, some unnecessary but not harmful. I don't care either way.

@jakubczakon (Contributor Author)

I've added this test, @BenjaminBossan.
Interestingly, the test passes locally, calling `.log_metric` 130 times, yet on GitHub it is called only 120 times.
I will investigate, but this seems unexpected.

@BenjaminBossan (Collaborator)

@jakubczakon I tried to understand the numbers. For this, I turned off the internal validation, as it makes the whole calculation more difficult (by adding train_split=False as an argument to the net).

Now we have the following situation:

  • on batch level, we have 2 keys x (40 / 4) batches x 5 epochs = 100 calls
  • on epoch level, we have 2 keys x 5 epochs = 10 calls

That is 110 in sum, which is exactly what I find (please add this explanation to the comment for future reference).

I think it would also make sense to copy the same test, but this time with log_on_batch_end=False. Then we should expect 10 calls.

Why you get different results locally, I don't know (for me, it's the same as in the CI). My first guess would be some kind of version mismatch.
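
Plugged into a mock-based test like the sketch earlier, that arithmetic becomes:

# train_split=False, 40 training examples, batch_size=4, max_epochs=5,
# 2 logged keys:
#   batch level: 2 keys * (40 / 4) batches * 5 epochs = 100 calls
#   epoch level: 2 keys * 5 epochs                    =  10 calls
assert mock_experiment.log_metric.call_count == 110

# in the sibling test with log_on_batch_end=False, only epoch-level
# logging remains, so the expected count is 10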

@jakubczakon (Contributor Author)

Thank you @BenjaminBossan!

I've added the suggested changes, and that fixed the issue.

@BenjaminBossan (Collaborator)

Thanks @jakubczakon, this looks very good to me now.

I will wait a few days to see if @ottonemo or @thomasjpfan have any comments on this, otherwise I will merge it.

@jakubczakon (Contributor Author)

Awesome, thanks!

self.keys_ignored = keys_ignored

def initialize(self):
self.first_batch_ = True
Member:

Is first_batch_ used?

Collaborator:

I think this is for consistency with the TensorBoard callback. It is convenient to have so that you can, e.g., log an image of the network graph exactly once. You may not be able to use on_train_begin for this because that one gets the input X, not the one that is returned by the data loader.

@jakubczakon (Contributor Author), Feb 14, 2020:

Yeah, I simply copied it from TensorBoard (to be honest, I haven't thought about it much).

Also, to use it properly I should set self.first_batch_ = False in on_batch_end, which is missing.

Contributor Author:

I've added self.first_batch_ = False to on_batch_end, but I can easily drop it from both places, as it is not used (to my understanding).

What do you think?
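
For reference, a minimal sketch of the pattern under discussion (a hypothetical callback mirroring the TensorBoard logic, not the merged code):

from skorch.callbacks import Callback

class FirstBatchAware(Callback):
    def initialize(self):
        self.first_batch_ = True
        return self

    def on_batch_end(self, net, **kwargs):
        if self.first_batch_:
            pass  # one-time work, e.g. tracing the network graph for TensorBoard
        self.first_batch_ = False  # the line added in this round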

Collaborator:

The main reason I wanted to have it for TensorBoard was to be able to trace and add a graph of the network to TensorBoard. I think that option doesn't exist for neptune, does it? However, I think consistency is also nice, so I would leave it there.

Member:

Let's document the attribute in the docstring and add a quick test for first_batch_?

Collaborator:

Yes, good idea.
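
A quick test along those lines (a sketch; fixtures as in the other tests):

from unittest.mock import Mock

def test_first_batch_flag(self, net_cls, classifier_module, data):
    npt = NeptuneLogger(Mock())
    npt.initialize()
    assert npt.first_batch_ is True   # set by initialize()
    net_cls(classifier_module, callbacks=[npt], max_epochs=1).fit(*data)
    assert npt.first_batch_ is False  # cleared after the first batch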

classifier_module,
callbacks=[npt],
max_epochs=3,
).fit(*data)
Member:

Can we assert how many times log_metric should be called in this case?

Contributor Author:

Sure can do.

@jakubczakon (Contributor Author)

Done @thomasjpfan

@jakubczakon (Contributor Author)

Awesome!
Thank you @BenjaminBossan @thomasjpfan !

@BenjaminBossan (Collaborator) left a comment

Thanks for the great work and for taking the time to address the comments.

@BenjaminBossan BenjaminBossan merged commit dc70fb4 into skorch-dev:master Feb 16, 2020