-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Add synchronous parameter to MLflowLogger #19639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add synchronous parameter to MLflowLogger #19639
Conversation
@clumsy Thanks for the PR. I'm not sure we should do this. I couldn't believe that MLFlowLogger is blocking, so I did a quick search and found this: mlflow/mlflow#1550 (comment) |
@awaelchli The issue you linked suggest using the workaround provided in this PR - i.e. don't make asynchronous calls using fluent API like The call to Bumping the bounds of
The difference this tweak brings is very significant in our experiments, basically making the calls to |
@clumsy I just vote against exposing a feature flag as an entire subclass of the that logger. That's just not the Lightning way. Let's investigate adding a flag to the existing logger and passing down the synchronous flag to the |
A small correction, @awaelchli - this PR is not using the experimental feature flag from the recent MLflow versions. |
Sure, I'm fine with this being an implementation detail. I still don't see enough justification to make this an entirely new class. I stay with my recommendation: |
How about this then, @awaelchli:
|
Personally I think that adds a lot of code complexity to do in two different ways. I'd vote for expose a flag and then just log a warning in the case that the mlflow version is below 2.8.0. I'd also vote for calling the argument |
As for code complexity - isn't it an implementation detail? The user never sees this. Logging, or silently continuing with the synchronous workflow when indicated otherwise by the user violates the principle of least astonishment I'm afraid. I'd rather we assert the flag (or its value I like the naming suggestion to keep it consistent with mlflow. |
I'm fine with either approach. I would only add the manual implementation if upgrading mlflow adds barriers for users. But no strong opinion on this. |
It is implementation detail but still adds more code to the project to maintain for relatively little benefit IMO (supporting async in old mlflow versions) . The addition of the I think raising an exception in that case rather than logging a warning makes sense. |
a9fea7e
to
4232c83
Compare
Please check the new version of this PR, @dannyfriar, @awaelchli |
Nice - that LGTM! |
f57862f
to
5e5ccd1
Compare
5e5ccd1
to
17a1f7e
Compare
Thank you for the feature @clumsy and @dannyfriar for discussion! |
What does this PR do?
Adding an asynchronous version of
MLflowLogger
. It will do the I/O in a separate thread, thus not blocking the caller (e.g. main training thread).Note that the values and timestamps of the metrics, params and tags are created synchronously prior the asynchronous I/O thus preserving the original content.
Fixes #19710
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Reviewer checklist
📚 Documentation preview 📚: https://pytorch-lightning--19639.org.readthedocs.build/en/19639/