Skip to content

Pickable storage driver clients #173

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

antoinejeannot
Copy link
Contributor

Hi !

This PR aims at adding a way to pickle the ModelLibrary which, for the moment, is impossible due to storage drivers (especially boto3 and gcs).

It introduces a MODELKIT_LAZY_DRIVER environment variable which, if set, will prevent the Storage Providers from storing the drivers (boto3, gcs, azure), instead, the configuration settings will be stored allowing the corresponding Storage Provider to build it on the fly.

This will make way easier the use of python libraries which use pickle such as Apache Spark and multi-processing, leveraging modelkit.

Thanks for reviewing, as always!

ldeflandre
ldeflandre previously approved these changes Dec 19, 2022
Copy link
Contributor

@ldeflandre ldeflandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done.

It would be nice to document the new usage it enables !

@antoinejeannot
Copy link
Contributor Author

@ldeflandre sorry for the missing documentation 🤦
Please take a look at the latest commit!

@llin17
Copy link
Contributor

llin17 commented Dec 19, 2022

Good job ! I'm wondering the use-case differences between MODELKIT_LAZT_LOADING and MODELKIT_LAZY_DRIVER. Dose setting MODELKIT_LAZY_DRIVER=True cover all the use-case of MODELKIT_LAZT_LOADING=True ? If it's the case, should we remove MODELKIT_LAZT_LOADING to avoid confusion ?

@antoinejeannot
Copy link
Contributor Author

I would like to wait for @tgenin's approval before merging

@antoinejeannot
Copy link
Contributor Author

antoinejeannot commented Dec 19, 2022

@llin17 that's a good remark thanks! Actually the two mechanisms have different goals, and you might want to allow the MODELKIT_LAZY_DRIVER to access your cloud provider at any time and disable MODELKIT_LAZY_LOADING to build all your models before forking / sending models to workers etc. (with the assumption that your models are pickable of course :))

@llin17
Copy link
Contributor

llin17 commented Dec 19, 2022

Thanks ! What do you think if we set the default value of MODELKIT_LAZY_DRIVER to True ?

ldeflandre
ldeflandre previously approved these changes Dec 19, 2022
tgenin
tgenin previously approved these changes Dec 20, 2022
@tgenin
Copy link
Contributor

tgenin commented Dec 20, 2022

Very nice feature, thank you @antoinejeannot

@antoinejeannot antoinejeannot dismissed stale reviews from tgenin and ldeflandre via b67c989 December 20, 2022 09:27
ldeflandre
ldeflandre previously approved these changes Dec 20, 2022
@antoinejeannot
Copy link
Contributor Author

Thanks for your pretty relevant remarks as always @tgenin, please take a look at the latest commit 🙏

tgenin
tgenin previously approved these changes Dec 20, 2022
@antoinejeannot antoinejeannot dismissed stale reviews from tgenin and ldeflandre via b65fc4b December 20, 2022 15:26
@antoinejeannot antoinejeannot force-pushed the pickable_storage_driver_clients branch from b65fc4b to 1b8e5a4 Compare December 20, 2022 15:30
@antoinejeannot antoinejeannot merged commit 533f1a3 into Cornerstone-OnDemand:main Dec 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants