
Proposal of utility function + command to push models to HF Hub #5370

Merged
merged 22 commits into allenai:main from push_to_hf on Sep 30, 2021

Conversation

osanseviero
Contributor

This PR allows users to push their models to the Hugging Face Hub, as a continuation of #5052. This is done with a push_to_hf utility function, which is also exposed as a command.

Use case 1: Pushing from a serialization directory to the Hub

allennlp push_to_hf --archive_path model -n test_allennlp

Example output: https://huggingface.co/osanseviero/test_allennlp

Use case 2: Pushing from an archive (.tar.gz) file to the Hub

This is especially useful for the AI2 team for uploading their existing models

allennlp push_to_hf --archive_path bidaf-model-2020.03.19.tar.gz -n bidaf-model-2020.03.19

Example output: https://huggingface.co/osanseviero/bidaf-model-2020.03.19
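For reference, the same thing can be done from Python. A minimal sketch mirroring the CLI examples above, assuming a hypothetical import path (see the PR diff for the actual module location and signature):

```python
# Hypothetical import path, for illustration only.
from allennlp.common.push_to_hf import push_to_hf

# Use case 1: push a serialization directory.
push_to_hf(repo_name="test_allennlp", archive_path="model")

# Use case 2: push an existing .tar.gz archive.
push_to_hf(repo_name="bidaf-model-2020.03.19", archive_path="bidaf-model-2020.03.19.tar.gz")
```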

A few things to notice:

  • The repo launches a TensorBoard instance if traces are available

(Screenshot: TensorBoard running on the model repo page)

  • Users get a "Use in AllenNLP" snippet that shows how to use the model

(Screenshots: the "Use in AllenNLP" snippet on the model page)

If you are satisfied with the high-level approach + code location, I'll proceed to implement the test suite for this.

I wanted to keep this PR relatively small, but as a follow-up we can improve the automated model card so it contains more useful information. Additionally, we could add the metrics from metrics.json.

@epwalsh epwalsh requested a review from dirkgr August 20, 2021 00:55
@epwalsh
Member

epwalsh commented Aug 20, 2021

The test failure in "GPU Tests" is due to OOM, unrelated to your changes. #5371 is a temporary fix.

Member

@dirkgr dirkgr left a comment


This will do the right thing both when the repo already exists, and when it's a new repo, right?

@dirkgr dirkgr self-assigned this Aug 23, 2021
@osanseviero
Contributor Author

This will do the right thing both when the repo already exists, and when it's a new repo, right?

Sorry, forgot to answer this.

Yes, it will create a new repo if it does not exist. If it does exist, one drawback of running the script multiple times is that we don't overwrite the model card, so as not to delete existing work. We have an open issue that should make this easier in a future iteration.
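
For reference, a minimal sketch of that create-if-missing behavior, assuming huggingface_hub's create_repo (argument names follow the current API and have changed across versions; the PR code is authoritative):

```python
from huggingface_hub import HfApi

def _ensure_repo(repo_name: str, token: str) -> str:
    """Create the repo on the Hub if needed, reusing it if it already exists."""
    api = HfApi()
    # exist_ok=True makes the call a no-op for an existing repo, so running
    # the push script multiple times targets the same repo instead of failing.
    return api.create_repo(repo_name, token=token, exist_ok=True)
```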

@osanseviero
Contributor Author

Hi @dirkgr, do you have any suggestions on how to do the testing of pushing to the Hub? I have a few alternatives:

1. We have a staging backend, and we could provide an account to which the tests can push models (example)

Pros:

  • Tests end-to-end integration
  • More robust

Cons:

  • Introduces a dependency on an external server, which could cause flakiness
  • Relying on external services is not a great practice imo

2. Mock the responses from the server (a no-network test; see the sketch after this list)

Pros:

  • Does not have server dependencies
  • We can still test the local directory commits

Cons:

  • The test is less representative of what push_to_hf really does
  • More work to set up

3. Any other ideas that you might have :D
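
To make option 2 concrete, here is a rough sketch of a no-network test using unittest.mock (the patched import path and the push_to_hf signature are assumptions for illustration, not the actual module layout):

```python
from unittest import mock

from allennlp.common.push_to_hf import push_to_hf  # hypothetical import path

def test_push_to_hf_no_network(tmp_path):
    # Patch the Hub client so the test makes no real HTTP requests.
    with mock.patch("allennlp.common.push_to_hf.HfApi") as mock_api:
        mock_api.return_value.create_repo.return_value = "https://huggingface.co/user/repo"
        push_to_hf(repo_name="test_allennlp", archive_path=str(tmp_path))
        # Local interactions can still be asserted on.
        mock_api.return_value.create_repo.assert_called_once()
```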

@dirkgr
Member

dirkgr commented Aug 25, 2021

We already rely on HF servers (and others) in the tests. I don't know if this is an exhaustive list, but I'm pretty sure we need at least S3, GCS, Hugging Face, PyPI, and GitHub to be up and running. There may be others. Wandb, maybe? I think testing against an HF staging server is acceptable, unless we find that the staging server is down all the time while you're testing other things on it.

@osanseviero
Contributor Author

Thanks a lot for your input! I've added a test suite for the push_to_hf functionality.

@dirkgr
Member

dirkgr commented Sep 8, 2021

Since I am on 👶🏻 leave, I am leaving this in @epwalsh's capable hands.

@julien-c
Contributor

julien-c commented Sep 8, 2021

Since I am on 👶🏻 leave, I am leaving this in @epwalsh's capable hands.

This is awesome news, enjoy the 👶🏻 leave @dirkgr!! Congrats 😊

Member

@epwalsh epwalsh left a comment


This looks good! Thanks for your work @osanseviero!

I noticed the new tests are failing due to an authentication error. I'm guessing we need to add a token as a secret to GitHub Actions?

@osanseviero
Contributor Author

Congrats @dirkgr!!! That's exciting news!

@epwalsh, when the environment variable HUGGINGFACE_CO_STAGING is set, the test runs in a staging environment in which the authorization error should not happen. From the error logs, it seems like the test is not using staging. I likely did not set the environment variable in the right place in the GA workflow.

Given that I cannot run the workflow manually, I would appreciate it if you could give me a hand figuring out where the right place to set the environment variable for the test is.

@osanseviero osanseviero requested a review from epwalsh September 8, 2021 19:49
@epwalsh
Member

epwalsh commented Sep 10, 2021

@osanseviero ah I see. It does look like it was put in the wrong place. If you put that environment variable up here it should work.

@osanseviero
Contributor Author

Thanks for the pointer!

@osanseviero
Contributor Author

I missed that there were other tests using HF models in file_utils.py, which now fail due to the environment variable.

@LysandreJik do you think we could create a similar repo in staging so we can entirely rely on that environment?
https://github.com/allenai/allennlp/blob/main/tests/common/file_utils_test.py#L603

@epwalsh
Member

epwalsh commented Sep 13, 2021

@osanseviero, what if we just set that environment variable from within the Python tests? That way we only have to set it for tests that require it.

@osanseviero
Contributor Author

@LysandreJik please correct me if I'm wrong, but the reason for using an environment variable rather than setting the variable in the Python test is that this happens at import time.

I think one solution would be to move the from huggingface_hub import HfApi, HfFolder, Repository line to within the push_to_hf() function. Then we could set the environment variable in the test, as sketched below.
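
A minimal sketch of that idea (the function signature is illustrative; HUGGINGFACE_CO_STAGING is the variable named above):

```python
from typing import Optional

def push_to_hf(repo_name: str, archive_path: str, token: Optional[str] = None):
    # Deferred import: huggingface_hub chooses its endpoint at import time,
    # so importing here lets a test set HUGGINGFACE_CO_STAGING first.
    from huggingface_hub import HfApi, HfFolder, Repository  # noqa: F401
    ...

# A test can then set the variable before the first call:
def test_push_to_hf_staging(monkeypatch):
    # Exact accepted values depend on the huggingface_hub version.
    monkeypatch.setenv("HUGGINGFACE_CO_STAGING", "yes")
    ...
```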

@LysandreJik
Contributor

LysandreJik commented Sep 13, 2021

In the huggingface_hub tests we also have a with_production_testing context manager which is made exactly for that purpose!

@osanseviero we can definitely create a repo on staging which can be used instead. The staging environment isn't super stable, so it is bound to change; either we can upload files before running the test suite, or we can use a method similar to the with_production_testing one used in hfh (we could also make it importable here!)

Indeed, we define the URL to the endpoint as a global variable; with an environment variable we can control that variable without any issues.
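
For reference, a with_production_testing-style helper could look roughly like this (the patched constant name is an assumption about huggingface_hub's internals, not the exact implementation):

```python
from unittest import mock

ENDPOINT_PRODUCTION = "https://huggingface.co"

def with_production_testing(cls):
    """Test-class decorator that points huggingface_hub back at production."""
    # Re-patch the endpoint global that was fixed at import time.
    return mock.patch("huggingface_hub.constants.ENDPOINT", ENDPOINT_PRODUCTION)(cls)
```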

@osanseviero
Contributor Author

Sorry for the delay. I think tests should pass now in GA. Since only the push_to_hf tests require staging, I just kept the decorator there.

Let me know if other changes are required.

Member

@epwalsh epwalsh left a comment


Thank you so much @osanseviero! Looks great!

@epwalsh epwalsh merged commit 603552f into allenai:main Sep 30, 2021
@osanseviero osanseviero deleted the push_to_hf branch September 30, 2021 07:17