
Proposal of utility function + command to push models to HF Hub #5370

Merged
merged 22 commits into allenai:main from push_to_hf on Sep 30, 2021

Conversation

osanseviero
Contributor

This PR allows users to push their models to the Hugging Face Hub, as a continuation of #5052. This is done with a push_to_hf utility function, which is also exposed as a command.

Use case 1: Pushing from a serialization directory to the Hub

allennlp push_to_hf --archive_path model -n test_allennlp

Example output: https://huggingface.co/osanseviero/test_allennlp

Use case 2: Pushing from an archive (.tar.gz) file to the Hub

This is especially useful for the AI2 team for uploading their existing models

allennlp push_to_hf --archive_path bidaf-model-2020.03.19.tar.gz -n bidaf-model-2020.03.19

Example output: https://huggingface.co/osanseviero/bidaf-model-2020.03.19
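For reference, the same thing can be done from Python. A minimal sketch mirroring the CLI examples above, assuming a hypothetical import path (see the PR diff for the actual module location and signature):

```python
# Hypothetical import path, for illustration only.
from allennlp.common.push_to_hf import push_to_hf

# Use case 1: push a serialization directory.
push_to_hf(repo_name="test_allennlp", archive_path="model")

# Use case 2: push an existing .tar.gz archive.
push_to_hf(repo_name="bidaf-model-2020.03.19", archive_path="bidaf-model-2020.03.19.tar.gz")
```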

A few things to notice:

  • The repo launches a TensorBoard instance if traces are available

(Screenshot: TensorBoard running on the model repo page)

  • Users get a "Use in AllenNLP" snippet that shows how to use the model

(Screenshots: the "Use in AllenNLP" snippet on the model page)

If you are satisfied with the high-level approach + code location, I'll proceed to implement the test suite for this.

I wanted to keep this PR relatively small, but as a follow-up we can improve the automated model card so it contains more useful information. Additionally, we could add the metrics from metrics.json.

@epwalsh epwalsh requested a review from dirkgr August 20, 2021 00:55
@epwalsh
Member

epwalsh commented Aug 20, 2021

The test failure in "GPU Tests" is due to OOM, unrelated to your changes. #5371 is a temporary fix.

Member

@dirkgr dirkgr left a comment


This will do the right thing both when the repo already exists, and when it's a new repo, right?

@dirkgr dirkgr self-assigned this Aug 23, 2021
@osanseviero
Contributor Author

This will do the right thing both when the repo already exists, and when it's a new repo, right?

Sorry, forgot to answer this.

Yes, it will create a new repo if it does not exist. If it does exist, one drawback of running the script multiple times is that we don't overwrite the model card, so as not to delete existing work. We have an open issue that should make this easier in a future iteration.
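
For reference, a minimal sketch of that create-if-missing behavior, assuming huggingface_hub's create_repo (argument names follow the current API and have changed across versions; the PR code is authoritative):

```python
from huggingface_hub import HfApi

def _ensure_repo(repo_name: str, token: str) -> str:
    """Create the repo on the Hub if needed, reusing it if it already exists."""
    api = HfApi()
    # exist_ok=True makes the call a no-op for an existing repo, so running
    # the push script multiple times targets the same repo instead of failing.
    return api.create_repo(repo_name, token=token, exist_ok=True)
```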

@osanseviero
Contributor Author

Hi @dirkgr, do you have any suggestions on how to do the testing of pushing to the Hub? I have a few alternatives:

1. We have a staging backend, and we could provide an account to which the tests can push models (example)

Pros:

  • Tests end-to-end integration
  • More robust

Cons:

  • Introduces a dependency on an external server, which could cause flakiness
  • Relying on external services is not a great practice imo

2. Mock the responses from the server (a no-network test; see the sketch after this list)

Pros:

  • Does not have server dependencies
  • We can still test the local directory commits

Cons:

  • The test is less representative of what push_to_hf really does
  • More work to set up

3. Any other ideas that you might have :D
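
To make option 2 concrete, here is a rough sketch of a no-network test using unittest.mock (the patched import path and the push_to_hf signature are assumptions for illustration, not the actual module layout):

```python
from unittest import mock

from allennlp.common.push_to_hf import push_to_hf  # hypothetical import path

def test_push_to_hf_no_network(tmp_path):
    # Patch the Hub client so the test makes no real HTTP requests.
    with mock.patch("allennlp.common.push_to_hf.HfApi") as mock_api:
        mock_api.return_value.create_repo.return_value = "https://huggingface.co/user/repo"
        push_to_hf(repo_name="test_allennlp", archive_path=str(tmp_path))
        # Local interactions can still be asserted on.
        mock_api.return_value.create_repo.assert_called_once()
```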

@dirkgr
Member

dirkgr commented Aug 25, 2021

We already rely on HF servers (and others) in the tests. I don't know if this is an exhaustive list, but I'm pretty sure we need at least S3, GCS, Hugging Face, PyPI, and GitHub to be up and running. There may be others. Wandb, maybe? I think testing against an HF staging server is acceptable, unless we find that the staging server is down all the time while you're testing other things on it.

@osanseviero
Contributor Author

Thanks a lot for your input! I've added a test suite for the push_to_hf functionality.

@dirkgr
Member

dirkgr commented Sep 8, 2021

Since I am on 👶🏻 leave, I am leaving this in @epwalsh's capable hands.

@julien-c
Contributor

julien-c commented Sep 8, 2021

Since I am on 👶🏻 leave, I am leaving this in @epwalsh's capable hands.

This is awesome news, enjoy the 👶🏻 leave @dirkgr!! Congrats 😊

Member

@epwalsh epwalsh left a comment


This looks good! Thanks for your work @osanseviero!

I noticed the new tests are failing due to an authentication error. I'm guessing we need to add a token as a secret to GitHub Actions?

@osanseviero
Contributor Author

Congrats @dirkgr!!! That's exciting news!

@epwalsh, when the environment variable HUGGINGFACE_CO_STAGING is set, the test runs in a staging environment in which the authorization error should not happen. From the error logs, it seems like the test is not using staging. I likely did not set the environment variable in the right place in the GA workflow.

Given that I cannot run the workflow manually, I would appreciate it if you could give me a hand figuring out where the right place to set the environment variable for the test is.

@osanseviero osanseviero requested a review from epwalsh September 8, 2021 19:49
@epwalsh
Member

epwalsh commented Sep 10, 2021

@osanseviero ah I see. It does look like it was put in the wrong place. If you put that environment variable up here it should work.

@osanseviero
Contributor Author

Thanks for the pointer!

@osanseviero
Contributor Author

I missed that there were other tests using HF models in file_utils.py, which now fail due to the environment variable.

@LysandreJik do you think we could create a similar repo in staging so we can entirely rely on that environment?
https://github.com/allenai/allennlp/blob/main/tests/common/file_utils_test.py#L603

@epwalsh
Member

epwalsh commented Sep 13, 2021

@osanseviero, what if we just set that environment variable from within the Python tests? That way we only have to set it for tests that require it.

@osanseviero
Contributor Author

@LysandreJik please correct me if I'm wrong, but the reason for using an environment variable rather than setting the variable in the Python test is that this happens at import time.

I think one solution would be to move the from huggingface_hub import HfApi, HfFolder, Repository line to within the push_to_hf() function. Then we could set the environment variable in the test, as sketched below.
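
A minimal sketch of that idea (the function signature is illustrative; HUGGINGFACE_CO_STAGING is the variable named above):

```python
from typing import Optional

def push_to_hf(repo_name: str, archive_path: str, token: Optional[str] = None):
    # Deferred import: huggingface_hub chooses its endpoint at import time,
    # so importing here lets a test set HUGGINGFACE_CO_STAGING first.
    from huggingface_hub import HfApi, HfFolder, Repository  # noqa: F401
    ...

# A test can then set the variable before the first call:
def test_push_to_hf_staging(monkeypatch):
    # Exact accepted values depend on the huggingface_hub version.
    monkeypatch.setenv("HUGGINGFACE_CO_STAGING", "yes")
    ...
```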

@LysandreJik
Contributor

LysandreJik commented Sep 13, 2021

In the huggingface_hub tests we also have a with_production_testing context manager which is made exactly for that purpose!

@osanseviero we can definitely create a repo on staging which can be used instead. The staging environment isn't super stable, so it is bound to change; either we can upload files before running the test suite, or we can use a method similar to the with_production_testing one used in hfh (we could also make it importable here!)

Indeed, we define the URL to the endpoint as a global variable; with an environment variable we can control that variable without any issues.
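
For reference, a with_production_testing-style helper could look roughly like this (the patched constant name is an assumption about huggingface_hub's internals, not the exact implementation):

```python
from unittest import mock

ENDPOINT_PRODUCTION = "https://huggingface.co"

def with_production_testing(cls):
    """Test-class decorator that points huggingface_hub back at production."""
    # Re-patch the endpoint global that was fixed at import time.
    return mock.patch("huggingface_hub.constants.ENDPOINT", ENDPOINT_PRODUCTION)(cls)
```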

@osanseviero
Contributor Author

Sorry for the delay. I think tests should pass now in GA. Since only the push_to_hf tests require staging, I just kept the decorator there.

Let me know if other changes are required.

Member

@epwalsh epwalsh left a comment


Thank you so much @osanseviero! Looks great!

@epwalsh epwalsh merged commit 603552f into allenai:main Sep 30, 2021
@osanseviero osanseviero deleted the push_to_hf branch September 30, 2021 07:17