Make `load_archive` operate on serialization directories. #2554

brendan-ai2 · 2019-03-01T02:35:31Z

- Fixes #1052 ("Using a serialisation directory as a model archive doesn't work with files_to_archive.json").
- Files like ELMo weight files aren't added to the serialization dir, but are added to the archive with `add_file_to_archive`.
- This results in misleading errors when using `load_archive`, as it will attempt to load a serialization dir.
- Solution: Retain original paths when we're loading from a directory.
- Drive-by fix for overrides not overriding.

joelgrus · 2019-03-01T02:43:59Z

I don't think this is the right solution, for a couple of reasons:

1. this problem applies to other commands too
2. we do want to support loading from a directory that really corresponds to the contents of an archive file; the problem here was that it didn't

for #1 that suggests that the right place for the fix is in load_archive
for #2 I think a better fix is to go where it's doing the fta replacement

https://github.com/allenai/allennlp/blob/master/allennlp/models/archival.py#L200

and actually checking if those files exist, and giving a very explicit warning if they don't

I would also write a test to verify this behavior

joelgrus · 2019-03-01T02:52:08Z

also, probably this belongs in a different PR, but it's not good that using overrides didn't work here

brendan-ai2 · 2019-03-01T03:54:10Z

Good points! Given those, it seemed about as easy to make load_archive work on directories, so I did that. Hopefully that wasn't misguided... Fixed the overrides bug too.

julianmichael · 2019-03-01T04:10:35Z

Files only appear at fta/{key} inside the archive—not in the serialization directory—right? Unless I'm mistaken or this has changed, that means this fix will fail when running from a normal serialization directory. I think it should fall back to original_filename (as an absolute path) when the file is not found to cover this case.

I had to work around the bug myself recently and this behavior was very convenient for when I had to fix up some config files (because it was over NFS and editing/expanding/compressing huge tarballs was very slow).

brendan-ai2 · 2019-03-01T05:08:03Z

@julianmichael, I think I already accounted for that with this check: https://github.com/allenai/allennlp/pull/2554/files#diff-9dff246a3a033f8a1e8a2c409ef17e27R195 See also the test https://github.com/allenai/allennlp/pull/2554/files#diff-3113ca8bfda80819613f55d7a7a51f7aR147.

Please let me know if I'm missing something.

julianmichael · 2019-03-01T21:25:25Z

Ah I see now. It also seems the trainer behavior has changed so it puts the best weights at weights.th instead of best.th. Now I can see that this would work. Sorry about that—thanks!

brendan-ai2 · 2019-03-01T21:37:06Z

No worries! Would have been an easy bug to write, so such feedback is appreciated. :)

joelgrus · 2019-03-01T21:58:20Z

allennlp/models/archival.py

@@ -175,8 +175,10 @@ def load_archive(archive_file: str,
        logger.info(f"loading archive file {archive_file} from cache at {resolved_archive_file}")

    if os.path.isdir(resolved_archive_file):
+        loading_dir = True


I would slightly prefer a more descriptive name like loading_from_directory or archive_is_directory or something like that. loading_dir sounds like it's the directory itself

Var removed.

joelgrus · 2019-03-01T21:59:32Z

allennlp/models/archival.py

@@ -190,18 +192,22 @@ def load_archive(archive_file: str,

    # Check for supplemental files in archive
    fta_filename = os.path.join(serialization_dir, _FTA_NAME)
-    if os.path.exists(fta_filename):
+    if not loading_dir and os.path.exists(fta_filename):


is this the right logic? imagine that I have a model.tar.gz, and then I untar it somewhere and then point to that location. in that case we'd have loading_dir == True but we'd still want to grab the files from the "archive" (which is a directory)

Good point. It does seem possible that some users are relying on this. I've changed the behavior again to act as it did before if the files are present, but to load their serialization directory equivalents otherwise.

(I also considered touching an ARCHIVE_MARKER or SERIALIZATION_DIR_MARKER that we could check explicitly, but it seemed desirable to have this work for existing dirs and archives.)
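
For readers following the thread, the behavior described above (use the archived copy under `fta/{key}` when it exists, otherwise warn and fall back to the original train-time path) can be sketched roughly as follows. This is a minimal, standalone illustration and not the code merged in this PR: the helper name `resolve_supplemental_files` and the constant `FTA_NAME` are hypothetical, and the on-disk layout (`files_to_archive.json` mapping keys to original filenames, copies stored at `fta/{key}`) is assumed from the discussion here.

```python
import json
import logging
import os
from typing import Dict

logger = logging.getLogger(__name__)

# Assumed file name, taken from the title of issue #1052.
FTA_NAME = "files_to_archive.json"


def resolve_supplemental_files(serialization_dir: str) -> Dict[str, str]:
    """Hypothetical helper: map each archived key to the path that should be loaded."""
    fta_filename = os.path.join(serialization_dir, FTA_NAME)
    if not os.path.exists(fta_filename):
        # No supplemental files were recorded, so there is nothing to replace.
        return {}

    with open(fta_filename, "r") as fta_file:
        files_to_archive = json.load(fta_file)

    resolved: Dict[str, str] = {}
    for key, original_filename in files_to_archive.items():
        archived_copy = os.path.join(serialization_dir, "fta", key)
        if os.path.exists(archived_copy):
            # Unpacked archive: prefer the copy that was shipped with the model.
            resolved[key] = archived_copy
        else:
            # Plain serialization directory: the copy was never written here,
            # so fall back to where the file lived at train time and say so.
            logger.warning(
                "Archived file %s not found; falling back to the train-time "
                "location %s. This may be because a serialization directory "
                "is being loaded instead of an archive.",
                archived_copy,
                original_filename,
            )
            resolved[key] = original_filename
    return resolved
```

With this shape, an untarred `model.tar.gz` and a raw serialization directory go through the same code path, and the warning makes it explicit which of the two cases was hit.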

DeNeutoy

FYI this issue describes the problem in more depth. I haven't reviewed the PR so it might fix it, but I hadn't seen the issue crop up in the discussions so maybe there's something there:

#1052

joelgrus

this looks good, thanks

brendan-ai2 · 2019-03-04T20:08:27Z

Thanks @DeNeutoy for the link and @joelgrus for the review!

DeNeutoy · 2019-03-04T20:37:28Z

Nice one, this has been a long running annoying issue, good to clear it out!

Make `load_archive` operate on serialization directories. (allenai#2554)

- Fixes allenai#1052.
- Files like ELMo weight files aren't added to the serialization dir, but are added to the archive with `add_file_to_archive`.
- This results in misleading errors when using `load_archive` as it will attempt to load a serialization dir.
- Solution: Retain original paths when we're loading from a directory.
- Drive by fix for overrides not overriding.

Add warning on loading a directory with evaluate.

df8368a

brendan-ai2 requested a review from joelgrus March 1, 2019 02:35

brendan-ai2 assigned joelgrus Mar 1, 2019

fixes

a485e09

brendan-ai2 changed the title ~~Add warning on loading a directory with evaluate.~~ Make load_archive operate on serialization directories. Mar 1, 2019

drop import

d563971

joelgrus reviewed Mar 1, 2019

brendan-ai2 added 2 commits March 1, 2019 18:28

name change

5b29e23

Fixes

58887e6

DeNeutoy reviewed Mar 4, 2019

joelgrus approved these changes Mar 4, 2019

brendan-ai2 merged commit d0f7170 into allenai:master Mar 4, 2019

Whu-wxy added a commit to Whu-wxy/allennlp that referenced this pull request Mar 6, 2019

Merge pull request #11 from allenai/master

b77b072

Make `load_archive` operate on serialization directories. (allenai#2554)
