-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Make load_archive
operate on serialization directories.
#2554
Conversation
I don't think this is the right solution, for a couple of reasons:
for #1 that suggests that the right place for the fix is in https://github.com/allenai/allennlp/blob/master/allennlp/models/archival.py#L200 and actually checking if those files exist, and giving a very explicit warning if they don't I would also write a test to verify this behavior |
also, probably this belongs in a different PR, but it's not good that using |
load_archive
operate on serialization directories.
Good points! Given those, it seemed about as easy to make |
Files only appear at I had to work around the bug myself recently and this behavior was very convenient for when I had to fix up some config files (because it was over NFS and editing/expanding/compressing huge tarballs was very slow). |
@julianmichael, I think I already accounted for that with this check: https://github.com/allenai/allennlp/pull/2554/files#diff-9dff246a3a033f8a1e8a2c409ef17e27R195 See also the test https://github.com/allenai/allennlp/pull/2554/files#diff-3113ca8bfda80819613f55d7a7a51f7aR147. Please let me know if I'm missing something. |
Ah I see now. It also seems the trainer behavior has changed so it puts the best weights at |
No worries! Would have been an easy bug to write, so such feedback is appreciated. :) |
allennlp/models/archival.py
Outdated
@@ -175,8 +175,10 @@ def load_archive(archive_file: str, | |||
logger.info(f"loading archive file {archive_file} from cache at {resolved_archive_file}") | |||
|
|||
if os.path.isdir(resolved_archive_file): | |||
loading_dir = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would slightly prefer a more descriptive name like loading_from_directory
or archive_is_directory
or something like that. loading_dir
sounds like it's the directory itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Var removed.
allennlp/models/archival.py
Outdated
@@ -190,18 +192,22 @@ def load_archive(archive_file: str, | |||
|
|||
# Check for supplemental files in archive | |||
fta_filename = os.path.join(serialization_dir, _FTA_NAME) | |||
if os.path.exists(fta_filename): | |||
if not loading_dir and os.path.exists(fta_filename): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the right logic? imagine that I have a model.tar.gz, and then I untar it somewhere and then point to that location. in that case we'd have loading_dir == True
but we'd still want to grab the files from the "archive" (which is a directory)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. It does seem possible that some users are relying on this. I've changed the behavior again to act as it did before if the files are present, but to load their serialization directory equivalents otherwise.
(I also considered touch
ing an ARCHIVE_MARKER
or SERIALIZATION_DIR_MARKER
that we could check explicitly, but it seem desirable to have this work for existing dirs and archives.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI this issue describes the problem in more depth. I haven't reviewed the PR so it might fix it, but I hadn't seen the issue crop up in the discussions so maybe there's something there:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good, thanks
Nice one, this has been a long running annoying issue, good to clear it out! |
Make `load_archive` operate on serialization directories. (allenai#2554)
- Fixes allenai#1052. - Files like ELMo weight files aren't added to the serialization dir, but are added to the archive with `add_file_to_archive`. - This results in misleading errors when using `load_archive` as it will attempt to load a serialization dir. - Solution: Retain original paths when we're loading from a directory. - Drive by fix for overrides not overriding.
add_file_to_archive
.load_archive
as it will attempt to load a serialization dir.