This repository was archived by the owner on Dec 16, 2022. It is now read-only.
Create Vocabulary from both pretrained transformers and instances #5368
Merged: dirkgr merged 7 commits into allenai:main from amitkparekh:load-vocab-from-pretrained-and-instances on Aug 24, 2021
Commits (7)
f97b238: Add Vocabulary constructor from both pretrained transformers and inst… (amitkparekh)
751fcd9: Merge branch 'main' into load-vocab-from-pretrained-and-instances (amitkparekh)
32bd0de: Merge branch 'main' into load-vocab-from-pretrained-and-instances (amitkparekh)
9f2eb94: undo autoformatting on changelog (sorry!) (amitkparekh)
8278894: update changelog without autoformatting everything (amitkparekh)
f783c45: Merge branch 'main' into load-vocab-from-pretrained-and-instances (amitkparekh)
97eacbf: Remove allowing multiple pretrained transformers to a single namespace (amitkparekh)
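For context, here is a rough sketch of how the constructor this PR adds might be called. The method name `from_pretrained_transformer_and_instances` and the shape of the `transformers` argument are inferred from the PR title and commit messages; the merged code is the authority on the exact signature.

```python
from allennlp.data import Instance, Vocabulary
from allennlp.data.fields import TextField
from allennlp.data.token_indexers import SingleIdTokenIndexer
from allennlp.data.tokenizers import Token

# A toy instance; its tokens are counted into the "tokens" namespace.
instances = [
    Instance(
        {
            "text": TextField(
                [Token("hello"), Token("world")],
                {"tokens": SingleIdTokenIndexer(namespace="tokens")},
            )
        }
    )
]

# Build one Vocabulary whose "tokens" namespace comes from the instances
# and whose "wordpieces" namespace is copied verbatim from a pretrained
# transformer's tokenizer. Method and argument names here are assumptions
# based on the PR title, not a confirmed API.
vocab = Vocabulary.from_pretrained_transformer_and_instances(
    instances=instances,
    transformers={"wordpieces": "bert-base-cased"},
)
```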
Conversations
dirkgr: What is supposed to happen when you put two transformers into the same namespace?
amitkparekh: If two models are put into the same namespace, that namespace is extended by the tokens in both models. I don't know why someone would want to do this, but there might be a research reason for it? This is tested with both test_with_single_namespace_and_multiple_models and test_with_multiple_models_across_multiple_namespaces.
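Illustratively, those tests presumably exercised calls like the one below, where a single namespace receives word pieces from two models at once. The list-valued `transformers` entry is an assumption about the pre-97eacbf signature, not a confirmed API.

```python
from allennlp.data import Vocabulary

# Reusing the toy `instances` from the sketch above. Before commit
# 97eacbf, a single namespace could reportedly accept several
# transformer vocabularies; their word pieces were unioned.
vocab = Vocabulary.from_pretrained_transformer_and_instances(
    instances=instances,
    transformers={"tokens": ["bert-base-cased", "roberta-base"]},
)
```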
dirkgr: I think the result will be wrong if you do that. Each transformer expects a word piece to map to a certain integer. If a word piece maps to a different integer, the embeddings won't work. You'll probably get an "index out of bounds" exception (if you're lucky). Since we can't map two word pieces to the same integer (and we certainly can't map the same word piece to two different integers), I think we have to disallow loading two transformer vocabs into the same namespace.
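The failure mode is easy to see with two toy word-piece vocabularies (all ids below are made up for illustration):

```python
# Each model's embedding matrix is indexed by exactly these ids.
model_a = {"[PAD]": 0, "hello": 1, "world": 2}
model_b = {"<pad>": 0, "hello": 1, "there": 2}

# Merging both into one namespace: pieces unique to model_b get new ids.
merged = dict(model_a)
for piece in model_b:
    if piece not in merged:
        merged[piece] = len(merged)

print(merged)
# {'[PAD]': 0, 'hello': 1, 'world': 2, '<pad>': 3, 'there': 4}
# model_b expects "<pad>" -> 0 and "there" -> 2; the merged namespace
# assigns 3 and 4, so lookups into model_b's embedding matrix return
# the wrong rows, or fall out of bounds once an id exceeds its size.
```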
amitkparekh: That makes sense to me! I've updated the code to reflect those changes.
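Commit 97eacbf presumably enforces this with something along the following lines. This is a sketch of one way to implement the restriction, not the PR's actual code; the PR may instead have simply narrowed the `transformers` argument to one model name per namespace.

```python
from typing import Dict, List, Union

def normalize_transformers(
    transformers: Dict[str, Union[str, List[str]]]
) -> Dict[str, str]:
    """Allow at most one pretrained transformer per namespace (sketch)."""
    normalized: Dict[str, str] = {}
    for namespace, model_names in transformers.items():
        if isinstance(model_names, str):
            model_names = [model_names]
        if len(model_names) != 1:
            raise ValueError(
                f"Namespace '{namespace}' was given {len(model_names)} "
                "transformer models, but word-piece ids from different "
                "models are incompatible within one namespace; use one "
                "namespace per transformer."
            )
        normalized[namespace] = model_names[0]
    return normalized
```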