This repository was archived by the owner on Dec 16, 2022. It is now read-only.

Commit 43b384d

WrRan authored and matt-gardner committed
Move some scripts to allennlp/allennlp/tools (#2584)
* Fix bug in uniform_unit_scaling #2239 (#2273)
* Fix type annotation for .forward(...) in tutorial (#2122)
* Add a Contributions section to README.md (#2277)
* script for doing archive surgery (#2223)
* script for doing archive surgery
* simplify script
* Fix spelling in tutorial README (#2283)
* fix #2285 (#2286)
* Update the `find-lr` subcommand help text. (#2289)
* Update the elmo command help text.
* Update the find-lr subcommand help text.
* Add __repr__ to Vocabulary (#2293)

As it currently stands, the following is logged during training:

```
2019-01-06 10:46:21,832 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.models.language_model.LanguageModel'> from params {'bidirectional': False, 'contextualizer': {'bidirectional': False, 'dropout': 0.5, 'hidden_size': 200, 'input_size': 200, 'num_layers': 2, 'type': 'lstm'}, 'dropout': 0.5, 'text_field_embedder': {'token_embedders': {'tokens': {'embedding_dim': 200, 'type': 'embedding'}}}} and extras {'vocab': <allennlp.data.vocabulary.Vocabulary object at 0x7ff7811665f8>}
```

Note that the `Vocabulary` does not provide any useful information, since it doesn't have `__repr__` defined. This provides a fix.

* Update the base image in the Dockerfiles. (#2298)
* Don't deprecate bidirectional-language-model name (#2297)
* bump version number to v0.8.1
* Bump version numbers to v0.8.2-unreleased
* Turn BidirectionalLM into a more-general LanguageModel class (#2264)

Fixes #2255

This PR replaces the `BidirectionalLM` class with a more-general `LanguageModel` that can be used in either the unidirectional/forward setting or the bidirectional setting. It also accordingly replaces the `BidirectionalLanguageModelTokenEmbedder` with a `LanguageModelTokenEmbedder`.

Also fixes bug in the experiment_unsampled.jsonnet config that was preventing a test from actually being unsampled.

TODO:
- [x] test the unidirectional case
- [x] properly deprecate `BidirectionalLM` and `BidirectionalLanguageModelTokenEmbedder`
- [x] check docs for accuracy
- [x] fix user-facing training configs

* move some utilities from allennlp/scripts to allennlp/allennlp/tools
* make pylint happy
* add modules to API doc
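
The `__repr__` note in the message above can be illustrated with a minimal sketch. It is not the AllenNLP implementation: `PlainVocabulary`, `DescribedVocabulary`, and `log_extras` are hypothetical names, and the chosen `__repr__` format is an assumption made only for illustration.

```python
from typing import Dict


class PlainVocabulary:
    """Hypothetical stand-in that defines no __repr__ (falls back to object.__repr__)."""

    def __init__(self, namespace_sizes: Dict[str, int]) -> None:
        self.namespace_sizes = namespace_sizes


class DescribedVocabulary(PlainVocabulary):
    """Same data, plus a __repr__ that summarises each namespace (assumed format)."""

    def __repr__(self) -> str:
        parts = ", ".join(f"{name}: {size}"
                          for name, size in sorted(self.namespace_sizes.items()))
        return f"{type(self).__name__}({parts})"


def log_extras(vocab: PlainVocabulary) -> str:
    # Formatting a dict calls repr() on its values, which is why the logged
    # line only shows "<... object at 0x...>" when __repr__ is missing.
    return f"and extras {str({'vocab': vocab})}"
```

With `PlainVocabulary`, `log_extras` yields the uninformative `object at 0x...` form seen in the log; with `DescribedVocabulary`, the namespace summary appears instead.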
1 parent fe80f9f commit 43b384d

5 files changed: +53 -36 lines changed
File renamed without changes.

scripts/create_elmo_embeddings_from_vocab.py renamed to allennlp/tools/create_elmo_embeddings_from_vocab.py

+9 -10
@@ -1,20 +1,16 @@
 # pylint: disable=no-self-use
-
-import os
 import argparse
-import sys
-
-sys.path.insert(0, os.path.dirname(os.path.abspath(os.path.join(__file__, os.pardir))))
-
 import gzip
+import os
+
 import numpy
 import torch
 
-from allennlp.data.token_indexers import ELMoTokenCharactersIndexer
-from allennlp.modules.elmo import _ElmoCharacterEncoder
+from allennlp.common.checks import ConfigurationError
 from allennlp.data import Token, Vocabulary
+from allennlp.data.token_indexers import ELMoTokenCharactersIndexer
 from allennlp.data.vocabulary import DEFAULT_OOV_TOKEN
-from allennlp.common.checks import ConfigurationError
+from allennlp.modules.elmo import _ElmoCharacterEncoder
 
 
 def main(vocab_path: str,
@@ -97,9 +93,12 @@ def main(vocab_path: str,
         for word in tokens:
             new_vocab_file.write(f"{word}\n")
 
+
 if __name__ == "__main__":
+    # pylint: disable=invalid-name
     parser = argparse.ArgumentParser(description='Generate CNN representations for a vocabulary '
-                                                 'using ELMo')
+                                                 'using ELMo',
+                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
     parser.add_argument('--vocab_path', type=str, help='A path to a vocabulary file to generate '
                                                        'representations for.')
     parser.add_argument('--elmo_config', type=str, help='The path to a directory containing an '
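
The `formatter_class=argparse.ArgumentDefaultsHelpFormatter` argument added in this hunk makes `--help` append each option's default value to its help text. A minimal sketch of the effect follows; `build_parser` and the `--batch_size` option are hypothetical and are not taken from the script.

```python
import argparse


def build_parser(show_defaults: bool) -> argparse.ArgumentParser:
    """Build a tiny parser with or without defaults shown in its help text."""
    formatter = (argparse.ArgumentDefaultsHelpFormatter
                 if show_defaults else argparse.HelpFormatter)
    parser = argparse.ArgumentParser(description='Demo parser.',
                                     formatter_class=formatter)
    # Hypothetical option; the formatter only annotates arguments that have a help string.
    parser.add_argument('--batch_size', type=int, default=64, help='Batch size.')
    return parser


if __name__ == "__main__":
    # Prints the help text with "(default: 64)" appended to the option.
    print(build_parser(show_defaults=True).format_help())
```

Options declared without a `help` string are left unannotated by this formatter, so the change only affects arguments that already document themselves.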

allennlp/tools/inspect_cache.py

+29
@@ -0,0 +1,29 @@
+import os
+
+from allennlp.common.file_utils import CACHE_DIRECTORY
+from allennlp.common.file_utils import filename_to_url
+
+
+def main():
+    print(f"Looking for datasets in {CACHE_DIRECTORY}...")
+    if not os.path.exists(CACHE_DIRECTORY):
+        print('Directory does not exist.')
+        print('No cached datasets found.')
+
+    cached_files = os.listdir(CACHE_DIRECTORY)
+
+    if not cached_files:
+        print('Directory is empty.')
+        print('No cached datasets found.')
+
+    for filename in cached_files:
+        if not filename.endswith("json"):
+            url, etag = filename_to_url(filename)
+            print('Filename: %s' % filename)
+            print('Url: %s' % url)
+            print('ETag: %s' % etag)
+            print()
+
+
+if __name__ == '__main__':
+    main()
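
As the diff reads, `main()` prints "Directory does not exist." and then still calls `os.listdir`, which raises `FileNotFoundError` for a missing directory. The sketch below shows one possible way to guard that case with an early return. It is not part of this commit; `list_cached_files` is a hypothetical helper that takes the directory as a parameter so it can run without AllenNLP installed.

```python
import os
from typing import List


def list_cached_files(cache_directory: str) -> List[str]:
    """Return the non-metadata entries of a cache directory (hypothetical helper)."""
    if not os.path.exists(cache_directory):
        print('Directory does not exist.')
        print('No cached datasets found.')
        # Return early so os.listdir is never called on a missing path.
        return []

    cached_files = os.listdir(cache_directory)
    if not cached_files:
        print('Directory is empty.')
        print('No cached datasets found.')

    # Skip entries ending in "json", mirroring the filter in the loop above.
    return [name for name in cached_files if not name.endswith("json")]
```

A caller could then loop over the returned names and pass each one to `filename_to_url`, as the script does.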

doc/api/allennlp.tools.rst

+15
@@ -23,3 +23,18 @@ tasks for which we build models.
     :members:
     :undoc-members:
     :show-inheritance:
+
+.. automodule:: allennlp.tools.archive_surgery
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+.. automodule:: allennlp.tools.create_elmo_embeddings_from_vocab
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+.. automodule:: allennlp.tools.inspect_cache
+    :members:
+    :undoc-members:
+    :show-inheritance:

scripts/inspect_cache.py

-26
This file was deleted.
