This repository was archived by the owner on Dec 16, 2022. It is now read-only.

Commit 5a07009 (1 parent: 351941f)

Fix RoBERTa SST (#4548)

* Fix RobertaSST
* Fix unrelated formatting issue
* Changelog
* Be slightly more flexible about tokens

File tree

4 files changed: +12 −5 lines changed


CHANGELOG.md

+3
@@ -17,6 +17,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   Also, when `max_length` was set to a non-`None` value, several warnings would appear
   for certain transformer models around the use of the `truncation` parameter.
 - Fixed evaluation of all metrics when using distributed training.
+- Fixed problem with automatically detecting whether tokenization is necessary.
+  This affected primarily the Roberta SST model.
+
 
 ## [v1.1.0rc2](https://github.com/allenai/allennlp/releases/tag/v1.1.0rc2) - 2020-07-31
 

allennlp/common/tqdm.py

+1 −2

@@ -3,6 +3,7 @@
 global defaults for certain tqdm parameters.
 """
 import logging
+from allennlp.common import logging as common_logging
 import sys
 from time import time
 from typing import Optional
@@ -17,8 +18,6 @@
 else:
     from tqdm import tqdm as _tqdm
 
-from allennlp.common import logging as common_logging
-
 
 # This is necessary to stop tqdm from hanging
 # when exceptions are raised inside iterators.

allennlp/data/tokenizers/token.py

+3
@@ -81,6 +81,9 @@ def __init__(
         text_id: int = None,
         type_id: int = None,
     ) -> None:
+        assert text is None or isinstance(
+            text, str
+        )  # Some very hard to debug errors happen when this is not true.
         self.text = text
         self.idx = idx
         self.idx_end = idx_end

allennlp/predictors/text_classifier.py

+5 −3

@@ -30,9 +30,11 @@ def _json_to_instance(self, json_dict: JsonDict) -> Instance:
         Runs the underlying model, and adds the `"label"` to the output.
         """
         sentence = json_dict["sentence"]
-        if not hasattr(self._dataset_reader, "tokenizer") and not hasattr(
-            self._dataset_reader, "_tokenizer"
-        ):
+        reader_has_tokenizer = (
+            getattr(self._dataset_reader, "tokenizer", None) is not None
+            or getattr(self._dataset_reader, "_tokenizer", None) is not None
+        )
+        if not reader_has_tokenizer:
             tokenizer = SpacyTokenizer()
             sentence = tokenizer.tokenize(sentence)
         return self._dataset_reader.text_to_instance(sentence)
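The predictor fix above replaces `hasattr` with `getattr(..., None) is not None`. The distinction matters because a dataset reader may define a tokenizer attribute and leave it set to `None`. A minimal sketch, using a hypothetical `Reader` stand-in rather than any real allennlp dataset reader:

```python
class Reader:
    """Hypothetical stand-in for a dataset reader with an unset tokenizer."""

    def __init__(self) -> None:
        self._tokenizer = None  # attribute exists, but no tokenizer is configured


reader = Reader()

# hasattr sees the attribute and reports True, so the old check would skip
# creating a fallback tokenizer even though none is actually configured:
print(hasattr(reader, "_tokenizer"))

# getattr with a None default distinguishes "set to a real tokenizer" from
# "missing or None", which is what the fixed code tests:
print(getattr(reader, "_tokenizer", None) is not None)
```

With the old `hasattr` check, a reader like this would never get the `SpacyTokenizer` fallback, and the raw string would be passed to `text_to_instance` untokenized, which is the Roberta SST failure this commit addresses.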
