Skip to content

Language of the Evaluator's Tokenizer is set to English when asked for Dutch or French. #62

@amobular

Description

@amobular

The tokenizer of the Evaluator class, will be set to TokenizerEN when languages from ['nl', 'fr'] are chosen.

See the following code:

if language == 'nl':
from deidentify.tokenizer.tokenizer_ons import TokenizerOns
self.tokenizer = TokenizerOns(disable=('tagger', 'parser', 'ner'))
if language == 'fr':
from deidentify.tokenizer.tokenizer_fr import TokenizerFR
self.tokenizer = TokenizerFR(disable=('tagger', 'parser', 'ner'))
if language == 'de':
from deidentify.tokenizer.tokenizer_de import TokenizerDE
self.tokenizer = TokenizerDE(disable=('tagger', 'parser', 'ner'))
else:
from deidentify.tokenizer.tokenizer_en import TokenizerEN
self.tokenizer = TokenizerEN(disable=('tagger', 'parser', 'ner'))

Change the if statements to elif and it will work as intended.

Nice project btw :)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions