Skip to content

Annotated sentences or whole text documents? #168

Open
@vasilikivmo

Description

@vasilikivmo

I have a large amount of texts which I want to test through the python examples. My question is: The annotated data I am going to provide, does it have to be split up into sentences? Do I have the ability to pass a whole text with multiple sentences, paragraphs and lines, and thus, the range numbers of each entity be based on the word counting of the whole document and not each sentence separately?

Based on the comment of the inner code (for example on train_ner.py):
"When you train a named_entity_extractor you need to get a dataset of sentences (or sentence or paragraph length chunks of text) where each sentence is annotated with the entities you want to find." I would like to make clearer the part on the paragraph length chunks of text.

Thank you in advance.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions