How does mitie deal with the segmentation of OOV #205

Open
@rookiebird

Description

Expected Behavior

Hi, I want to know how MITIE handles the segmentation of out-of-vocabulary (OOV) words.
Two of my training examples look like this:
1. The daily active users of [League of Legends](name) on November 10 (Chinese: [英雄联盟](name)11.10的日活)
2. The daily active users of [Tomb Raider 3](name) on November 10 (Chinese: [古墓丽影3](name)11.10的日活)
My training samples are in Chinese and contain many entities that are game names. Some game names contain numbers and some do not, such as "古墓丽影3" and "英雄联盟". In the examples above, I want MITIE to identify the entities as "古墓丽影3" and "英雄联盟". "11.10" is a short form of the date and should not be included in the entity.

Current Behavior

I labeled the entities correctly. However, the entity in the first sample is often identified as "英雄联盟11" rather than "英雄联盟". How can I deal with this problem? I tried adding some more training data, but it did not help. Should I add even more data?

  • Version: 0.7.0
  • Where did you get MITIE: pip install
  • Platform: windows64 and linux64
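
To make the boundary question concrete, here is a minimal sketch of how the two samples above might be turned into a token list plus entity spans over token indices, which is the form MITIE's NER trainer works with. The tokenizer here is a hypothetical stand-in written for illustration (one token per Chinese character, with a dotted number such as "11.10" kept as a single token); it is not the tokenizer MITIE or any particular pipeline actually uses, and the names `tokenize` and `parse_sample` are assumptions.

```python
import re

# Hypothetical tokenizer (an assumption, not MITIE's own):
# a dotted number like "11.10" stays one token, other digit runs and
# ASCII words are one token each, and every CJK character is its own token.
TOKEN_RE = re.compile(r"\d+\.\d+|\d+|[A-Za-z]+|[\u4e00-\u9fff]|\S")

# Matches the [entity text](label) annotation style used in the samples.
ANNOTATION_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def tokenize(text):
    """Split text into tokens, skipping whitespace."""
    return TOKEN_RE.findall(text)


def parse_sample(annotated):
    """Return (tokens, entities); each entity is (start, end, label)
    in token indices, with end exclusive."""
    tokens, entities = [], []
    pos = 0
    for match in ANNOTATION_RE.finditer(annotated):
        # Text before the annotation.
        tokens.extend(tokenize(annotated[pos:match.start()]))
        # The annotated entity itself.
        start = len(tokens)
        tokens.extend(tokenize(match.group(1)))
        entities.append((start, len(tokens), match.group(2)))
        pos = match.end()
    # Text after the last annotation.
    tokens.extend(tokenize(annotated[pos:]))
    return tokens, entities


samples = [
    "[英雄联盟](name)11.10的日活",
    "[古墓丽影3](name)11.10的日活",
]

for sample in samples:
    tokens, entities = parse_sample(sample)
    print(tokens, entities)
```

With this tokenization the date is a single token that sits outside the labeled span in both samples, so the label boundary always falls between tokens. Note that the annotation brackets are what separate "3" from "11.10" in the second sample: on the raw text "古墓丽影311.10的日活" the same regular expression would read "311.10" as one token, so the tokenizer used at prediction time matters as much as the labels.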
