Spans created by detect_multiple_languages_of sometimes skip the last characters
#247
Labels
bug
Hi!
We have a sentence that mixes several languages with arbitrary characters such as numbers.
When these characters are at the end of the text, the returned spans may not cover them, so part of the text is lost.
In my case, this created a mismatch between my original document and the document annotated with languages.
Here is an example to reproduce:
So my question is: is this the expected behavior or a bug?
I expected the "lost" text to be detected as part of the last span.
In my example: DetectionResult(start_index=91, end_index=155, word_count=6, language=Language.FRENCH)
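
As a possible workaround until this is clarified, the check and the expected behavior can be sketched in a few lines. This is only an illustration, not how the library works internally: `Span` is a hypothetical stand-in that mirrors the `start_index`, `end_index`, and `language` fields of `DetectionResult`, and `uncovered_tail` and `extend_last_span` are helper names made up for this example.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a detection result (not the library class)."""
    start_index: int
    end_index: int  # exclusive, like a Python slice bound
    language: str


def uncovered_tail(text: str, spans: List[Span]) -> str:
    """Return the trailing part of `text` that no span covers."""
    if not spans:
        return text
    last_end = max(span.end_index for span in spans)
    return text[last_end:]


def extend_last_span(text: str, spans: List[Span]) -> List[Span]:
    """Return a copy of `spans` whose last span reaches the end of `text`."""
    if not spans:
        return []
    ordered = sorted(spans, key=lambda span: span.start_index)
    last = ordered[-1]
    if last.end_index < len(text):
        # Stretch the last span so the trailing characters are not lost.
        ordered[-1] = replace(last, end_index=len(text))
    return ordered
```

With real results, the same idea would apply to the list returned by `detect_multiple_languages_of`: compare the largest `end_index` with `len(text)` and treat any remainder as part of the last span.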