Open
Description
The way that the Tokenizer method uses the disect method is fundamentally broken: disect requires that "all indices superior to the one returned MUST validate the predicate as well" (source). This is not the case for the substring-based predicate in Tokenizer.
Example: If the rules allow tokens of length one and tokens of length greater than two (/./ and /...+/), the predicate will return false for index 2 and true for all other indices. Depending on the remaining length of the input, disect will hit the index 2 or it won't. If it does, it finds a token of length three; if it doesn't, it finds a token of length one.
So the parsing result depends on the length of the remaining input, which makes the parser behave highly erratically.
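To make the failure mode concrete, here is a minimal sketch in Python. It is not the actual disect or Tokenizer code: `first_true` is a generic textbook bisection, and `matches_rule`, `first_token_length`, and the search bounds (candidate lengths from 0 up to the remaining input length) are assumptions made for illustration. It only shows how a bisection that relies on a monotonic predicate can return different token lengths for the same rules when the remaining input length changes.

```python
import re

# The two rules from the example: tokens of length one, and tokens of length > 2.
RULES = [re.compile(r"."), re.compile(r"...+")]


def matches_rule(candidate: str) -> bool:
    """Substring-based predicate: does the candidate fully match any rule?"""
    return any(rule.fullmatch(candidate) for rule in RULES)


def first_true(lo: int, hi: int, pred) -> int:
    """Generic bisection (hypothetical stand-in for disect).

    Returns the smallest i in [lo, hi) with pred(i) true, or hi if there is
    none. This is only correct if pred is monotonic (false..., then true...).
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def first_token_length(text: str) -> int:
    """Bisect over candidate token lengths 0..len(text) (assumed bounds)."""
    return first_true(0, len(text) + 1, lambda n: matches_rule(text[:n]))


# The predicate is true at length 1, false at length 2, true from length 3 on,
# so it is not monotonic and the result depends on which indices get probed.
print(first_token_length("aaa"))    # probes length 2, ends at length 3
print(first_token_length("aaaaa"))  # never probes length 2, ends at length 1
```

With the same rules and the same leading characters, the two calls disagree about the first token purely because the input lengths differ, which is the behavior described above.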