Resolves ambiguous text parsing at EOF #320
* `parse_symbol` can now match text symbol IDs (e.g. `$23`).
* Symbols read from the stream are now represented as `OwnedSymbolToken`s instead of `String`s to allow for the case in which a given symbol's ID was found and the text has not yet been looked up.
* The reader's `next()` method now returns an `(annotations, item)` tuple instead of just an item. If there are no annotations on an item, the annotations `Vec` is empty.
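
As a rough illustration of the symbol changes described in the commit message, here is a minimal, self-contained sketch. It is not the crate's actual code: the `SymbolToken` enum, the simplified `parse_symbol`, and the `read_item` and `resolve` helpers are hypothetical stand-ins, and the toy splitter ignores quoted symbols that contain `::`.

```rust
/// Hypothetical stand-in for a symbol whose text may not be known yet.
#[derive(Debug, Clone, PartialEq)]
enum SymbolToken {
    /// Only the symbol ID was found in the stream (e.g. `$23`).
    Id(usize),
    /// The symbol's text was found directly.
    Text(String),
}

/// Toy version of symbol parsing: `$` followed by ASCII digits is an ID,
/// anything else is treated as literal text.
fn parse_symbol(text: &str) -> SymbolToken {
    if let Some(digits) = text.strip_prefix('$') {
        if !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit()) {
            if let Ok(id) = digits.parse::<usize>() {
                return SymbolToken::Id(id);
            }
        }
    }
    SymbolToken::Text(text.to_string())
}

/// Splits `a::b::value` into (annotations, value).
/// With no annotations, the returned Vec is empty.
fn read_item(text: &str) -> (Vec<SymbolToken>, &str) {
    let mut parts: Vec<&str> = text.split("::").collect();
    // `split` always yields at least one piece, so `pop` cannot fail here.
    let value = parts.pop().unwrap_or("");
    let annotations = parts.into_iter().map(parse_symbol).collect();
    (annotations, value)
}

/// Deferred lookup: text is returned as-is, IDs are resolved against a table.
fn resolve<'a>(token: &'a SymbolToken, table: &'a [&'a str]) -> Option<&'a str> {
    match token {
        SymbolToken::Text(t) => Some(t.as_str()),
        SymbolToken::Id(id) => table.get(*id).copied(),
    }
}
```

Keeping the ID and the text as separate variants lets the parser hand back a symbol before any symbol table is consulted; the lookup can happen later, or not at all if the caller never needs the text.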
This PR modifies the text reader so that it can unambiguously parse data found at the end of the stream. Because the parser always operates on a fixed buffer, it could not easily distinguish between the end of the buffer (more data to come) and the end of the stream being loaded into the buffer (EOF). When EOF is detected, the reader now appends a sentinel value to the end of the stream and re-attempts parsing. If the sentinel value is found, `EndOfStream` is reported. If a different value is found, that value is returned instead and the sentinel is discarded. Fixes #318. See that issue for more details.
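
The sentinel approach can be sketched with a toy reader. This is only an illustration under stated assumptions, not the PR's implementation: the input is whitespace-delimited integers rather than Ion text, and `SENTINEL`, `parse_token`, `next_item`, and the `at_eof` flag are all hypothetical names chosen for the example.

```rust
/// Hypothetical sentinel token; the real reader's sentinel is not shown here.
const SENTINEL: &str = "<<EOF>>";

#[derive(Debug, PartialEq)]
enum Token {
    Int(i64),
    Sentinel,
}

#[derive(Debug, PartialEq)]
enum StreamItem {
    Value(i64),
    EndOfStream,
}

#[derive(Debug, PartialEq)]
enum ParseError {
    /// The buffer ended before the token was terminated.
    Incomplete,
    Invalid,
}

/// Parses one token from the front of `buf`, returning it with the number of
/// bytes consumed. A token only counts as complete once whitespace follows
/// it, so `12` at the end of the buffer might still become `123`.
fn parse_token(buf: &str) -> Result<(Token, usize), ParseError> {
    let trimmed = buf.trim_start();
    let skipped = buf.len() - trimmed.len();
    let end = trimmed
        .find(char::is_whitespace)
        .ok_or(ParseError::Incomplete)?;
    let text = &trimmed[..end];
    let token = if text == SENTINEL {
        Token::Sentinel
    } else {
        Token::Int(text.parse::<i64>().map_err(|_| ParseError::Invalid)?)
    };
    Ok((token, skipped + end))
}

/// Reads the next item. `at_eof` is true once the input source is exhausted.
fn next_item(buf: &str, at_eof: bool) -> Result<StreamItem, ParseError> {
    match parse_token(buf) {
        Err(ParseError::Incomplete) if at_eof => {
            // No more data will arrive: append the sentinel and retry.
            let padded = format!("{} {} ", buf, SENTINEL);
            match parse_token(&padded)? {
                // Only the sentinel was left, so the stream is finished.
                (Token::Sentinel, _) => Ok(StreamItem::EndOfStream),
                // A real value preceded the sentinel; return it and drop the sentinel.
                (Token::Int(n), _) => Ok(StreamItem::Value(n)),
            }
        }
        Ok((Token::Int(n), _)) => Ok(StreamItem::Value(n)),
        // In this toy, the sentinel is not allowed to appear in real input.
        Ok((Token::Sentinel, _)) => Err(ParseError::Invalid),
        Err(e) => Err(e),
    }
}
```

In this sketch, `next_item("12", false)` reports `Incomplete` because more digits could still arrive, while `next_item("12", true)` returns `Value(12)` and `next_item("", true)` returns `EndOfStream`. The parser itself never needs an EOF special case; only the retry wrapper knows about it.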
Codecov Report

| | main | #320 | +/- |
|---|---|---|---|
| Coverage | 91.53% | 91.50% | -0.03% |
| Files | 62 | 62 | |
| Lines | 9282 | 9314 | +32 |
| Hits | 8496 | 8523 | +27 |
| Misses | 786 | 791 | +5 |
Looks good!
It occurs to me to ask though- #318 proposes a text IVM as the sentinel value but this PR uses |
Good eye! Two reasons:
This PR is based on the `read-text-annotations` branch, used in PR #319. The diff below is against that branch. Once #319 is merged, I'll rebase this onto `main`.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.