Skip to content

Simplify IPv6 parsing logic a little #85

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 8, 2025
Merged

Conversation

jchadwick-buf
Copy link
Member

In order to work around some edge cases with IPv6, I adopted a local state value that would ensure productions could only appear in specific states. However, the same problem is solved a bit more elegantly in bufbuild/protovalidate-go#215 by simply always ensuring that the state after successfully consuming a production can't be invalid. This means that we need to handle the following edge cases:

  • Before, we check that the dotted IP follows a separator or double colon; now we just check that the dotted IP production does not begin on a colon. This works because as long as there is no colon at the beginning, if we pass the dotted check, the only possible valid production is a dotted IP, and further checks prevent us from invalidly reaching this state without a separator (because the hexadecatet production would eat into the dotted production if there was no separator.)
  • Before, we check that a hexadecatet doesn't follow another hexadecatet; this was not necessary since parseHexadecimalHexadecatet already ensures that the production is not overly long and fails if it is.
  • Before, we check that double colons don't follow a separator. This was not necessary because double colons are already consumed at a higher priority, so this state is impossible to reach.
  • Before, we check that a colon only follows a hexadecatet. We can now rule this out by ensuring that a colon doesn't follow a double colon, and that a single colon is only encountered after at least one hexadecatet and never at the end of the string. This winds up being effectively equivalent as those are the only two cases you can possibly still reach this production that are invalid.
  • Before, we check that the parser ends on either a double colon or a hexadecatet. This rules out ending on an empty string, which is not necessary: that would imply zero hexadecatets were parsed, which will return false anyways shortly after. This also rules out ending on a colon, which we can enforce by ensuring that the input string is not empty before continuing.

So I believe this is at least as correct as it was before, but is simpler.

In order to work around some edge cases with IPv6, I adopted a local state value that would ensure productions could only appear in specific states. However, the same problem is solved a bit more elegantly in bufbuild/protovalidate-go#215 by simply always ensuring that the state after successfully consuming a production can't be invalid. This means that we need to handle the following edge cases:
- Before, we check that the dotted IP follows a separator or double colon; now we just check that the dotted IP production does not begin on a colon.
- Before, we check that a hexadecatet doesn't follow another hexadecatet; this was not necessary since `parseHexadecimalHexadecatet` already ensures that the production is not overly long and fails if it is.
- Before, we check that double colons don't follow a separator. This was not necessary because double colons are already consumed at a higher priority, so this state is impossible to reach.
- Before, we check that a colon only follows a hexadecatet. We can now rule this out by ensuring that a colon doesn't follow a double colon, and that a single colon is only encountered after at least one hexadecatet and never at the end of the string. This winds up being effectively equivalent as those are the only two cases you can possibly still reach this production that are invalid.
- Before, we check that the parser ends on either a double colon or a hexadecatet. This rules out ending on an empty string, which is not necessary: that would imply zero hexadecatets were parsed, which will return false anyways shortly after. This also rules out ending on a colon, which we can enforce by ensuring that the input string is not empty before continuing.

So I believe this is *at least* as correct as it was before, but is simpler.
@jchadwick-buf jchadwick-buf requested a review from smaye81 April 8, 2025 18:26
@jchadwick-buf jchadwick-buf merged commit 70ab4a3 into main Apr 8, 2025
9 checks passed
@jchadwick-buf jchadwick-buf deleted the jchadwick/simpler-ipv6 branch April 8, 2025 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants