Skip to content

Python: A few more parser fixes #17822

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Oct 30, 2024
Merged

Python: A few more parser fixes #17822

merged 7 commits into from
Oct 30, 2024

Conversation

tausbn
Copy link
Contributor

@tausbn tausbn commented Oct 22, 2024

A mixed bag of parser fixes for issues seen on python/cpython. See the individual commits for more information.

Pull Request checklist

All query authors

Internal query authors only

  • Autofixes generated based on these changes are valid, only needed if this PR makes significant changes to .ql, .qll, or .qhelp files. See the documentation (internal access required).
  • Changes are validated at scale (internal access required).
  • Adding a new query? Consider also adding the query to autofix.

This is primarily useful for ensuring that errors where a node does not
have an appropriate context set in `python.tsg` actually have an effect
on the pass/fail status of the parser tests. Previously, these would
just be logged to stdout, but test could still succeed when there were
errors present.

Also fixes one of the logging lines in `tsg_parser.py` to be more
consistent with the others.
That is, the `*T` in `def foo(*args : *T): ...`.

This is apparently a piece of syntax we did not support correctly until
now.

In terms of the grammar, we simply add `list_splat` as a possible
alternative for `type` (which could previously only be an `expression`).
We also update `python.tsg` to not specify `expression` those places (as
the relevant stanzas will then not work for `list_splat`s).

This syntax is not supported by the old parser, hence we only add a new
parser test for it.
Turns out, `except*` is actually not a token on its own according to the
Python grammar. This means it's legal to write `except *foo: ...`, which
we previously would consider a syntax error.

To fix it, we simply break up the `except*` into two separate tokens.
Surprisingly, the new parser did not support these constructs (and the
relevant test was missing this case), so on files that required the new
parser we were unable to parse this construct.

To fix it, we add `list_pattern` (not to be confused with
`pattern_list`) as a `tree-sitter-python` node that results in a `List`
node in the AST.
A somewhat complicated solution that necessitated adding a new custom
function to `tsg-python`. See the comments in `python.tsg` for why this
was necessary.
@tausbn tausbn added the no-change-note-required This PR does not need a change note label Oct 23, 2024
@tausbn tausbn marked this pull request as ready for review October 23, 2024 16:22
@tausbn tausbn requested a review from a team as a code owner October 23, 2024 16:22
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing these up. I feel the pain regarding having to compensate for tree-sitters matching in Rust code...

I have two comments, but I am not sure if they are worth addressing right away:

  • The contract for the logger seems a little brittle. Ideally, we would have an end_of_test-signal that would reset the error count and then checking could be idempotent.

  • I think we recently needed an "either list splat or expression" (we would look for certain fields to capture this), can we now use type instead?

@tausbn
Copy link
Contributor Author

tausbn commented Oct 30, 2024

  • The contract for the logger seems a little brittle. Ideally, we would have an end_of_test-signal that would reset the error count and then checking could be idempotent.

You're probably right. The current setup is a bit awkward in that it uses the same logger for all of the tests. Ideally we would create a new one for each test. Seeing as we basically never modify these tests, however, I don't think it's a high priority. (Also, I'm not entirely clear about what kind of signalling you have in mind. I guess one way would be to implement it as a suitable context handler.)

  • I think we recently needed an "either list splat or expression" (we would look for certain fields to capture this), can we now use type instead?

I think what you may be remembering is our use of fields to distinguish between plain expressions vs. dict_splat and keyword_arguments. However, in that case we were looking for the absence of said fields.

That said, in principle we could use type as a synonym for the combination you mention, but to me the kind of the node also has semantically important information. I don't think it would make sense to call any "list splat or expression" a type.

@tausbn tausbn merged commit f75615b into main Oct 30, 2024
10 checks passed
@tausbn tausbn deleted the tausbn/python-more-parser-fixes branch October 30, 2024 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-change-note-required This PR does not need a change note Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants