Python: A few more parser fixes #17822

tausbn · 2024-10-22T15:19:12Z

A mixed bag of parser fixes for issues seen on python/cpython. See the individual commits for more information.

Pull Request checklist

All query authors

A change note is added if necessary. See the documentation in this repository.
All new queries have appropriate .qhelp. See the documentation in this repository.
QL tests are added if necessary. See Testing custom queries in the GitHub documentation.
New and changed queries have correct query metadata. See the documentation in this repository.

Internal query authors only

Autofixes generated based on these changes are valid, only needed if this PR makes significant changes to .ql, .qll, or .qhelp files. See the documentation (internal access required).
Changes are validated at scale (internal access required).
Adding a new query? Consider also adding the query to autofix.

This is primarily useful for ensuring that errors where a node does not have an appropriate context set in `python.tsg` actually affect the pass/fail status of the parser tests. Previously, these would just be logged to stdout, and a test could still succeed when errors were present. Also fixes one of the logging lines in `tsg_parser.py` to be more consistent with the others.
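A minimal sketch of the idea, not the code in this PR: count error-level log records during a test and let that count decide the result. The names `ErrorCountingHandler`, `run_parser_test`, and `fake_parse` are hypothetical, and `fake_parse` is only a stand-in for the real parser.

```python
import logging


class ErrorCountingHandler(logging.Handler):
    """Counts records at ERROR level or above (hypothetical helper)."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.error_count = 0

    def emit(self, record):
        self.error_count += 1


def run_parser_test(parse, source, logger):
    """Run `parse` on `source`; report failure if any error was logged."""
    handler = ErrorCountingHandler()
    logger.addHandler(handler)
    try:
        parse(source, logger)
    finally:
        logger.removeHandler(handler)
    return handler.error_count == 0


def fake_parse(source, logger):
    # Stand-in for the real parser: logs an error for a marker string.
    if "BAD" in source:
        logger.error("node has no context set")


logger = logging.getLogger("parser_test_sketch")
logger.propagate = False  # keep the sketch from writing to the root logger

ok_result = run_parser_test(fake_parse, "x = 1", logger)
bad_result = run_parser_test(fake_parse, "BAD", logger)
```

Here `ok_result` is true and `bad_result` is false, so a logged error alone is enough to fail the second run.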

That is, the `*T` in `def foo(*args : *T): ...`. This is apparently a piece of syntax we did not support correctly until now. In terms of the grammar, we simply add `list_splat` as a possible alternative for `type` (which could previously only be an `expression`). We also update `python.tsg` so that it no longer specifies `expression` in those places (otherwise the relevant stanzas would not match `list_splat`s). This syntax is not supported by the old parser, hence we only add a new parser test for it.
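For reference, this is how CPython's own `ast` module represents the same annotation. It is an illustration of the syntax only, not of the extractor's output, and it assumes Python 3.11 or later, where starred annotations on `*args` (PEP 646) are accepted.

```python
import ast
import sys

SOURCE = "def foo(*args: *T): ...\n"


def starred_vararg_annotation(source):
    """Return True if the *args annotation parses as a starred expression."""
    func = ast.parse(source).body[0]
    return isinstance(func.args.vararg.annotation, ast.Starred)


# Older interpreters reject this syntax, so only parse it where it is valid.
if sys.version_info >= (3, 11):
    is_starred = starred_vararg_annotation(SOURCE)
else:
    is_starred = None
```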

Turns out, `except*` is actually not a token on its own according to the Python grammar. This means it's legal to write `except *foo: ...`, which we previously would consider a syntax error. To fix it, we simply break up the `except*` into two separate tokens.
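A quick check against CPython itself, assuming Python 3.11 or later (where exception groups exist): both spellings should parse to the same kind of node. The helper name `parses_as_try_star` is made up for this sketch.

```python
import ast
import sys

WITH_SPACE = "try:\n    pass\nexcept *ValueError:\n    pass\n"
WITHOUT_SPACE = "try:\n    pass\nexcept* ValueError:\n    pass\n"


def parses_as_try_star(source):
    """Return True if the first statement is an `except*`-style try."""
    tree = ast.parse(source)
    # Compare by name so this module still imports on older interpreters.
    return type(tree.body[0]).__name__ == "TryStar"


if sys.version_info >= (3, 11):
    results = (parses_as_try_star(WITH_SPACE), parses_as_try_star(WITHOUT_SPACE))
else:
    results = None
```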

Surprisingly, the new parser did not support these constructs (and the relevant test was missing this case), so on files that required the new parser we were unable to parse this construct. To fix it, we add `list_pattern` (not to be confused with `pattern_list`) as a `tree-sitter-python` node that results in a `List` node in the AST.
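The construct itself is not named above. Assuming it involves a bracketed target such as `[a, b] = pair` (which is what `list_pattern` matches in `tree-sitter-python`, as opposed to the bare `a, b` of `pattern_list`), CPython's `ast` module shows the `List` node one would expect. The helper `target_kind` is hypothetical.

```python
import ast


def target_kind(source):
    """Return the node type and context of the first assignment target."""
    target = ast.parse(source).body[0].targets[0]
    return type(target).__name__, type(target.ctx).__name__


bracketed = target_kind("[a, b] = pair\n")  # bracketed target
bare = target_kind("a, b = pair\n")         # bare comma-separated target
```

With CPython, `bracketed` is `("List", "Store")` while `bare` is `("Tuple", "Store")`.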

A somewhat complicated solution that necessitated adding a new custom function to `tsg-python`. See the comments in `python.tsg` for why this was necessary.

yoff

Thanks for fixing these up. I feel the pain regarding having to compensate for tree-sitter's matching in Rust code...

I have two comments, but I am not sure if they are worth addressing right away:

The contract for the logger seems a little brittle. Ideally, we would have an end_of_test-signal that would reset the error count and then checking could be idempotent.
I think we recently needed an "either list splat or expression" (we would look for certain fields to capture this), can we now use type instead?

tausbn · 2024-10-30T12:47:01Z

> The contract for the logger seems a little brittle. Ideally, we would have an end_of_test-signal that would reset the error count and then checking could be idempotent.

You're probably right. The current setup is a bit awkward in that it uses the same logger for all of the tests. Ideally we would create a new one for each test. Seeing as we basically never modify these tests, however, I don't think it's a high priority. (Also, I'm not entirely clear about what kind of signalling you have in mind. I guess one way would be to implement it as a suitable context manager.)
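One possible shape for such a scope, purely as a sketch: a context manager that attaches a fresh error counter to the shared logger for the duration of one test. Nothing here exists in the repository; `error_scope` and `_ErrorCounter` are hypothetical names.

```python
import logging
from contextlib import contextmanager


class _ErrorCounter(logging.Handler):
    """Counts ERROR-or-above records seen while attached (hypothetical)."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.count = 0

    def emit(self, record):
        self.count += 1


@contextmanager
def error_scope(logger):
    """Count errors logged on `logger` inside the `with` block only."""
    counter = _ErrorCounter()
    logger.addHandler(counter)
    try:
        yield counter
    finally:
        logger.removeHandler(counter)


shared_logger = logging.getLogger("shared_parser_logger_sketch")
shared_logger.propagate = False

with error_scope(shared_logger) as first:
    shared_logger.error("missing context")

with error_scope(shared_logger) as second:
    pass  # a clean test: errors from the first test do not leak in
```

Because each scope owns its counter, reading `first.count` or `second.count` repeatedly gives the same answer, which addresses the idempotency concern without a separate reset signal.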

> I think we recently needed an "either list splat or expression" (we would look for certain fields to capture this), can we now use type instead?

I think what you may be remembering is our use of fields to distinguish between plain expressions vs. dict_splat and keyword_arguments. However, in that case we were looking for the absence of said fields.

That said, in principle we could use type as a synonym for the combination you mention, but to me the kind of the node also has semantically important information. I don't think it would make sense to call any "list splat or expression" a type.

tausbn added 3 commits October 22, 2024 15:11

Python: Regenerate parser files

7ceefb5

github-actions bot added the Python label Oct 22, 2024

tausbn added 4 commits October 22, 2024 15:39

Python: Regenerate parser files

89ea4b8

Python: Allow comments in comprehensions

5db601a

A somewhat complicated solution that necessitated adding a new custom function to `tsg-python`. See the comments in `python.tsg` for why this was necessary.
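An example of the kind of input in question, checked here against CPython rather than the extractor. Which comment placements the fix covers is an assumption; the sample puts a comment after each clause and one on its own line.

```python
import ast

SOURCE = """\
squares = [
    x * x  # element expression
    for x in range(5)  # iteration clause
    # a comment on its own line
    if x % 2 == 0  # filter
]
"""

tree = ast.parse(SOURCE)
comp = tree.body[0].value  # the comprehension node itself

# Execute the parsed module to confirm the comments change nothing.
namespace = {}
exec(compile(tree, "<sketch>", "exec"), namespace)
squares = namespace["squares"]
```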

tausbn added the no-change-note-required This PR does not need a change note label Oct 23, 2024

tausbn marked this pull request as ready for review October 23, 2024 16:22

tausbn requested a review from a team as a code owner October 23, 2024 16:22

yoff approved these changes Oct 29, 2024


tausbn merged commit f75615b into main Oct 30, 2024
10 checks passed

tausbn deleted the tausbn/python-more-parser-fixes branch October 30, 2024 12:47
