Skip to content

[syntax-errors] Start detecting compile-time syntax errors #16106

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 52 commits into from
Mar 21, 2025

Conversation

ntBre
Copy link
Contributor

@ntBre ntBre commented Feb 11, 2025

Summary

This PR implements the "greeter" approach for checking the AST for syntax errors emitted by the CPython compiler. It introduces two main infrastructural changes to support all of the compile-time errors:

  1. Adds a new semantic_errors module to the parser crate with public SemanticSyntaxChecker and SemanticSyntaxError types
  2. Embeds a SemanticSyntaxChecker in the ruff_linter::Checker for checking these errors in ruff

As a proof of concept, it also implements detection of two syntax errors:

  1. A reimplementation of late-future-import (F404)
  2. Detection of rebound comprehension iteration variables (No syntax error for [(a := ...) for a in b] #14395)

Test plan

Existing F404 tests, new inline tests in the ruff_python_parser crate, and a linter CLI test showing an example of the Message output.

I also tested in VS Code, where preview = false and turning off syntax errors both disable the new errors:

image

And on the playground, where preview = false also disables the errors:

image

Fixes #14395

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done an in depth review (as this is also a draft PR) but I left a few comments that hopefully help clarify things.

Is the idea to emit version specific syntax errors (not semantic syntax errors) as part of the SyntaxChecker or do you still plan on emitting those as part of the parser?

I think we have to explore some alternative designs to decide on whether we want a rule code but it should definetely not block you from prototyping.

let _span = tracing::trace_span!("check_types", file=?file.path(db)).entered();

tracing::debug!("Checking file '{path}'", path = file.path(db));

let index = semantic_index(db, file);
let mut diagnostics = TypeCheckDiagnostics::default();
let mut syntax_diagnostics = SyntaxDiagnostics::default();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use another collection here. We should either push directly into TypeCheckDiagnostics or, what I'd prefer, move this to the semantic index phase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh okay, I knew something felt wrong about where I was putting this in red-knot. I will try moving it to the semantic index phase!

@ntBre
Copy link
Contributor Author

ntBre commented Feb 11, 2025

I haven't done an in depth review (as this is also a draft PR) but I left a few comments that hopefully help clarify things.

Thanks Micha! Our conversation earlier was very helpful, and I think these comments clear up some places where I was still confused.

Is the idea to emit version specific syntax errors (not semantic syntax errors) as part of the SyntaxChecker or do you still plan on emitting those as part of the parser?

In this draft, I tried emitting all of the errors, including version specific ones, in the SyntaxChecker, but I think your suggestion of keeping those in the parser still makes sense. Was your idea to keep the version-specific errors in the parser and use this enter_stmt approach just for the semantic errors?

Detecting LateFutureImport felt a bit easier here than in the parser version, so I can see a place for both. On the other hand, I thought it might be nice to isolate all of these checks in the SyntaxChecker instead of mixing some into the parser, but some of the errors might only be detectable in the parser anyway.

I think we have to explore some alternative designs to decide on whether we want a rule code but it should definetely not block you from prototyping.

I partially updated the design doc with your rule code suggestions from earlier, but I still need to add a more concrete proposal to help with this discussion. I'll do that tonight or first thing in the morning.

@ntBre ntBre force-pushed the brent/syntax-error-source-order branch from 3807ac5 to 6394094 Compare February 11, 2025 22:41
@ntBre
Copy link
Contributor Author

ntBre commented Feb 11, 2025

I haven't handled all of the review comments yet, but I rebased to get the new Diagnostic changes and hopefully fix the CI failures. I'm hoping this will be faster on codspeed than the other two prototypes too, or at least faster than the first one for sure.

Copy link
Contributor

github-actions bot commented Feb 11, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser
Copy link
Member

In this draft, I tried emitting all of the errors, including version specific ones, in the SyntaxChecker, but I think your suggestion of keeping those in the parser still makes sense. Was your idea to keep the version-specific errors in the parser and use this enter_stmt approach just for the semantic errors?

Exactly. My motivation for this is that:

  • The parser already emits syntax errors today. So it feels seems like a good fit to emit the version specific syntax (not compile) errors in the parser too.
  • Our parser is used by other projects and I think it would be useful for them too if they can enforce a specific python version during parser without having to depend on the syntax-checker crate (which we'll likely remove if we unify Ruff/Red Knot)
  • I would love if the formatter tests could assert that there are no syntax errors in the source and formatted file. Doing so is much easier if this is a capability of the formatter than compared to calling into some visitor.
  • I'm sort of okay with not documenting version specific syntax errors. I do think we may want to have some documentation for compile errors, at least long term.

@AlexWaygood
Copy link
Member

I'm sort of okay with not documenting version specific syntax errors. I do think we may want to have some documentation for compile errors, at least long term.

For most of these, I think we could get away without documentation. There's not that much to say about the match statement not being valid before Python 3.10, for example.

For some of them, though, I think docs would be really useful for our users. For example, the syntax differences between Python 3.8 and 3.9 due to PEP 614 are pretty subtle. And the details on when exactly you're able to parenthesize context managers in with statements on Python 3.8 are also pretty complicated -- I can never remember what they are!

This is invalid syntax on Python 3.8:

with (
    foo() as x,
    bar() as y,
):
    pass

But these are all valid syntax:

Python 3.8.18 (default, Feb 15 2024, 19:36:58) 
[Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> with (x, y) as foo:
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> with (x,
...     y) as foo:
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> with (x,
... y):
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> with (
...     x,
... ):
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
NameError: name 'x' is not defined
>>> with (
...     x,
...     y
... ) as foo:
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
NameError: name 'x' is not defined
>>> with x, (
...     y
... ): pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> with x as foo, (
... y
... ) as bar:
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> with x() as foo, (
...     y()
... ) as bar:
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

@MichaReiser
Copy link
Member

MichaReiser commented Feb 12, 2025

For some of them, though, I think docs would be really useful for our users. For example, the syntax differences between Python 3.8 and 3.9 due to PEP 614 are pretty subtle. And the details on when exactly you're able to parenthesize context managers in with statements on Python 3.8 are also pretty complicated -- I can never remember what they are!

That's fair. But how is it different from any other syntax error. E.g what if you get a comprehension wrong?

Either way. Having them in the parser doesn't mean that we can never have unique error codes. Red Knot would allow for it. It might just not be possible today and I think that's fine.

@AlexWaygood
Copy link
Member

AlexWaygood commented Feb 12, 2025

That's fair. But how is it different from any other syntax error.

Well, as we discussed earlier, we also probably want version-dependent syntax errors to be suppressible. If an error is suppressible and has documentation, it ends up looking very much like a lint rule, no? ;)

And that's not to say that I want them detected using the same infrastructure as a lint rule. I think you're right that it'll be more performant to detect them in the parser where possible, and I think you make a great point when you say that it would be great for the formatter tests to be able to assert that they don't introduce any new version-dependent syntax errors. But I think we should be aware that these in quite a few ways are going to end up looking a lot more like our existing linter rules than our existing syntax rules.

E.g what if you get a comprehension wrong?

Sure, it might be nice to have better docs for our existing syntax errors too ;-) but I do think it's especially confusing and subtle for users if whether or not they get a syntax error depends on the target Python version we've inferred for their project. And documenting clearly which Python version the new syntax was added in could really help users.

Either way. Having them in the parser doesn't mean that we can never have unique error codes. Red Knot would allow for it. It might just not be possible today and I think that's fine.

I don't have a strong opinion on whether we should detect these errors in the parser or elsewhere. But if we are going to detect them in the parser, I do think we should have a plan for how we'll provide docs for these errors. It doesn't have to be part of the initial PR adding these syntax diagnostics, but it is important to me.

@MichaReiser
Copy link
Member

Well, as we discussed earlier, we also probably want version-dependent syntax errors to be suppressible. If an error is suppressible and has documentation, it ends up looking very much like a lint rule, no? ;)

I don't think this is something we have an agreement on.

@AlexWaygood
Copy link
Member

AlexWaygood commented Feb 12, 2025

I don't think this is something we have an agreement on.

Oh, sorry about that. I think I misunderstood our earlier conversation on Monday in that case -- that's on me :-)

ntBre added a commit that referenced this pull request Mar 5, 2025
Summary
--

WIP currently I just added all of the valid cases from Alex's comment here
#16106 (comment)

Test Plan
--
Inline tests
@ntBre ntBre force-pushed the brent/syntax-error-source-order branch from 8ea3e35 to 5bc9a44 Compare March 7, 2025 23:00
ntBre added a commit that referenced this pull request Mar 17, 2025
)

Summary
--

I thought this was very complicated based on the comment here:
#16106 (comment) and
on some of the discussion in the CPython issue here:
python/cpython#56991. However, after a little
bit of experimentation, I think it boils down to this example:

```python
with (x as y): ...
```

The issue is parentheses around a `with` item with an `optional_var`, as
we (and
[Python](https://docs.python.org/3/library/ast.html#ast.withitem)) call
the trailing variable name (`y` in this case). It's not actually about
line breaks after all, except that line breaks are allowed in
parenthesized expressions, which explains the validity of cases like


```pycon
>>> with (
...     x,
...     y
... ) as foo:
...     pass
... 
```

even on Python 3.8.

I followed [pyright]'s example again here on the diagnostic range (just
the opening paren) and the wording of the error.


Test Plan
--
Inline tests

[pyright]:
https://pyright-play.net/?pythonVersion=3.7&strict=true&code=FAdwlgLgFgBAFAewA4FMB2cBEAzBCB0EAHhJgJQwCGAzjLgmQFwz6tA
@ntBre ntBre changed the title Syntax errors prototype v3 [syntax-errors] Start detecting compile-time syntax errors Mar 17, 2025
ntBre added a commit that referenced this pull request Mar 18, 2025
Summary
--

This PR updates `check_path` in the `ruff_linter` crate to return a
`Vec<Message>` instead of a `Vec<Diagnostic>`. The main motivation for this is
to make it easier to convert semantic syntax errors directly into `Message`s
rather than `Diagnostic`s in #16106. However, this also has the benefit of
keeping the preview check on unsupported syntax errors in `check_path`, as
suggested in https://github.com/astral-sh/ruff/pull/16429/files#r1974748024.

Test Plan
--

Existing tests. I also tested the playground and server manually.
ntBre added a commit that referenced this pull request Mar 19, 2025
Summary
--

This PR updates `check_path` in the `ruff_linter` crate to return a
`Vec<Message>` instead of a `Vec<Diagnostic>`. The main motivation for
this is to make it easier to convert semantic syntax errors directly
into `Message`s rather than `Diagnostic`s in #16106. However, this also
has the benefit of keeping the preview check on unsupported syntax
errors in `check_path`, as suggested in
#16429 (comment).

All of the interesting changes are in the first commit. The second
commit just renames variables like `diagnostics` to `messages`, and the
third commit is a tiny import fix.

I also updated the `ExpandedMessage::location` field name, which caused
a few extra commits tidying up the playground code. I thought it was
nicely symmetric with `end_location`, but I'm happy to revert that too.

Test Plan
--

Existing tests. I also tested the playground and server manually.
@ntBre
Copy link
Contributor Author

ntBre commented Mar 20, 2025

Thanks @dhruvmanila! I did test ruff.showSyntaxErrors initially, but I have now also tried ruff.configuration. It properly seems to override the local ruff.toml and setting "preview": false there disables the new errors.

I also added a test for --statistics, which seems to be working properly for all of the syntax error kinds.

Otherwise, I think I've handled the other comments too. Thank you and @MichaReiser for the reviews! This push should get the updated ecosystem check as well, just in case.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. A few nits around how to work around the context mutability problem.

@ntBre ntBre merged commit 2baaedd into main Mar 21, 2025
22 checks passed
@ntBre ntBre deleted the brent/syntax-error-source-order branch March 21, 2025 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Related to core functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No syntax error for [(a := ...) for a in b]
4 participants