[syntax-errors] Start detecting compile-time syntax errors #16106

ntBre · 2025-02-11T21:33:48Z

Summary

This PR implements the "greeter" approach for checking the AST for syntax errors emitted by the CPython compiler. It introduces two main infrastructural changes to support all of the compile-time errors:

Adds a new semantic_errors module to the parser crate with public SemanticSyntaxChecker and SemanticSyntaxError types
Embeds a SemanticSyntaxChecker in the ruff_linter::Checker for checking these errors in ruff

As a proof of concept, it also implements detection of two syntax errors:

A reimplementation of late-future-import (F404)
Detection of rebound comprehension iteration variables (No syntax error for [(a := ...) for a in b] #14395)
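
For reference, CPython reports both of these proof-of-concept errors only when compiling, not when parsing, which is why an AST-level check is needed. A minimal sketch, assuming a recent CPython 3 interpreter; the helper names `parses` and `compile_error` are made up for this illustration and are not part of the PR:

```python
import ast


def parses(source):
    """Return True if CPython's parser alone accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def compile_error(source):
    """Return the SyntaxError message from a full compile, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return err.msg
    return None


# A __future__ import after another statement (what F404 flags).
late_future = "import os\nfrom __future__ import annotations\n"
# A walrus target that rebinds the comprehension's iteration variable (#14395).
rebound = "[(a := 0) for a in range(3)]\n"

for source in (late_future, rebound):
    # Each snippet parses, but the full compile reports a SyntaxError.
    print(parses(source), compile_error(source))
```

Both snippets produce an AST without complaint, so a tool that stops at the parser never sees these errors.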

Test plan

Existing F404 tests, new inline tests in the ruff_python_parser crate, and a linter CLI test showing an example of the Message output.

I also tested in VS Code, where preview = false and turning off syntax errors both disable the new errors:

And on the playground, where preview = false also disables the errors:

Fixes #14395

MichaReiser

I haven't done an in-depth review (as this is also a draft PR) but I left a few comments that hopefully help clarify things.

Is the idea to emit version specific syntax errors (not semantic syntax errors) as part of the SyntaxChecker or do you still plan on emitting those as part of the parser?

I think we have to explore some alternative designs to decide on whether we want a rule code, but it should definitely not block you from prototyping.

MichaReiser · 2025-02-11T21:36:25Z

crates/red_knot_python_semantic/src/types.rs

    let _span = tracing::trace_span!("check_types", file=?file.path(db)).entered();

    tracing::debug!("Checking file '{path}'", path = file.path(db));

    let index = semantic_index(db, file);
    let mut diagnostics = TypeCheckDiagnostics::default();
+    let mut syntax_diagnostics = SyntaxDiagnostics::default();


I don't think we should use another collection here. We should either push directly into TypeCheckDiagnostics or, what I'd prefer, move this to the semantic index phase.

Ahh okay, I knew something felt wrong about where I was putting this in red-knot. I will try moving it to the semantic index phase!

crates/ruff_python_syntax_errors/src/lib.rs

crates/ruff_linter/src/settings/types.rs

ntBre · 2025-02-11T22:20:32Z

I haven't done an in-depth review (as this is also a draft PR) but I left a few comments that hopefully help clarify things.

Thanks Micha! Our conversation earlier was very helpful, and I think these comments clear up some places where I was still confused.

Is the idea to emit version specific syntax errors (not semantic syntax errors) as part of the SyntaxChecker or do you still plan on emitting those as part of the parser?

In this draft, I tried emitting all of the errors, including version specific ones, in the SyntaxChecker, but I think your suggestion of keeping those in the parser still makes sense. Was your idea to keep the version-specific errors in the parser and use this enter_stmt approach just for the semantic errors?

Detecting LateFutureImport felt a bit easier here than in the parser version, so I can see a place for both. On the other hand, I thought it might be nice to isolate all of these checks in the SyntaxChecker instead of mixing some into the parser, but some of the errors might only be detectable in the parser anyway.
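
To illustrate why a statement-by-statement hook makes this check straightforward, here is a rough Python sketch of the idea. It is not the PR's Rust implementation: the class, its `visit_stmt` method, and the flag names are hypothetical, and it only looks at module-level statements (CPython also rejects `__future__` imports nested inside functions or classes).

```python
import ast


class LateFutureImportChecker:
    """Toy checker that is fed module-level statements in source order."""

    def __init__(self):
        self.seen_any_stmt = False
        # Set by the first statement that is neither a docstring nor a
        # __future__ import; any __future__ import after that is late.
        self.seen_futures_boundary = False
        self.errors = []

    def visit_stmt(self, stmt):
        is_future = (
            isinstance(stmt, ast.ImportFrom)
            and stmt.module == "__future__"
            and stmt.level == 0
        )
        # Only the very first statement can be the module docstring.
        is_docstring = (
            not self.seen_any_stmt
            and isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Constant)
            and isinstance(stmt.value.value, str)
        )
        if is_future:
            if self.seen_futures_boundary:
                self.errors.append((stmt.lineno, "late __future__ import"))
        elif not is_docstring:
            self.seen_futures_boundary = True
        self.seen_any_stmt = True


def check(source):
    """Run the toy checker over every top-level statement of `source`."""
    checker = LateFutureImportChecker()
    for stmt in ast.parse(source).body:
        checker.visit_stmt(stmt)
    return checker.errors
```

The only state the checker needs is whether an ordinary statement has already been seen, which is easy to track when statements arrive in source order.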

I think we have to explore some alternative designs to decide on whether we want a rule code, but it should definitely not block you from prototyping.

I partially updated the design doc with your rule code suggestions from earlier, but I still need to add a more concrete proposal to help with this discussion. I'll do that tonight or first thing in the morning.

ntBre · 2025-02-11T22:43:02Z

I haven't handled all of the review comments yet, but I rebased to get the new Diagnostic changes and hopefully fix the CI failures. I'm hoping this will be faster on codspeed than the other two prototypes too, or at least faster than the first one for sure.

github-actions · 2025-02-11T22:51:12Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

MichaReiser · 2025-02-12T08:14:06Z

In this draft, I tried emitting all of the errors, including version specific ones, in the SyntaxChecker, but I think your suggestion of keeping those in the parser still makes sense. Was your idea to keep the version-specific errors in the parser and use this enter_stmt approach just for the semantic errors?

Exactly. My motivation for this is that:

The parser already emits syntax errors today, so it seems like a good fit to emit the version-specific syntax (not compile) errors in the parser too.
Our parser is used by other projects, and I think it would be useful for them too if they could enforce a specific Python version during parsing without having to depend on the syntax-checker crate (which we'll likely remove if we unify Ruff/Red Knot).
I would love it if the formatter tests could assert that there are no syntax errors in the source and formatted file. Doing so is much easier if this is a capability of the formatter, compared to calling into some visitor.
I'm sort of okay with not documenting version specific syntax errors. I do think we may want to have some documentation for compile errors, at least long term.

AlexWaygood · 2025-02-12T11:59:06Z

I'm sort of okay with not documenting version specific syntax errors. I do think we may want to have some documentation for compile errors, at least long term.

For most of these, I think we could get away without documentation. There's not that much to say about the match statement not being valid before Python 3.10, for example.

For some of them, though, I think docs would be really useful for our users. For example, the syntax differences between Python 3.8 and 3.9 due to PEP 614 are pretty subtle. And the details on when exactly you're able to parenthesize context managers in with statements on Python 3.8 are also pretty complicated -- I can never remember what they are!

This is invalid syntax on Python 3.8:

with (
    foo() as x,
    bar() as y,
):
    pass

But these are all valid syntax:

Python 3.8.18 (default, Feb 15 2024, 19:36:58) 
[Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> with (x, y) as foo:
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> with (x,
...     y) as foo:
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> with (x,
... y):
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> with (
...     x,
... ):
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
NameError: name 'x' is not defined
>>> with (
...     x,
...     y
... ) as foo:
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
NameError: name 'x' is not defined
>>> with x, (
...     y
... ): pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> with x as foo, (
... y
... ) as bar:
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> with x() as foo, (
...     y()
... ) as bar:
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

MichaReiser · 2025-02-12T12:11:18Z

For some of them, though, I think docs would be really useful for our users. For example, the syntax differences between Python 3.8 and 3.9 due to PEP 614 are pretty subtle. And the details on when exactly you're able to parenthesize context managers in with statements on Python 3.8 are also pretty complicated -- I can never remember what they are!

That's fair. But how is it different from any other syntax error? E.g., what if you get a comprehension wrong?

Either way. Having them in the parser doesn't mean that we can never have unique error codes. Red Knot would allow for it. It might just not be possible today and I think that's fine.

AlexWaygood · 2025-02-12T12:19:01Z

That's fair. But how is it different from any other syntax error?

Well, as we discussed earlier, we also probably want version-dependent syntax errors to be suppressible. If an error is suppressible and has documentation, it ends up looking very much like a lint rule, no? ;)

And that's not to say that I want them detected using the same infrastructure as a lint rule. I think you're right that it'll be more performant to detect them in the parser where possible, and I think you make a great point when you say that it would be great for the formatter tests to be able to assert that they don't introduce any new version-dependent syntax errors. But I think we should be aware that, in quite a few ways, these are going to end up looking a lot more like our existing linter rules than our existing syntax rules.

E.g., what if you get a comprehension wrong?

Sure, it might be nice to have better docs for our existing syntax errors too ;-) but I do think it's especially confusing and subtle for users if whether or not they get a syntax error depends on the target Python version we've inferred for their project. And documenting clearly which Python version the new syntax was added in could really help users.

Either way. Having them in the parser doesn't mean that we can never have unique error codes. Red Knot would allow for it. It might just not be possible today and I think that's fine.

I don't have a strong opinion on whether we should detect these errors in the parser or elsewhere. But if we are going to detect them in the parser, I do think we should have a plan for how we'll provide docs for these errors. It doesn't have to be part of the initial PR adding these syntax diagnostics, but it is important to me.

MichaReiser · 2025-02-12T12:31:04Z

Well, as we discussed earlier, we also probably want version-dependent syntax errors to be suppressible. If an error is suppressible and has documentation, it ends up looking very much like a lint rule, no? ;)

I don't think this is something we have an agreement on.

AlexWaygood · 2025-02-12T12:33:05Z

I don't think this is something we have an agreement on.

Oh, sorry about that. I think I misunderstood our earlier conversation on Monday in that case -- that's on me :-)

Summary -- I thought this was very complicated based on the comment here: #16106 (comment) and on some of the discussion in the CPython issue here: python/cpython#56991. However, after a little bit of experimentation, I think it boils down to this example:

```python
with (x as y): ...
```

The issue is parentheses around a `with` item with an `optional_vars`, as we (and [Python](https://docs.python.org/3/library/ast.html#ast.withitem)) call the trailing variable name (`y` in this case). It's not actually about line breaks after all, except that line breaks are allowed in parenthesized expressions, which explains the validity of cases like

```pycon
>>> with (
...     x,
...     y
... ) as foo:
...     pass
...
```

even on Python 3.8. I followed [pyright]'s example again here on the diagnostic range (just the opening paren) and the wording of the error.

Test Plan -- Inline tests

[pyright]: https://pyright-play.net/?pythonVersion=3.7&strict=true&code=FAdwlgLgFgBAFAewA4FMB2cBEAzBCB0EAHhJgJQwCGAzjLgmQFwz6tA
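
The distinction described above can be seen in the AST that a newer interpreter produces. A small sketch, assuming it runs on Python 3.9 or later (where the parenthesized form parses); the helper name `with_items` is made up for this illustration:

```python
import ast


def with_items(source):
    """Summarize each item of the first statement, which must be a `with`."""
    stmt = ast.parse(source).body[0]
    return [
        # (kind of context expression, whether the item has an `as` target)
        (type(item.context_expr).__name__, item.optional_vars is not None)
        for item in stmt.items
    ]


# Parentheses belong to a tuple expression: a single item, fine on 3.8 too.
print(with_items("with (x, y) as foo: pass"))
# Parentheses wrap the item together with its `as` target: newer grammar only.
print(with_items("with (x as y): pass"))
```

In the first case the parentheses are part of an ordinary tuple expression that is bound to `foo`; in the second they enclose the `as` target itself, which is the form older versions reject.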

Summary -- This PR updates `check_path` in the `ruff_linter` crate to return a `Vec<Message>` instead of a `Vec<Diagnostic>`. The main motivation for this is to make it easier to convert semantic syntax errors directly into `Message`s rather than `Diagnostic`s in #16106. However, this also has the benefit of keeping the preview check on unsupported syntax errors in `check_path`, as suggested in https://github.com/astral-sh/ruff/pull/16429/files#r1974748024.

All of the interesting changes are in the first commit. The second commit just renames variables like `diagnostics` to `messages`, and the third commit is a tiny import fix. I also updated the `ExpandedMessage::location` field name, which caused a few extra commits tidying up the playground code. I thought it was nicely symmetric with `end_location`, but I'm happy to revert that too.

Test Plan -- Existing tests. I also tested the playground and server manually.

ntBre · 2025-03-20T22:14:06Z

Thanks @dhruvmanila! I did test ruff.showSyntaxErrors initially, but I have now also tried ruff.configuration. It seems to properly override the local ruff.toml, and setting "preview": false there disables the new errors.

I also added a test for --statistics, which seems to be working properly for all of the syntax error kinds.

Otherwise, I think I've handled the other comments too. Thank you and @MichaReiser for the reviews! This push should get the updated ecosystem check as well, just in case.

crates/ruff_python_parser/tests/generate_inline_tests.rs

crates/ruff_python_parser/src/semantic_errors/mod.rs

MichaReiser

This looks good. A few nits around how to work around the context mutability problem.

crates/ruff_python_parser/src/semantic_errors/mod.rs

crates/ruff_linter/src/linter.rs

crates/ruff_python_parser/src/semantic_errors/mod.rs

MichaReiser reviewed Feb 11, 2025

ntBre force-pushed the brent/syntax-error-source-order branch from 3807ac5 to 6394094 on February 11, 2025 22:41

ntBre mentioned this pull request Feb 24, 2025

Start improving detection of syntax errors #16034

Closed

ntBre added a commit that referenced this pull request Mar 5, 2025

[syntax-errors] Parenthesized context managers before Python 3.9

1d2c4fa

Summary -- WIP currently I just added all of the valid cases from Alex's comment here #16106 (comment) Test Plan -- Inline tests

ntBre mentioned this pull request Mar 5, 2025

[syntax-errors] Parenthesized context managers before Python 3.9 #16523

Merged

ntBre force-pushed the brent/syntax-error-source-order branch from 8ea3e35 to 5bc9a44 on March 7, 2025 23:00

ntBre changed the title ~~Syntax errors prototype v3~~ [syntax-errors] Start detecting compile-time syntax errors Mar 17, 2025

ntBre mentioned this pull request Mar 18, 2025

[internal] Return Messages from check_path #16837

Merged

ntBre added 10 commits March 19, 2025 12:21

try SourceOrderVisitor and hack into red-knot

c66f37a

ignore syntax diagnostics in red-knot tests

825f971

integrate with ruff, accept snap changes

4e702dd

implement F404 too

877a108

just use SyntaxChecker::enter_stmt instead of SourceOrderVisitor

af9d9dd

update doc comment

6f90709

tidy enter_stmt

ab2db92

move red_knot_python_semantic::PythonVersion to ruff_db

c4a5cc7

add docs to VersionSyntaxError

34b5829

revert red-knot integration

6cc9198

ntBre added 11 commits March 20, 2025 11:41

rename enter_ methods to visit_

ffdb7cb

don't build a HashSet

16479d4

target_version -> python_version

ef11801

report semantic errors even when AST-based rules are not selected

4cfaad0

add context trait and pass to visit_stmt

f1b2222

move python version into context trait too

ae17231

move preview check to check_ast to avoid collecting at all

a363c6b

remove separate crate, put SemanticSyntaxChecker in the parser

d48afb7

expand inline tests to semantic_errors module

7ffa680

test --statistics for all three kinds of errors

c482a02

fix outdated docs links

88cbb9b

dhruvmanila reviewed Mar 21, 2025

crates/ruff_python_parser/tests/generate_inline_tests.rs (outdated, resolved)

crates/ruff_python_parser/src/semantic_errors/mod.rs (outdated, resolved)

MichaReiser approved these changes Mar 21, 2025

ntBre and others added 12 commits March 21, 2025 08:01

move interior mutability to Checker

5f7de2a

Co-authored-by: Micha Reiser

update checker field name and add docs (also fix an old missing .)

e8d44f1

fix LinterResult docs

7171e20

reorder SemanticSyntaxChecker code to navigate more easily

077bd37

combine semantic error tests with existing inline tests

f484a85

this also includes one very small update to an existing test to remove a semantic error from an otherwise valid case

move inline tests just above add_error call

de0914b

delete getter/setter without refcell

3ccea07

restore docs from SemanticModel flag, delete todo

0b62bde

move semantic_errors to single-file module

452283a

delete outdated comment

b5f828b

update variable name to match LinterResult field

ab89b04

delete now-unused inline/semantic directory

79ddb2f

ntBre merged commit 2baaedd into main Mar 21, 2025
22 checks passed

ntBre deleted the brent/syntax-error-source-order branch March 21, 2025 18:45

ntBre mentioned this pull request Apr 1, 2025

[syntax-errors] Invalid syntax in annotations #17101

Merged

BrewTestBot mentioned this pull request Apr 3, 2025

ruff 0.11.3 Homebrew/homebrew-core#218052

Merged
