Skip to content

[syntax-errors] Async comprehension in sync comprehension #17177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Apr 8, 2025

Conversation

ntBre
Copy link
Contributor

@ntBre ntBre commented Apr 3, 2025

Summary

Detect async comprehensions nested in sync comprehensions in async functions before Python 3.11, when this was changed.

The actual logic of this rule is very straightforward, but properly tracking the async scopes took a bit of work. An alternative to the current approach is to offload the in_async_context check into the SemanticSyntaxContext trait, but that actually required much more extensive changes to the TestContext and also to ruff's semantic model, as you can see in the changes up to 31554b4. This version has the benefit of mostly centralizing the state tracking in SemanticSyntaxChecker, although there was some subtlety around deferred function body traversal that made the changes to Checker more intrusive too (hence the new linter test).

The Checkpoint struct/system is obviously overkill for now since it's only tracking a single bool, but I thought it might be more useful later.

Test Plan

New inline tests and a new linter integration test.

ntBre added 20 commits April 3, 2025 10:09
Summary
--

Detect async comprehensions nested in sync comprehensions in async functions
before Python 3.11, when this was [changed].

[changed]: python/cpython#77527

Test Plan
--

New inline tests. I also inlined `TestContext` into the
`SemanticSyntaxCheckerVisitor` to make it possible to modify the context as it
traverses the AST without adding mutating methods to the `SemanticSyntaxContext`
trait. I also added a new linter integration test to make sure its
`in_async_context` method has the behavior I expected.
but the checkpoint isn't wired up in the ruff_python_ast checker, so it seems
it's not needed?
I think I've convinced myself that this error can be detected by a single level
of nesting, which is in line with not needing this checkpoint. the issue is in a
structure like this:

sync
  async

and it doesn't matter if that occurs at the top level or somewhere down the
stack like

async
  async
    sync
	  async

it also doesn't matter that a pattern like this

sync
  async
    async

prevents the innermost comprehension from raising an error because we will have
already reported an error for its parent
This reverts commit 375e4e1.
@ntBre ntBre added rule Implementing or modifying a lint rule preview Related to preview mode features labels Apr 3, 2025
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would simplify the implementation if we only tracked whether we're in a sync comprehension rather than tracking if we're in an async context? That seems easier than correctly tracking if we're also in an async function. It has the downside that we might emit one extra diagnostic if someone uses an async comprehension in a sync function, but I'd expect this to be rare. On the other hand, it doesn't add that much complexity. It's just a bit tricky to get the handling right when to unset the async flag. Do you know if it's necessary to unset the flag in a sync context manager (in an async function)?

// check for errors
self.check_stmt(stmt, ctx);

self.save_checkpoint();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to store the checkpoint for every statement or could we only store and restore it for some statements (e.g functions, lambdas, classes?)

/// because the parent context is what matters.
in_async_context: bool,

checkpoint: Vec<Checkpoint>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could avoid having to store the checkpoints inside and instead move it to the stack by changing SemanticSyntaxChecker::enter_stmt to return a state which the caller needs to pass to exit_stmt (consumes the state, the state can't be created externally).

The checker would always store the current state and enter_stmt would return the previous state from enter_stmt (but update its internal state). The exit_stmt call (or expression) then restores the internal state.

The downside of this is that it requires some more bookkeeping at the call site, but it seems fairly reasonable.

This approach would mirror the Checker's approach with updating SemanticModelFlags, where it updates the flags before walking the children and then restores the flags once it's done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you decide to keep the stack solution, we should then change the implementation to only snapshot and restore when necessary. For example, it isn't necessary to store the snapshot for most expressions because the state won't be changed (and, therefore, there isn't anything to restore either).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, it really wasn't bad to plumb this through at the call sites.

Stmt::FunctionDef(ast::StmtFunctionDef { is_async, .. }) => {
self.in_async_context = *is_async;
self.seen_futures_boundary = true;
}
_ => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to set in_async_context to false when entering a Class, lambda, or a sync with context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good catch. It looks like class and lambda do cause problems even inside of an outer async function, but with doesn't seem to add a sync scope:

Python 3.13.2 (main, Feb  5 2025, 08:05:21) [GCC 14.2.1 20250128] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> async def foo(): lambda: [_ async for n in range(3)]
  File "<python-input-4>", line 1
    async def foo(): lambda: [_ async for n in range(3)]
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: asynchronous comprehension outside of an asynchronous function
>>> async def foo():
...     class C: [_ async for n in range(3)]
...
  File "<python-input-7>", line 2
    class C: [_ async for n in range(3)]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: asynchronous comprehension outside of an asynchronous function
>>> async def foo():
...     with open("foo.txt", "w") as f:
...         [_ async for n in range(3)]
...
>>>

However, these are both still errors on 3.13, so this might be a separate, version-independent rule.

This is making me think we should go back to the trait implementation and defer to the full visitor's scope-based implementation. Or at least try tracking nested scopes here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have a lint rule that detects these, PLE1142, but not an error.

https://play.ruff.rs/f0be27e4-9cb1-4bdb-9ebe-b095d81815e5

@ntBre
Copy link
Contributor Author

ntBre commented Apr 3, 2025

Do you think it would simplify the implementation if we only tracked whether we're in a sync comprehension rather than tracking if we're in an async context? That seems easier than correctly tracking if we're also in an async function. It has the downside that we might emit one extra diagnostic if someone uses an async comprehension in a sync function, but I'd expect this to be rare. On the other hand, it doesn't add that much complexity. It's just a bit tricky to get the handling right when to unset the async flag. Do you know if it's necessary to unset the flag in a sync context manager (in an async function)?

This sounded very promising at first, but I think we would still need the checkpoints, and it would only remove the function-specific code, which I don't think adds that much "complexity" per se, like you said. Although I guess we could avoid statement checkpoints and exit_stmt entirely since everything else is an expression.

On the other hand, it sounds like PLE1142 should also be a syntax error and needs even more careful scope tracking, so it might be worth setting up for that here.

I'll work on your suggestions about removing the checkpoint stack and see how it looks after that.

As I said in a reply above, we could also give the trait approach another try. That just gets annoying for inline testing. It requires a lot of new test code, and that test code doesn't really guarantee anything about the actual users. However, that's also somewhat the case here because of the tricky enter/exit calls in Checker.

Copy link
Contributor

github-actions bot commented Apr 3, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@ntBre
Copy link
Contributor Author

ntBre commented Apr 3, 2025

The ecosystem check is showing a false positive in a notebook cell, which I think should have an implicit async scope. It's probably also picking up a default python version since it mentions 3.9.

Comment on lines 21 to +24
#[derive(Debug)]
pub struct Checkpoint {
in_async_context: bool,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could potentially include a DebugDropBomb to make sure that the consumer of the enter API don't forgot to call the exit_stmt / exit_expr methods which will defuse the bomb.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting, I think I saw some mention of this around the new diagnostics. I could also mark enter_stmt as #[must_use], which could help a bit in the same direction.

Comment on lines +600 to +606
// For functions, defer semantic syntax error checks until the body of the function is
// visited
let checkpoint = if stmt.is_function_def_stmt() {
None
} else {
Some(self.with_semantic_checker(|semantic, context| semantic.enter_stmt(stmt, context)))
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the reasoning for skipping function definitions here. Won't this result in stale in_async_context flags because we don't update the state until after we visited the entire body?

If there's a need for deferred visiting, then I'd prefer to have a enter_deferred_stmt or, better, move the necessary checks into exit_stmt and also pass the statement and context because exit is exactly the hook called after visiting the node's children

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might be missing something here, but my reasoning for this was that the ast::Checker doesn't actually visit the body of the function until visit_deferred_function:

self.visit_parameters(parameters);
// Set the docstring state before visiting the function body.
self.docstring_state = DocstringState::Expected(ExpectedDocstringKind::Function);
self.visit_body(body);

Before I added this logic, the function body was being visited twice by the SemanticSyntaxChecker but only once by the ast::Checker and this integration test was failing even though the inline version passed:

async def test(): return [[x async for x in elements(n)] async for n in range(3)]

which pointed to an issue in the visit order because the test visitor is much simpler:

    fn visit_stmt(&mut self, stmt: &'_ Stmt) {
        let checkpoint = self.checker.enter_stmt(stmt, &self.context);
        ruff_python_ast::visitor::walk_stmt(self, stmt);
        self.checker.exit_stmt(checkpoint);
    }

Again I might be misunderstanding, but I don't think an enter_deferred_stmt helps here because we still need this logic in visit_stmt itself to avoid duplicating the visit. Unless you mean hiding this logic inside of SemanticSyntaxChecker::enter_stmt. I guess that could work if I also update the test visitor to defer function bodies like the real ast::Checker.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. This seems fragile. I wouldn't be aware of this out-of-order visiting when working on the SemanticSyntaxChecker. We should at least document the constraints in which the enter methods are called. I assumed it would be in semantic visiting order but it seems its in semantic visiting order except for functions which may be deferred (or not, depending on the caller)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, very fragile. I was so confused when the integration test was failing but the inline test was fine, until I realized the function bodies were deferred. I'll work on expanding the docs for this.

@ntBre
Copy link
Contributor Author

ntBre commented Apr 7, 2025

I made a few improvements here:

  • documented when enter_stmt should be called. I didn't add anything for enter_expr because it doesn't have any of these fragile cases yet
  • added #[must_use] to both enter_ methods to require using the returned Checkpoint. We could add some kind of drop bomb if we want to be even more sure
  • add a PySourceType argument to SemanticSyntaxChecker::new to avoid notebook false positives (the top level scope should allow async code in notebooks)

Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, might want to wait on @MichaReiser as he has the most context for this one.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. It would be great if we could move some of the CLI tests to parser tests (maybe not the jupyter one because that would be very noisy unless the out-of-bound parser test have options support)

@@ -5564,3 +5564,206 @@ fn semantic_syntax_errors() -> Result<()> {

Ok(())
}

#[test]
fn async_comprehension_in_sync_comprehension() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not quiet clear to me why we need those cli tests. Could some of those tests be parser tests instead? The parser tests already support # parse_options: {"target-version": "3.7"}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't need all of these, but the third and fourth helped to detect bugs in the enter/exit pairs and the deferred function body traversal. The first two are probably less valuable, if you want me to trim it down a bit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if we could narrow them down and maybe document why it's important that they're CLI tests.

An alternative would be to implement them as custom tests in ruff_linter (that instantiate checker). Maybe there's something you can reuse from the linter test infra?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adapted the pyflakes test runner (as suggested on Discord) and moved all of the CLI tests to ruff_linter. They're just in the top-level linter::tests module for now, but I'm happy to move them around if there's a better place. I also considered nesting them under rules/syntax_errors, for example.

Copy link
Member

@MichaReiser MichaReiser Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having them in linter::tests is fine. They are integration tests after all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thanks! I'll merge then

@ntBre ntBre merged commit 058439d into main Apr 8, 2025
22 checks passed
@ntBre ntBre deleted the brent/syn-async-comprehensions branch April 8, 2025 16:50
dcreager added a commit that referenced this pull request Apr 9, 2025
* origin/main:
  [red-knot] Default `python-platform` to current platform (#17183)
  [red-knot] Add new 'unreachable code' test case (#17306)
  [red-knot] mypy_primer: Run on `async-utils` (#17303)
  [red-knot] Add custom `__setattr__` support (#16748)
  [red-knot] Add `__init__` arguments check when doing `try_call` on a class literal (#16512)
  [`flake8-pie`] Avoid false positive for multiple assignment with `auto()` (`PIE796`) (#17274)
  [syntax-errors] Async comprehension in sync comprehension (#17177)
  [`airflow`] Expand module path check to individual symbols (`AIR302`) (#17278)
  [syntax-errors] Check annotations in annotated assignments (#17283)
  [syntax-errors] Extend annotation checks to `await` (#17282)
  [red-knot] Add support for `assert_never` (#17287)
  [`flake8-pytest-style`] Avoid false positive for legacy form of `pytest.raises` (`PT011`) (#17231)
  [red-knot] Do not show types for literal expressions on hover (#17290)
  [red-knot] Fix dead-code clippy warning (#17291)
  [red-knot] Reachability analysis (#17199)
  [red-knot] Don't use latency-sensitive for handlers (#17227)
dcreager added a commit that referenced this pull request Apr 9, 2025
* dcreager/special-class: (26 commits)
  lint
  Add TODO about property test data
  Better todos
  Narrow type(generic) better
  More Python-like displays for specializations
  Add xfail for generic method inside generic class
  Comment other non-specializations
  Explain self_instance not being specialized
  Generic aliases are literals in type display
  Better TODO fallback type
  [red-knot] Default `python-platform` to current platform (#17183)
  [red-knot] Add new 'unreachable code' test case (#17306)
  [red-knot] mypy_primer: Run on `async-utils` (#17303)
  [red-knot] Add custom `__setattr__` support (#16748)
  [red-knot] Add `__init__` arguments check when doing `try_call` on a class literal (#16512)
  [`flake8-pie`] Avoid false positive for multiple assignment with `auto()` (`PIE796`) (#17274)
  [syntax-errors] Async comprehension in sync comprehension (#17177)
  [`airflow`] Expand module path check to individual symbols (`AIR302`) (#17278)
  [syntax-errors] Check annotations in annotated assignments (#17283)
  [syntax-errors] Extend annotation checks to `await` (#17282)
  ...
Glyphack pushed a commit to Glyphack/ruff that referenced this pull request Apr 9, 2025
…17177)

Summary
--

Detect async comprehensions nested in sync comprehensions in async
functions before Python 3.11, when this was [changed].

The actual logic of this rule is very straightforward, but properly
tracking the async scopes took a bit of work. An alternative to the
current approach is to offload the `in_async_context` check into the
`SemanticSyntaxContext` trait, but that actually required much more
extensive changes to the `TestContext` and also to ruff's semantic
model, as you can see in the changes up to
31554b4. This version has the benefit
of mostly centralizing the state tracking in `SemanticSyntaxChecker`,
although there was some subtlety around deferred function body traversal
that made the changes to `Checker` more intrusive too (hence the new
linter test).

The `Checkpoint` struct/system is obviously overkill for now since it's
only tracking a single `bool`, but I thought it might be more useful
later.

[changed]: python/cpython#77527

Test Plan
--

New inline tests and a new linter integration test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Related to preview mode features rule Implementing or modifying a lint rule
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants