-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[syntax-errors] Async comprehension in sync comprehension #17177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary -- Detect async comprehensions nested in sync comprehensions in async functions before Python 3.11, when this was [changed]. [changed]: python/cpython#77527 Test Plan -- New inline tests. I also inlined `TestContext` into the `SemanticSyntaxCheckerVisitor` to make it possible to modify the context as it traverses the AST without adding mutating methods to the `SemanticSyntaxContext` trait. I also added a new linter integration test to make sure its `in_async_context` method has the behavior I expected.
but the checkpoint isn't wired up in the ruff_python_ast checker, so it seems it's not needed?
I think I've convinced myself that this error can be detected by a single level of nesting, which is in line with not needing this checkpoint. the issue is in a structure like this: sync async and it doesn't matter if that occurs at the top level or somewhere down the stack like async async sync async it also doesn't matter that a pattern like this sync async async prevents the innermost comprehension from raising an error because we will have already reported an error for its parent
This reverts commit 375e4e1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it would simplify the implementation if we only tracked whether we're in a sync comprehension rather than tracking if we're in an async context? That seems easier than correctly tracking if we're also in an async function. It has the downside that we might emit one extra diagnostic if someone uses an async comprehension in a sync function, but I'd expect this to be rare. On the other hand, it doesn't add that much complexity. It's just a bit tricky to get the handling right when to unset the async
flag. Do you know if it's necessary to unset the flag in a sync context manager (in an async function)?
// check for errors | ||
self.check_stmt(stmt, ctx); | ||
|
||
self.save_checkpoint(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to store the checkpoint
for every statement or could we only store and restore it for some statements (e.g functions, lambdas, classes?)
/// because the parent context is what matters. | ||
in_async_context: bool, | ||
|
||
checkpoint: Vec<Checkpoint>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could avoid having to store the checkpoints inside and instead move it to the stack by changing SemanticSyntaxChecker::enter_stmt
to return a state which the caller needs to pass to exit_stmt
(consumes the state, the state can't be created externally).
The checker would always store the current state and enter_stmt
would return the previous state from enter_stmt
(but update its internal state). The exit_stmt
call (or expression) then restores the internal state.
The downside of this is that it requires some more bookkeeping at the call site, but it seems fairly reasonable.
This approach would mirror the Checker's approach with updating SemanticModelFlags
, where it updates the flags before walking the children and then restores the flags once it's done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you decide to keep the stack solution, we should then change the implementation to only snapshot and restore when necessary. For example, it isn't necessary to store the snapshot for most expressions because the state won't be changed (and, therefore, there isn't anything to restore either).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, it really wasn't bad to plumb this through at the call sites.
Stmt::FunctionDef(ast::StmtFunctionDef { is_async, .. }) => { | ||
self.in_async_context = *is_async; | ||
self.seen_futures_boundary = true; | ||
} | ||
_ => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to set in_async_context
to false when entering a Class
, lambda
, or a sync with context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good catch. It looks like class
and lambda
do cause problems even inside of an outer async
function, but with
doesn't seem to add a sync scope:
Python 3.13.2 (main, Feb 5 2025, 08:05:21) [GCC 14.2.1 20250128] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> async def foo(): lambda: [_ async for n in range(3)]
File "<python-input-4>", line 1
async def foo(): lambda: [_ async for n in range(3)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: asynchronous comprehension outside of an asynchronous function
>>> async def foo():
... class C: [_ async for n in range(3)]
...
File "<python-input-7>", line 2
class C: [_ async for n in range(3)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: asynchronous comprehension outside of an asynchronous function
>>> async def foo():
... with open("foo.txt", "w") as f:
... [_ async for n in range(3)]
...
>>>
However, these are both still errors on 3.13, so this might be a separate, version-independent rule.
This is making me think we should go back to the trait implementation and defer to the full visitor's scope-based implementation. Or at least try tracking nested scopes here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have a lint rule that detects these, PLE1142, but not an error.
This sounded very promising at first, but I think we would still need the checkpoints, and it would only remove the function-specific code, which I don't think adds that much "complexity" per se, like you said. Although I guess we could avoid statement checkpoints and On the other hand, it sounds like PLE1142 should also be a syntax error and needs even more careful scope tracking, so it might be worth setting up for that here. I'll work on your suggestions about removing the checkpoint stack and see how it looks after that. As I said in a reply above, we could also give the trait approach another try. That just gets annoying for inline testing. It requires a lot of new test code, and that test code doesn't really guarantee anything about the actual users. However, that's also somewhat the case here because of the tricky enter/exit calls in |
|
The ecosystem check is showing a false positive in a notebook cell, which I think should have an implicit async scope. It's probably also picking up a default python version since it mentions 3.9. |
#[derive(Debug)] | ||
pub struct Checkpoint { | ||
in_async_context: bool, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could potentially include a DebugDropBomb
to make sure that the consumer of the enter API don't forgot to call the exit_stmt
/ exit_expr
methods which will defuse the bomb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting, I think I saw some mention of this around the new diagnostics. I could also mark enter_stmt
as #[must_use]
, which could help a bit in the same direction.
// For functions, defer semantic syntax error checks until the body of the function is | ||
// visited | ||
let checkpoint = if stmt.is_function_def_stmt() { | ||
None | ||
} else { | ||
Some(self.with_semantic_checker(|semantic, context| semantic.enter_stmt(stmt, context))) | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the reasoning for skipping function definitions here. Won't this result in stale in_async_context
flags because we don't update the state until after we visited the entire body?
If there's a need for deferred visiting, then I'd prefer to have a enter_deferred_stmt
or, better, move the necessary checks into exit_stmt
and also pass the statement and context because exit
is exactly the hook called after visiting the node's children
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I might be missing something here, but my reasoning for this was that the ast::Checker
doesn't actually visit the body of the function until visit_deferred_function
:
ruff/crates/ruff_linter/src/checkers/ast/mod.rs
Lines 2588 to 2591 in 5cee346
self.visit_parameters(parameters); | |
// Set the docstring state before visiting the function body. | |
self.docstring_state = DocstringState::Expected(ExpectedDocstringKind::Function); | |
self.visit_body(body); |
Before I added this logic, the function body was being visited twice by the SemanticSyntaxChecker
but only once by the ast::Checker
and this integration test was failing even though the inline version passed:
async def test(): return [[x async for x in elements(n)] async for n in range(3)]
which pointed to an issue in the visit order because the test visitor is much simpler:
fn visit_stmt(&mut self, stmt: &'_ Stmt) {
let checkpoint = self.checker.enter_stmt(stmt, &self.context);
ruff_python_ast::visitor::walk_stmt(self, stmt);
self.checker.exit_stmt(checkpoint);
}
Again I might be misunderstanding, but I don't think an enter_deferred_stmt
helps here because we still need this logic in visit_stmt
itself to avoid duplicating the visit. Unless you mean hiding this logic inside of SemanticSyntaxChecker::enter_stmt
. I guess that could work if I also update the test visitor to defer function bodies like the real ast::Checker
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. This seems fragile. I wouldn't be aware of this out-of-order visiting when working on the SemanticSyntaxChecker
. We should at least document the constraints in which the enter
methods are called. I assumed it would be in semantic visiting order but it seems its in semantic visiting order except for functions which may be deferred (or not, depending on the caller)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, very fragile. I was so confused when the integration test was failing but the inline test was fine, until I realized the function bodies were deferred. I'll work on expanding the docs for this.
I made a few improvements here:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, might want to wait on @MichaReiser as he has the most context for this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. It would be great if we could move some of the CLI tests to parser tests (maybe not the jupyter one because that would be very noisy unless the out-of-bound parser test have options support)
crates/ruff/tests/lint.rs
Outdated
@@ -5564,3 +5564,206 @@ fn semantic_syntax_errors() -> Result<()> { | |||
|
|||
Ok(()) | |||
} | |||
|
|||
#[test] | |||
fn async_comprehension_in_sync_comprehension() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not quiet clear to me why we need those cli tests. Could some of those tests be parser tests instead? The parser tests already support # parse_options: {"target-version": "3.7"}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't need all of these, but the third and fourth helped to detect bugs in the enter/exit
pairs and the deferred function body traversal. The first two are probably less valuable, if you want me to trim it down a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if we could narrow them down and maybe document why it's important that they're CLI tests.
An alternative would be to implement them as custom tests in ruff_linter
(that instantiate checker). Maybe there's something you can reuse from the linter test infra?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I adapted the pyflakes
test runner (as suggested on Discord) and moved all of the CLI tests to ruff_linter
. They're just in the top-level linter::tests
module for now, but I'm happy to move them around if there's a better place. I also considered nesting them under rules/syntax_errors
, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having them in linter::tests
is fine. They are integration tests after all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks! I'll merge then
* origin/main: [red-knot] Default `python-platform` to current platform (#17183) [red-knot] Add new 'unreachable code' test case (#17306) [red-knot] mypy_primer: Run on `async-utils` (#17303) [red-knot] Add custom `__setattr__` support (#16748) [red-knot] Add `__init__` arguments check when doing `try_call` on a class literal (#16512) [`flake8-pie`] Avoid false positive for multiple assignment with `auto()` (`PIE796`) (#17274) [syntax-errors] Async comprehension in sync comprehension (#17177) [`airflow`] Expand module path check to individual symbols (`AIR302`) (#17278) [syntax-errors] Check annotations in annotated assignments (#17283) [syntax-errors] Extend annotation checks to `await` (#17282) [red-knot] Add support for `assert_never` (#17287) [`flake8-pytest-style`] Avoid false positive for legacy form of `pytest.raises` (`PT011`) (#17231) [red-knot] Do not show types for literal expressions on hover (#17290) [red-knot] Fix dead-code clippy warning (#17291) [red-knot] Reachability analysis (#17199) [red-knot] Don't use latency-sensitive for handlers (#17227)
* dcreager/special-class: (26 commits) lint Add TODO about property test data Better todos Narrow type(generic) better More Python-like displays for specializations Add xfail for generic method inside generic class Comment other non-specializations Explain self_instance not being specialized Generic aliases are literals in type display Better TODO fallback type [red-knot] Default `python-platform` to current platform (#17183) [red-knot] Add new 'unreachable code' test case (#17306) [red-knot] mypy_primer: Run on `async-utils` (#17303) [red-knot] Add custom `__setattr__` support (#16748) [red-knot] Add `__init__` arguments check when doing `try_call` on a class literal (#16512) [`flake8-pie`] Avoid false positive for multiple assignment with `auto()` (`PIE796`) (#17274) [syntax-errors] Async comprehension in sync comprehension (#17177) [`airflow`] Expand module path check to individual symbols (`AIR302`) (#17278) [syntax-errors] Check annotations in annotated assignments (#17283) [syntax-errors] Extend annotation checks to `await` (#17282) ...
…17177) Summary -- Detect async comprehensions nested in sync comprehensions in async functions before Python 3.11, when this was [changed]. The actual logic of this rule is very straightforward, but properly tracking the async scopes took a bit of work. An alternative to the current approach is to offload the `in_async_context` check into the `SemanticSyntaxContext` trait, but that actually required much more extensive changes to the `TestContext` and also to ruff's semantic model, as you can see in the changes up to 31554b4. This version has the benefit of mostly centralizing the state tracking in `SemanticSyntaxChecker`, although there was some subtlety around deferred function body traversal that made the changes to `Checker` more intrusive too (hence the new linter test). The `Checkpoint` struct/system is obviously overkill for now since it's only tracking a single `bool`, but I thought it might be more useful later. [changed]: python/cpython#77527 Test Plan -- New inline tests and a new linter integration test.
Summary
Detect async comprehensions nested in sync comprehensions in async functions before Python 3.11, when this was changed.
The actual logic of this rule is very straightforward, but properly tracking the async scopes took a bit of work. An alternative to the current approach is to offload the
in_async_context
check into theSemanticSyntaxContext
trait, but that actually required much more extensive changes to theTestContext
and also to ruff's semantic model, as you can see in the changes up to 31554b4. This version has the benefit of mostly centralizing the state tracking inSemanticSyntaxChecker
, although there was some subtlety around deferred function body traversal that made the changes toChecker
more intrusive too (hence the new linter test).The
Checkpoint
struct/system is obviously overkill for now since it's only tracking a singlebool
, but I thought it might be more useful later.Test Plan
New inline tests and a new linter integration test.