Skip to content

[red-knot] Add __init__ arguments check when doing try_call on a class literal #16512

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Apr 8, 2025

Conversation

mishamsk
Copy link
Contributor

@mishamsk mishamsk commented Mar 5, 2025

Summary

Caveats:

  • No handling of __new__ or modified __call__ on metaclass.
  • This leads to a couple of false positive errors in tests

Test Plan

  • A couple new cases in mdtests
  • cargo nextest run -p red_knot_python_semantic --no-fail-fast

@mishamsk
Copy link
Contributor Author

mishamsk commented Mar 5, 2025

This is still a draft, as I wanted to discuss a couple of things first.

First of all, @sharkdp did all the hard work and it seems almost too easy now. However, there are a couple of caveats.

  • If I follow the current CallOutcome setup and how other dunder calls are done - incorrect call signature causes the resulting type to become Unknown. I think this is not every helpful as it would cause all kinds of missed downstream diagnostics. You can see this happening in this test - I haven't fixed it intentionally.
    • I would suggest that for __init__ call, the diagnostics are added to CallOutcome but an instance type is returned even if error occurs. This is not possible now, because official API that creates CallOutcome is bind_call which will emit argument mismatch second time (following whatever try_call_dunder returns). The only other way to create CallOutcome - from_return_type is marked as deprecated. @carljm any objections to creating a CallOutcome ::new method for this case?
  • I have two concerns with the current diagnostics like [too-many-positional-arguments] "Too many positional arguments to bound method ``__init__``: expected 1, got 2"
    • Maybe it is worth special casing __init__ and avoiding the bound method portion?
    • Not strictly related to this PR, but for bound methods expected/actual number of arguments should not count the first one. So if there are no arguments to __init__ the diagnostic should be expected 0, got 1 instead of the current expected 1, got 2

@carljm
Copy link
Contributor

carljm commented Mar 5, 2025

  • Maybe it is worth special casing __init__ and avoiding the bound method portion?

I'm not convinced. __init__ is a bound method. What's the harm in saying so?

  • if there are no arguments to __init__ the diagnostic should be expected 0, got 1 instead of the current expected 1, got 2

Agreed, should be a pretty easy change I think?

@carljm
Copy link
Contributor

carljm commented Mar 5, 2025

I'm not sure I understand your first point. It's both possible and common to have a CallOutcome that both includes binding errors and has a return type -- in fact that's what we generally do for function calls, as this test will show:

def f() -> int:
    return 1

x = f(1)  # error: [too-many-positional-arguments]

reveal_type(x)  # revealed: int

What's unusual about __init__ is that it returns None, so our usual return-type handling won't work here. This gets into what I mentioned in Discord about modeling both __new__ and __init__. The proper modeling here is that we call __new__, which returns an instance of the type (in order to do this properly based on the typeshed definition of object.__new__, we would need to support typing.Self -- but it may not be hard to add a hacky version of support for Self), and then we call __init__ on that instance, and we don't care about the return type of __init__. (We don't have to model things totally correctly on the first PR, but we'll want to have TODO comments for the correct modeling, and we shouldn't put too much work into workarounds to model things wrongly, if we can avoid it.)

Those are just some general comments, haven't looked at the code in the PR yet, will do that tomorrow!

@sharkdp
Copy link
Contributor

sharkdp commented Mar 5, 2025

and then we call __init__ on that instance, and we don't care about the return type of __init__

Maybe this is what you meant by "We don't have to model things totally correctly", but I guess we should (eventually) emit a diagnostic if __init__ does not return None, since the runtime throws a TypeError in case it doesn't.

@AlexWaygood AlexWaygood added the ty Multi-file analysis & type inference label Mar 5, 2025
@mishamsk
Copy link
Contributor Author

mishamsk commented Mar 5, 2025

I'm not sure I understand your first point

First, I realized I confused everyone by saying CallOutcome doesn't have the right API's, but that was supposed to be CallBinding of course. Let me break it down step by step.

  • I must call try_call_dunder to trigger descriptor protocol and see what __init__ call yields
  • The call returns Result<CallOutcome<'db>, CallDunderError<'db>>. In case of errors I'll get CallDunderError
  • I then map CallDunderError to CallError. So far so good
  • Now, I need to return something like Ok(CallOutcome::Single(CallBinding { ..., errors: vec![error_from_dunder_call] })). However, there is only one public API that creates it bind_call which does the full type checking logic and inserts its own errors. There is no API to create a CallBinding by supplying types & errors directly

Hence, I suggest adding a method of a function in bind.rs for this case.

Any objections?

Agreed, should be a pretty easy change I think?

I haven't looked yet, I'll apply the change since everyone is fine. But this means that more tests will change in this PR than just those related to __init__. Or I can make it a separate PR too...

@carljm
Copy link
Contributor

carljm commented Mar 6, 2025

(Didn't get to this today, sorry; will come back to it tomorrow.)

@mishamsk
Copy link
Contributor Author

mishamsk commented Mar 6, 2025

Maybe this is what you meant by "We don't have to model things totally correctly", but I guess we should (eventually) emit a diagnostic if __init__ does not return None, since the runtime throws a TypeError in case it doesn't.

@sharkdp agree. but IMHO the diagnostic should be emitted when traversing the __init__ definition, not at call site. So it should be tackled separately and won't affect this code at all. Unless I am missing something

@mishamsk mishamsk force-pushed the add-class-init-argument-checks branch from c495452 to 598ac2d Compare March 7, 2025 04:25
@mishamsk mishamsk marked this pull request as ready for review March 7, 2025 04:26
@mishamsk
Copy link
Contributor Author

mishamsk commented Mar 7, 2025

@carljm never mind my previous comments, I found an API I needed. I think this is ready for review.

I had to temporarily add 3 false positive errors in two unrelated mdtests due to lack of new` handling. I will follow with a PR for that and they'll go away.

I also fixed the argument counting in the diagnostics when calling bound methods

@mishamsk mishamsk requested a review from sharkdp March 7, 2025 04:41
@mishamsk
Copy link
Contributor Author

mishamsk commented Mar 7, 2025

bummer... bench is failing due to tomlib using a lot of int(..) calls which emit false positives for now (cause we do not check __new__ yet and there is no temporary special case check of KnownClass::int).

I'd rather do a bandaid - add a hard-coded handler for int, just like bool, str currently and tackle __new__ in a separate PR. Hope no objections to this

@mishamsk mishamsk force-pushed the add-class-init-argument-checks branch from 4ae950a to 9ca7653 Compare March 7, 2025 22:58
@mishamsk
Copy link
Contributor Author

mishamsk commented Mar 7, 2025

copy pasting from discord.

Looks like I will not be able to get away with a two PR approach for __init__ & __new__ signature checking.

Many builtin types & stdlib (e.g. int, datetime.time) only define __new__, not __Init__. Which is perfectly fine at runtime, since object.__init__ would take any amount of arguments without raising an error when custom __new__ is present. It only raises TypeError for extra arguments without custom __new__.

At the same time, both object.__init__ and object.__new__ are defined as argument-less in typeshed. This works well for classes without __new__ & __init__, only __new__ or when both __new__ & __init__ are defined. Not so much for the ones with only __new__. I could have ignored __init__ entirely if no __new__ is defined. But it is always defined curtesy of object...

So in order for me to avoid false positives on the benchmark, I would still need to at least check if __new__ is defined anywhere in MRO except object. But that seems like unnecessary temporary workaround. If I am checking for the existence of __new__ I may as well try calling it...

Which also brings another necessity - I'll need to extend class_member API with an optional argument to stop checking MRO before object... Luckily this API is only used in 2 places now, so doesn't seem like a big deal.

In short, two options here

  • add quite a lot of false positive diagnostics on tomllib to the bench and merge in current state
  • or, apply class_member API changes, add at least basic handling of __new__ (e.g. ignore return type, which is usually Self in typeshed and not supported by knot)

team, leaning on to you for decision here. Normally I'd prefer more smaller PR's, but this may not be the right time.

@carljm
Copy link
Contributor

carljm commented Mar 8, 2025

Thanks for digging into this! Given those choices, I think we may as well just add __new__ support? The initial version can ignore the return type of __new__.

If even that turns out to be harder than expected, we can also just check for if __new__ is defined (barring object.__new__) and not do any constructor checks at all for such types.

I think you are right that we will need the ability to query for a member while ignoring its definition on the object base. (Or possibly we can just hardcode this behavior for __init__ and __new__ lookups and avoid needing to pass in an external flag?)

(For future reference, another wrinkle we should add a TODO for is if the class' metaclass defines __call__ -- if so that takes priority over __new__ and __init__. EDIT -- looks like we already have that mentioned in a TODO comment.)

@mishamsk
Copy link
Contributor Author

mishamsk commented Mar 8, 2025

@carljm

For future reference, another wrinkle we should add a TODO for is if the class' metaclass defines call -- if so that takes priority over new and init

Yes, it was already there and was step 3 in my initial plan. will be step 2 (hopefully, I'll manage to get away without).

Will work on __new__ as soon as I have time. thx. have a great weekend!

@carljm
Copy link
Contributor

carljm commented Mar 8, 2025

Yes, it was already there and was step 3 in my initial plan. will be step 2 (hopefully, I'll manage to get away without).

Will work on __new__ as soon as I have time. thx. have a great weekend!

Awesome, thank you; you too!

mishamsk and others added 11 commits April 7, 2025 15:40
temp

wip

special case typevar & object in try_call to avoid salsa cycle

Revert to setting call outcome type, instead of injecting errors

fix __new__ handling
use enum for mro search policy
… object fallback

properly handle subclasses
properly handle init method when new is present
ignore GenericAlias.__getattr__ for now
ensure `type.__new__` doesn't popup during class instantiation
@carljm carljm force-pushed the add-class-init-argument-checks branch from e1612bd to 8331fd5 Compare April 7, 2025 20:22
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you for all your work on this!

I've pushed some changes. The most significant one is to add find_name_in_mro_with_policy and class_member_with_policy, to mirror the existing member_lookup_with_policy, and have the find_name_in_mro and class_member methods be thin wrappers that just use the default policy. This arguably makes writing internal Type code a bit more error-prone, as it means we can more easily accidentally drop a policy we should have propagated. But I think it's worth it because it removes so much MemberLookupPolicy::default() noise all over the place.

Since I've made significant changes, I won't merge this right away but will leave it overnight in case you or anyone else has comments on the changes I've made; otherwise I'll merge it first thing in the morning my time.

Comment on lines 2595 to 2601
// Make sure `__mro__` is always searched all the way to object.
let policy = if name == "__mro__" {
policy & !MemberLookupPolicy::MRO_NO_OBJECT_FALLBACK
} else {
policy
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem necessary; all tests pass without it. It seems to me it shouldn't be necessary unless we have a bug somewhere else; we shouldn't look up __mro__ with this flag in the first place.

@@ -3143,6 +3227,56 @@ impl<'db> Type<'db> {
);
Signatures::single(signature)
}
Some(KnownClass::TypeVar) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer needed on current main, with fixpoint iteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome. I think it was still failing when I added it, but great that it is already unnecessary

Signatures::single(signature)
}
Some(KnownClass::Object) => {
// // Added here to avoid salsa query cycle in typeshed definition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer needed to avoid a cycle, so I'll remove this comment. There is a test that fails without it, and I think this is a reasonable one to special-case rather than relying on fixpoint iteration, so I'll leave the arm here for now.

Copy link
Contributor Author

@mishamsk mishamsk Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. especially since it doesn't have a corresponding code in bind, so not as painful as other cases which must be kept in sync with a completely different module

Comment on lines 3636 to 3639
// Lookup `__new__` method in the MRO up to, but not including, `object`. Also, we must
// avoid `__new__` on `type` since per descriptor protocol, if `__new__` is not defined on
// a class, metaclass attribute would take precedence. But by avoiding `__new__` on `object`
// we would inadvertently unhide `__new__` on `type`, which is not what we want.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit about type.__new__ is interesting. I wonder if an alternative approach here could be to not skip object.__new__ but instead mark it in a way that makes it easy to check if that's the one we found. Then we wouldn't have the side effect of unhiding type.__new__, meaning we need two lookup flags.

Not going to look into that now, but might add a comment about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the idea. I'll look into it when I do a follow up for __call__ handling. Or it may retackled separately in try_call_dunder unification #16371 - because this essentially about trying to create a unified API that everyone can call without too much thinking how the lookups should behave.

@carljm
Copy link
Contributor

carljm commented Apr 7, 2025

Actually I might try a few things to see if we can bring down the perf regression before merging this.

@carljm
Copy link
Contributor

carljm commented Apr 7, 2025

Looking at the mypy-primer results, there are a lot of new false positives related to dataclass __init__ methods, as expected.

Some other false positives appear to be because we lack handling of assignability/subtyping of a class literal or subclass-of type to a Callable type. I think the proper handling here (barring metaclass __call__ handling) would consider the class to be an intersection of its __init__ and __new__ signatures. But we could also do a temporary version that just uses the current Type::signatures() implementation as forgiving gradual callable.

@mishamsk
Copy link
Contributor Author

mishamsk commented Apr 7, 2025

Some other false positives appear to be because we lack handling of assignability/subtyping of a class literal or subclass-of type to a Callable type. I think the proper handling here (barring metaclass call handling) would consider the class to be an intersection of its init and new signatures. But we could also do a temporary version that just uses the current Type::signatures() implementation as forgiving gradual callable.

sounds like another follow-up candidate to me

Copy link
Contributor

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great, thank you!

Co-authored-by: David Peter <[email protected]>
@carljm carljm merged commit fab7d82 into astral-sh:main Apr 8, 2025
22 of 23 checks passed
@mishamsk mishamsk deleted the add-class-init-argument-checks branch April 9, 2025 01:08
@mishamsk
Copy link
Contributor Author

mishamsk commented Apr 9, 2025

thanks everyone for patiently waiting on this one! 🚀

@carljm
Copy link
Contributor

carljm commented Apr 9, 2025

Thanks for the PR!

dcreager added a commit that referenced this pull request Apr 9, 2025
* origin/main:
  [red-knot] Default `python-platform` to current platform (#17183)
  [red-knot] Add new 'unreachable code' test case (#17306)
  [red-knot] mypy_primer: Run on `async-utils` (#17303)
  [red-knot] Add custom `__setattr__` support (#16748)
  [red-knot] Add `__init__` arguments check when doing `try_call` on a class literal (#16512)
  [`flake8-pie`] Avoid false positive for multiple assignment with `auto()` (`PIE796`) (#17274)
  [syntax-errors] Async comprehension in sync comprehension (#17177)
  [`airflow`] Expand module path check to individual symbols (`AIR302`) (#17278)
  [syntax-errors] Check annotations in annotated assignments (#17283)
  [syntax-errors] Extend annotation checks to `await` (#17282)
  [red-knot] Add support for `assert_never` (#17287)
  [`flake8-pytest-style`] Avoid false positive for legacy form of `pytest.raises` (`PT011`) (#17231)
  [red-knot] Do not show types for literal expressions on hover (#17290)
  [red-knot] Fix dead-code clippy warning (#17291)
  [red-knot] Reachability analysis (#17199)
  [red-knot] Don't use latency-sensitive for handlers (#17227)
dcreager added a commit that referenced this pull request Apr 9, 2025
* dcreager/special-class: (26 commits)
  lint
  Add TODO about property test data
  Better todos
  Narrow type(generic) better
  More Python-like displays for specializations
  Add xfail for generic method inside generic class
  Comment other non-specializations
  Explain self_instance not being specialized
  Generic aliases are literals in type display
  Better TODO fallback type
  [red-knot] Default `python-platform` to current platform (#17183)
  [red-knot] Add new 'unreachable code' test case (#17306)
  [red-knot] mypy_primer: Run on `async-utils` (#17303)
  [red-knot] Add custom `__setattr__` support (#16748)
  [red-knot] Add `__init__` arguments check when doing `try_call` on a class literal (#16512)
  [`flake8-pie`] Avoid false positive for multiple assignment with `auto()` (`PIE796`) (#17274)
  [syntax-errors] Async comprehension in sync comprehension (#17177)
  [`airflow`] Expand module path check to individual symbols (`AIR302`) (#17278)
  [syntax-errors] Check annotations in annotated assignments (#17283)
  [syntax-errors] Extend annotation checks to `await` (#17282)
  ...
Glyphack pushed a commit to Glyphack/ruff that referenced this pull request Apr 9, 2025
…class literal (astral-sh#16512)

## Summary

* Addresses #16511 for simple cases where only `__init__` method is
bound on class or doesn't exist at all.
* fixes a bug with argument counting in bound method diagnostics

Caveats:
* No handling of `__new__` or modified `__call__` on metaclass.
* This leads to a couple of false positive errors in tests

## Test Plan

- A couple new cases in mdtests
- cargo nextest run -p red_knot_python_semantic --no-fail-fast

---------

Co-authored-by: Carl Meyer <[email protected]>
Co-authored-by: David Peter <[email protected]>
sharkdp added a commit that referenced this pull request Apr 15, 2025
## Summary

Add very early support for dataclasses. This is mostly to make sure that
we do not emit false positives on dataclass construction, but it also
lies some foundations for future extensions.

This seems like a good initial step to merge to me, as it basically
removes all false positives on dataclass constructor calls. This allows
us to use the ecosystem checks for making sure we don't introduce new
false positives as we continue to work on dataclasses.

## Ecosystem analysis

I re-ran the mypy_primer evaluation of [the `__init__`
PR](#16512) locally with our
current mypy_primer version and project selection. It introduced 1597
new diagnostics. Filtering those by searching for `__init__` and
rejecting those that contain `invalid-argument-type` (those could not
possibly be solved by this PR) leaves 1281 diagnostics. The current
version of this PR removes 1171 diagnostics, which leaves 110
unaccounted for. I extracted the lint + file path for all of these
diagnostics and generated a diff (of diffs), to see which
`__init__`-diagnostics remain. I looked at a subset of these: There are
a lot of `SomeClass(*args)` calls where we don't understand the
unpacking yet (this is not even related to `__init__`). Some others are
related to `NamedTuple`, which we also don't support yet. And then there
are some errors related to `@attrs.define`-decorated classes, which
would probably require support for `dataclass_transform`, which I made
no attempt to include in this PR.

## Test Plan

New Markdown tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ty Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants