Skip to content

Typing/mypy settings #1911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

m-richards
Copy link
Collaborator

Supporting mypy changes to experiment with resolving mypy issues in #1905.

I've just initially pushed now with failing tests to show a point, but I'll probably drive discussion from the other PR for now.

Copy link

codecov bot commented Feb 15, 2025

Codecov Report

Attention: Patch coverage is 96.72131% with 2 lines in your changes missing coverage. Please review.

Project coverage is 93.52%. Comparing base (812b2a8) to head (bb1abb8).
Report is 199 commits behind head on main.

Files with missing lines Patch % Lines
pandera/engines/polars_engine.py 94.44% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1911      +/-   ##
==========================================
- Coverage   94.28%   93.52%   -0.76%     
==========================================
  Files          91      121      +30     
  Lines        7013     9395    +2382     
==========================================
+ Hits         6612     8787    +2175     
- Misses        401      608     +207     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@m-richards
Copy link
Collaborator Author

Hey @cosmicBboy, I was wondering if you might be able to share some background on how pandera/api/polars/types.py::PolarsData came to be ( I can see it was added in #1373, and before that, d9873bd). If you don't recall then no worries.

Mypy does not like this very much due to key being an optional attribute, but from what I understand the usages are semantically different, key is provided to apply checks to a specific column, and when it is none, it seems to be used for coercion of schemas with a single data type. There seem to be a set of cases where a lazyframe is wrapped in PolarsData and subsequently unwrapped, I was going to have a go at simplifying this, but wanted to see if that was a bad idea before I got too far.

(I would also understand if you're not too concerned with getting mypy to behave on library internals, so happy to just leave that out if you prefer)

Jarek-Rolski and others added 23 commits March 4, 2025 22:40
…ss#1904)

* fix DataFrame Pydantic compatibility

* format python file

* update test for new code

* prevents Linters from raising an error

Signed-off-by: Jarek-Rolski <[email protected]>

* enable pyarrow and other types in pydantic models

Signed-off-by: Jarek-Rolski <[email protected]>

---------

Signed-off-by: Jarek-Rolski <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
…1352) (unionai-oss#1902)

Enhancement: Add support for timezone-flexible DateTime (unionai-oss#1352)

Signed-off-by: Max Raphael <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
…nai-oss#1905)

Signed-off-by: Matt Richards <[email protected]>

* trial type annotations

Signed-off-by: Matt Richards <[email protected]>

* changes in individual api files

Signed-off-by: Matt Richards <[email protected]>

* pl.dataframe working in local test

Signed-off-by: Matt Richards <[email protected]>

* older python union compat

Signed-off-by: Matt Richards <[email protected]>

* try polars in the mypy env on ci

Signed-off-by: Matt Richards <[email protected]>

* translate toplevel mypy skip into module specific skips

Signed-off-by: Matt Richards <[email protected]>

* mypy passes

Signed-off-by: Matt Richards <[email protected]>

* missing line continuation

Signed-off-by: Matt Richards <[email protected]>

* python 3.8

Signed-off-by: Matt Richards <[email protected]>

---------

Signed-off-by: Matt Richards <[email protected]>
* declared support for python 3.12

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* set python3.8 as min python in setup's 'python_requires'

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* bumped pylint to <3.3 (3.2.x latest to be run with 3.8)

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* added pyupgrade and applied modifications

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* .github/tests: skipped pyspark tests on 3.12

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* TEMP: disabled pylint warnings (possibly-used-before-assignment). need addressing

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* .github: added setuptools to test run deps (required by noxfile), since it's no longer installed by default with python 3.12

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* added setuptools as an explicit test dependency

it is required in noxfile and not automatically provided by python >= 3.12

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* updated dependency spec of typing_extensions

typing_extensions is currently required by all python versions

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* updated autogen requirements files

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

---------

Signed-off-by: Guillaume Andreu Sabater <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
…nai-oss#1905)

Signed-off-by: Matt Richards <[email protected]>

* trial type annotations

Signed-off-by: Matt Richards <[email protected]>

* changes in individual api files

Signed-off-by: Matt Richards <[email protected]>

* pl.dataframe working in local test

Signed-off-by: Matt Richards <[email protected]>

* older python union compat

Signed-off-by: Matt Richards <[email protected]>

* try polars in the mypy env on ci

Signed-off-by: Matt Richards <[email protected]>

* translate toplevel mypy skip into module specific skips

Signed-off-by: Matt Richards <[email protected]>

* mypy passes

Signed-off-by: Matt Richards <[email protected]>

* missing line continuation

Signed-off-by: Matt Richards <[email protected]>

* python 3.8

Signed-off-by: Matt Richards <[email protected]>

---------

Signed-off-by: Matt Richards <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
* declared support for python 3.12

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* set python3.8 as min python in setup's 'python_requires'

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* bumped pylint to <3.3 (3.2.x latest to be run with 3.8)

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* added pyupgrade and applied modifications

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* .github/tests: skipped pyspark tests on 3.12

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* TEMP: disabled pylint warnings (possibly-used-before-assignment). need addressing

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* .github: added setuptools to test run deps (required by noxfile), since it's no longer installed by default with python 3.12

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* added setuptools as an explicit test dependency

it is required in noxfile and not automatically provided by python >= 3.12

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* updated dependency spec of typing_extensions

typing_extensions is currently required by all python versions

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

* updated autogen requirements files

Signed-off-by: Guillaume Andreu Sabater <[email protected]>

---------

Signed-off-by: Guillaume Andreu Sabater <[email protected]>
…s#1916)

* use UV in nox

Signed-off-by: cosmicBboy <[email protected]>

* regenerate ci requirements files

Signed-off-by: cosmicBboy <[email protected]>

* fix tests, update ci-tests.yaml

Signed-off-by: cosmicBboy <[email protected]>

* debug ci-tests

Signed-off-by: cosmicBboy <[email protected]>

* fix ci-test matrix

Signed-off-by: cosmicBboy <[email protected]>

* update nox

Signed-off-by: cosmicBboy <[email protected]>

* debug noxfile and readthedocs config

Signed-off-by: cosmicBboy <[email protected]>

* debug

Signed-off-by: cosmicBboy <[email protected]>

* debug

Signed-off-by: cosmicBboy <[email protected]>

* debug

Signed-off-by: cosmicBboy <[email protected]>

* debug

Signed-off-by: cosmicBboy <[email protected]>

* debug

Signed-off-by: cosmicBboy <[email protected]>

* debug pyspark deps

Signed-off-by: cosmicBboy <[email protected]>

* debug

Signed-off-by: cosmicBboy <[email protected]>

* debug mypy test errors

Signed-off-by: cosmicBboy <[email protected]>

* debug

Signed-off-by: cosmicBboy <[email protected]>

* clean up docs, noxfile, ci

Signed-off-by: cosmicBboy <[email protected]>

* uncomment check requirements

* more cleanups

Signed-off-by: cosmicBboy <[email protected]>

---------

Signed-off-by: cosmicBboy <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
pd import called twice

Signed-off-by: Matt Richards <[email protected]>
* update publish.yaml ci

Signed-off-by: cosmicBboy <[email protected]>

* debug

Signed-off-by: cosmicBboy <[email protected]>

* debug

Signed-off-by: cosmicBboy <[email protected]>

---------

Signed-off-by: cosmicBboy <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
* testing publish ci

Signed-off-by: cosmicBboy <[email protected]>

* debug

Signed-off-by: cosmicBboy <[email protected]>

* finalize publish workflow

Signed-off-by: cosmicBboy <[email protected]>

---------

Signed-off-by: cosmicBboy <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
Signed-off-by: cosmicBboy <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
@m-richards m-richards force-pushed the typing/mypy_settings branch from b4399d5 to 84e78f8 Compare March 4, 2025 11:41
@cosmicBboy
Copy link
Collaborator

I was wondering if you might be able to share some background on how pandera/api/polars/types.py::PolarsData came to be ( I can see it was added in #1373, and before that, d9873bd). If you don't recall then no worries.

hey @m-richards apologies for the late reply here. So PolarsData came about because I wanted check functions to only take a single argument so that it can operate both on an entire dataframe or on a single column of the dataframe. The key is optional because if it's not None, the assumption is that the key is used to apply a check to a specific column in the dataframe, if it's None, then the check is applied to the entire dataframe (e.g. Check.gt(0)) should be applicable to an entire dataframe.

This is needed for dataframe libraries like polars and pyspark where it's better to construct the validation rules using the lazily constructed computation graph.

Mypy does not like this very much due to key being an optional attribute

What are the issues here specifically? Seems odd that mypy wouldn't support Optional[TYPE].

I would also understand if you're not too concerned with getting mypy to behave on library internals, so happy to just leave that out if you prefer

I don't think it's very practical to change the check function signature: I think it keeps the Check function signature clean, simple, and easy to understand (open to debating this point!)

I always support getting mypy to behave with library internals unless it involves contorting the code a lot just to make mypy happy.

@cosmicBboy
Copy link
Collaborator

hey @m-richards looked into the PolarsData type issue: #1972

This PR changes the signature to key: str, let me know if this handles the mypy issues you're seeing.

@m-richards
Copy link
Collaborator Author

hey @m-richards looked into the PolarsData type issue: #1972

This PR changes the signature to key: str, let me know if this handles the mypy issues you're seeing.

@cosmicBboy I'll answer both replies in one, since they're linked:

What are the issues here specifically? Seems odd that mypy wouldn't support Optional[TYPE].

Yeah I should have been more specific. mypy supports it fine, the problem comes is that e.g. in pandera/backends/polars/builtin_checks.py::in_range
there's

def in_range(
    data: PolarsData,
    min_value: T,
    max_value: T,
    include_min: bool = True,
    include_max: bool = True,
) -> pl.LazyFrame:
    col = pl.col(data.key) 

which gives
error: Argument 1 to "__call__" of "Col" has incompatible type "str | None"; expected "str | ...
in - practice you know / and have tested such that the PolarsData in this case is never going to have data.key is None at this call site. I wasn't sure if we could just change the representation of all columns from None to "*" so here (and in the local copy of this branch I have) I was seeing what it would look like if you treat these as two seperate PolarsData sub-types for the sake of mypy - the "all columns" case and the "specific column" case.

Agree that a simpler signature is nicer and easier to think about and it seems that #1972 handles this from a mypy perspective (47 errors instead of 62
running mypy with these lines

; [mypy-pandera.engines.polars_engine]
; ignore_errors = True
;
; [mypy-pandera.backends.polars.builtin_checks]
; ignore_errors = True

commented in the mypy config.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants