-
-
Notifications
You must be signed in to change notification settings - Fork 344
Feature: Add support for Generic to SchemaModel #810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Add support for Generic to SchemaModel #810
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #810 +/- ##
==========================================
- Coverage 97.34% 97.28% -0.06%
==========================================
Files 43 43
Lines 3990 4020 +30
==========================================
+ Hits 3884 3911 +27
- Misses 106 109 +3
Continue to review full report at Codecov.
|
Thanks @tfwillems, this is awesome! I'll review it over the next few days, but I'm on-board with the use case and solution overall. @jeffzi let me know if you have any thoughts here.
CI seems to be happy, can you do a pip freeze on your environment? I can try to reproduce
In CI it seems to be |
The FastAPI tests may be failing b/c I'm running this on a server that often restricts reads/writes to other URLs so I'm not sure if that is driving the test failures for the 3 FastAPI tests. When running pre-commit on the original dev branch of pandera, I obtained the following stacktrace: black....................................................................Failed
- hook id: black
- exit code: 1
Traceback (most recent call last):
File "/cluster/home/willems/.cache/pre-commit/repob39yzst3/py_env-python3.9/bin/black", line 8, in <module>
sys.exit(patched_main())
File "/cluster/home/willems/.cache/pre-commit/repob39yzst3/py_env-python3.9/lib/python3.9/site-packages/black/__init__.py", line 1423, in patched_main
patch_click()
File "/cluster/home/willems/.cache/pre-commit/repob39yzst3/py_env-python3.9/lib/python3.9/site-packages/black/__init__.py", line 1409, in patch_click
from click import _unicodefun
ImportError: cannot import name '_unicodefun' from 'click' (/cluster/home/willems/.cache/pre-commit/repob39yzst3/py_env-python3.9/lib/python3.9/site-packages/click/__init__.py) Here's the result of
|
Thanks for your contribution @tfwillems. I agree it could open the doors to new applications, especially to create reusable intermediate models. |
cd1861c
to
7a36331
Compare
Hi @tfwillems thanks for the contribution! It looks like a few areas in the new code aren't covered by tests, see here. Merging this PR to |
* Adapt SchemaModel so that it can inherit from typing.Generic * Extend SchemaModel to enable generic types in fields * fix linter Co-authored-by: Thomas Willems <[email protected]> Co-authored-by: cosmicBboy <[email protected]>
* add imports to fastapi docs * Add option to disallow duplicate column names (#758) * ENH: add duplicate detection to dataframeschema * ENH: propagate duplicate colnames check to schemamodel * Add getter setter property * make schemamodel actually work, update __str__ * fix __repr__ as well * fix incorrect default value * black formatting has changed * invert parameter naming convention * address other PR comments * fix doctests, comma in __str__ * maybe fix sphinx errors * fix ci and mypy tests * Update test_schemas.py * fix lint Co-authored-by: cosmicBboy <[email protected]> * Make SchemaModel use class name, define own config (#761) * Make SchemaModel use class name, define own config * fix * fix * fix * fix tests * fix lint and docs * add test Co-authored-by: cosmicBboy <[email protected]> * implement coercion-on-initialization for DataFrame[SchemaModel] types (#772) * implement coercion-on-initialization * pylint * Update tests/core/test_model.py Co-authored-by: Matt Richards <[email protected]> Co-authored-by: Matt Richards <[email protected]> * update conda install instructions (#776) * add documentation for pandas_engine.DateTime (#780) * add documentation for pandas_engine.DateTime * fix removed numpy_engine.Object doc * set default n_failure_cases to None (#784) * Update filtering columns for performance reasons. (#777) * Update filtering columns for performance reasons. * Update pandera/schemas.py * Update schemas.py * Update schemas.py * Bugfix in schemas.py Co-authored-by: Niels Bantilan <[email protected]> * implement pydantic model data type (#779) * make finding coerce failure cases faster (#792) * make finding coerce failure cases faster * fix tests * remove unneeded import * fix tests, coverage * update docs for 0.10.0 (#795) * add pyspark support, deprecate koalas (#793) * add support for pyspark.pandas, deprecate koalas * update docs * add type check in pandas generics * update docs * clean up ci * fix mypy, generics * fix generic hack * improve coverage * Add overloads to `schema.to_yaml` (#790) * Add overloads to `to_yaml` * Update schemas.py Co-authored-by: Niels Bantilan <[email protected]> * add support for logical data types * add initial support for decimal * fix dtype check * Feature: Add support for Generic to SchemaModel (#810) * Adapt SchemaModel so that it can inherit from typing.Generic * Extend SchemaModel to enable generic types in fields * fix linter Co-authored-by: Thomas Willems <[email protected]> Co-authored-by: cosmicBboy <[email protected]> * fix pandas_engine.DateTime.coerce_value not consistent with coerce (#827) * pyspark docs fixes * fix koalas link to pyspark * bump version 0.10.1 * fix pandas_engine.DateTime.coerce_value not consistent with coerce Co-authored-by: cosmicBboy <[email protected]> * Refactor logical type check method * add logical types tests * add back conftest * fix test_invalid_annotations * fix ray initialization in setup_modin_engine * fix logical type validation when output is an iterable * add Decimal data type to pandera.__init__ * remove DataType.is_logical * add logical types documentation * Update dtypes.rst * Update dtypes.rst * increase coverage * fix SchemaErrors.failure_cases with logical types * fix modin compatibility for logical type validation * fix prepare_series_check_output compatibility with pyspark * fix mypy error * Update dtypes.rst Co-authored-by: cosmicBboy <[email protected]> Co-authored-by: Matt Richards <[email protected]> Co-authored-by: Sean Mackesey <[email protected]> Co-authored-by: Ferdinand Hahmann <[email protected]> Co-authored-by: Robert Craigie <[email protected]> Co-authored-by: tfwillems <[email protected]> Co-authored-by: Thomas Willems <[email protected]>
* add imports to fastapi docs * Add option to disallow duplicate column names (#758) * ENH: add duplicate detection to dataframeschema * ENH: propagate duplicate colnames check to schemamodel * Add getter setter property * make schemamodel actually work, update __str__ * fix __repr__ as well * fix incorrect default value * black formatting has changed * invert parameter naming convention * address other PR comments * fix doctests, comma in __str__ * maybe fix sphinx errors * fix ci and mypy tests * Update test_schemas.py * fix lint Co-authored-by: cosmicBboy <[email protected]> * Make SchemaModel use class name, define own config (#761) * Make SchemaModel use class name, define own config * fix * fix * fix * fix tests * fix lint and docs * add test Co-authored-by: cosmicBboy <[email protected]> * implement coercion-on-initialization for DataFrame[SchemaModel] types (#772) * implement coercion-on-initialization * pylint * Update tests/core/test_model.py Co-authored-by: Matt Richards <[email protected]> Co-authored-by: Matt Richards <[email protected]> * update conda install instructions (#776) * add documentation for pandas_engine.DateTime (#780) * add documentation for pandas_engine.DateTime * fix removed numpy_engine.Object doc * set default n_failure_cases to None (#784) * Update filtering columns for performance reasons. (#777) * Update filtering columns for performance reasons. * Update pandera/schemas.py * Update schemas.py * Update schemas.py * Bugfix in schemas.py Co-authored-by: Niels Bantilan <[email protected]> * implement pydantic model data type (#779) * make finding coerce failure cases faster (#792) * make finding coerce failure cases faster * fix tests * remove unneeded import * fix tests, coverage * update docs for 0.10.0 (#795) * add pyspark support, deprecate koalas (#793) * add support for pyspark.pandas, deprecate koalas * update docs * add type check in pandas generics * update docs * clean up ci * fix mypy, generics * fix generic hack * improve coverage * Add overloads to `schema.to_yaml` (#790) * Add overloads to `to_yaml` * Update schemas.py Co-authored-by: Niels Bantilan <[email protected]> * add support for logical data types * add initial support for decimal * fix dtype check * Feature: Add support for Generic to SchemaModel (#810) * Adapt SchemaModel so that it can inherit from typing.Generic * Extend SchemaModel to enable generic types in fields * fix linter Co-authored-by: Thomas Willems <[email protected]> Co-authored-by: cosmicBboy <[email protected]> * fix pandas_engine.DateTime.coerce_value not consistent with coerce (#827) * pyspark docs fixes * fix koalas link to pyspark * bump version 0.10.1 * fix pandas_engine.DateTime.coerce_value not consistent with coerce Co-authored-by: cosmicBboy <[email protected]> * Refactor logical type check method * add logical types tests * add back conftest * fix test_invalid_annotations * fix ray initialization in setup_modin_engine * fix logical type validation when output is an iterable * add Decimal data type to pandera.__init__ * remove DataType.is_logical * add logical types documentation * Update dtypes.rst * Update dtypes.rst * increase coverage * fix SchemaErrors.failure_cases with logical types * fix modin compatibility for logical type validation * fix prepare_series_check_output compatibility with pyspark * fix mypy error * Update dtypes.rst Co-authored-by: cosmicBboy <[email protected]> Co-authored-by: Matt Richards <[email protected]> Co-authored-by: Sean Mackesey <[email protected]> Co-authored-by: Ferdinand Hahmann <[email protected]> Co-authored-by: Robert Craigie <[email protected]> Co-authored-by: tfwillems <[email protected]> Co-authored-by: Thomas Willems <[email protected]>
Currently, SchemaModel is not compatible with typing.Generic. There are 2 straightforward applications in which I see Generic being useful:
e.g.
Currently, this won't work as it results in the following error:
This application is also not currently supported.
This PR makes a few minor modifications to SchemaModel to enable both applications of Generic. I've also added a set of unit tests that suggest this is working as intended. The code required to enable Generic types in fields was heavily inspired by what's present in pydantic
I wasn't sure whether it'd be ideal for SchemaModel to natively support generic arguments, or whether it'd be better to create a sub-class like GenericSchemaModel that enables this functionality. Ultimately, I opted for the former, as the changes seemed fairly minimal.
I tried to follow the best practice dev docs, but ran into a few issues:
Would love to get your thoughts on the utility of this and the prototype I've provided