feature/koalas-beta #651
Merged
* Strategies should not rely on pandas dtype aliases (#620)
* add test for strategy with pandas.DatetimeTZDtype using a datetime.tzinfo
* avoid coercing with string alias in strategies
* support timedelta in data synthesis strats (#621)
* fix multiindex error reporting (#622)
* Pin pylint (#629)
* bump pre-commit pylint version
* pin pylint
* remove setuptools pins
* setup.py setuptools
* add back setuptools dep
* update ci build
* update build
* update nox build
* update nox build
* exclude np.float128 type registration in MacM1 (#624)
* exclude np.float128 type registration in MacM1
* replace windows/mac m1 checks with float128 check
* fix numpy_pandas_coercible bug dealing with single element (#626)
* fix numpy_pandas_coercible bug dealing with single element
* add test
* remove empty case
* update pylint (#630)
* unpin pylint, remove setuptools constraint
* bump cache
* install simpleeval in noxfile
* re-pin pylint
* fix lint
* nox uses setuptools < 58.0.0

Co-authored-by: Jean-Francois Zinque <[email protected]>
* add test for all pandas-compatible numpy dtypes
* add support for np.bytes_
* add support for rare object aliases
* add support for platform-specific numpy dtypes
* bugfix: support nullable empty strategies

fix #634

* update black, mypy
* hypothesis health check
* fix
Fixes #640. This PR improves the performance of schema strategies that involve nullable fields: specifying a nullable column now costs roughly a 2x slowdown instead of a 10x one.
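One way to get this kind of speedup, sketched here as an assumption rather than the PR's actual change, is to generate the non-null values first and then apply a separately drawn null mask, instead of having every element choose between a null draw and a value draw. The helper `apply_null_mask` is hypothetical and uses only the standard library for illustration.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def apply_null_mask(
    values: Sequence[T], null_prob: float, rng: random.Random
) -> List[Optional[T]]:
    """Hypothetical helper: null out a random subset of pre-generated values.

    The non-null values are produced once, up front; nullability is then a
    single boolean mask applied on top, rather than a per-element choice
    between a null draw and a value draw.
    """
    if not 0.0 <= null_prob <= 1.0:
        raise ValueError("null_prob must be between 0 and 1")
    # One boolean per element; random() is in [0, 1), so null_prob=1.0 nulls everything.
    mask = [rng.random() < null_prob for _ in values]
    return [None if is_null else value for value, is_null in zip(values, mask)]


rng = random.Random(0)
column = apply_null_mask(list(range(10)), null_prob=0.3, rng=rng)
```

The mask is independent of how the values were generated, so the cost of nullability stays a single extra pass over the column.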
* reuse coerce logic in engines.utils
* add test_coerce_error
* rename coerce to try_coerce and _coerce to coerce
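The split between a plain `coerce` and a `try_coerce` that reports failures can be sketched in plain Python. This is an illustration under assumptions: `CoercionError`, the list-based inputs, and the `(index, value)` failure-case format are hypothetical and not pandera's actual API, which operates on dataframes and series.

```python
from typing import Any, Callable, List, Sequence, Tuple


class CoercionError(ValueError):
    """Hypothetical error carrying the values that failed to coerce."""

    def __init__(self, message: str, failure_cases: List[Tuple[int, Any]]):
        super().__init__(message)
        self.failure_cases = failure_cases


def coerce(values: Sequence[Any], convert: Callable[[Any], Any]) -> List[Any]:
    """Convert every value; any conversion error propagates unchanged."""
    return [convert(v) for v in values]


def try_coerce(values: Sequence[Any], convert: Callable[[Any], Any]) -> List[Any]:
    """Reuse `coerce`, but on failure report every offending (index, value)."""
    try:
        return coerce(values, convert)
    except (TypeError, ValueError):
        failure_cases = []
        for i, v in enumerate(values):
            try:
                convert(v)
            except (TypeError, ValueError):
                failure_cases.append((i, v))
        raise CoercionError(
            f"could not coerce {len(failure_cases)} value(s)", failure_cases
        ) from None
```

Keeping a single conversion path means `try_coerce` only adds error reporting on top of `coerce`, so the two cannot drift apart.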
Codecov Report
@@ Coverage Diff @@
## dev #651 +/- ##
==========================================
+ Coverage 98.85% 98.94% +0.08%
==========================================
Files 30 31 +1
Lines 3398 3497 +99
==========================================
+ Hits 3359 3460 +101
+ Misses 39 37 -2
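As a quick sanity check on the report, line coverage is hits divided by tracked lines, and hits plus misses equals lines in both columns. The sketch below recomputes the figures; the `+0.08` delta matches truncating (rather than rounding) the unrounded difference to two decimals, which is an inference from these numbers, not documented Codecov behavior.

```python
from decimal import ROUND_DOWN, Decimal

TWO_PLACES = Decimal("0.01")


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Unrounded line coverage as a percentage."""
    return Decimal(hits) * 100 / Decimal(lines)


def truncate(value: Decimal) -> Decimal:
    """Cut to two decimal places without rounding."""
    return value.quantize(TWO_PLACES, rounding=ROUND_DOWN)


# hits + misses == lines for both the base and the PR branch
assert 3359 + 39 == 3398 and 3460 + 37 == 3497

base = coverage_pct(3359, 3398)  # dev
head = coverage_pct(3460, 3497)  # #651

print(truncate(base), truncate(head), truncate(head - base))
# 98.85 98.94 0.08
```

Rounding the same difference half-even would give `0.09`, which is why truncation is the likelier explanation.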
* improve lazy validation performance for nullable cases

fixes #652

This PR fixes a severe performance issue that occurs when a ~500,000-row dataframe containing many null values is validated with `lazy=True` against a schema where `nullable=False`. The fix is to drop duplicates when aggregating failure cases and to remove unnecessary processing of the lazily collected failure cases.

* reintroduce sorting/dropping of duplicates
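The dedup-then-sort idea can be illustrated with a minimal sketch. It assumes a hypothetical failure-case record of `(column, check, failure_case)`; pandera's real failure-case table carries more fields (such as the row index), so this only shows why collapsing duplicates first keeps the aggregation cheap when one non-nullable column contains hundreds of thousands of nulls.

```python
from typing import Hashable, Iterable, List, Tuple

# Hypothetical record shape: (column name, check name, failing value).
FailureCase = Tuple[str, str, Hashable]


def aggregate_failure_cases(cases: Iterable[FailureCase]) -> List[FailureCase]:
    """Drop duplicate failure cases, then sort the (much smaller) remainder."""
    unique = set(cases)  # O(n) dedup before any further processing
    # repr() gives a total order even when failing values mix None, ints and strs.
    return sorted(unique, key=lambda case: (case[0], case[1], repr(case[2])))


# 500,000 nulls in one column collapse to a single reported failure case.
cases = [("price", "not_nullable", None)] * 500_000 + [("qty", "greater_than(0)", -1)]
print(aggregate_failure_cases(cases))
# [('price', 'not_nullable', None), ('qty', 'greater_than(0)', -1)]
```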
* add tests for koalas
* fix type issues with koalas
* patch to pd.Series, DataFrame
* add datatype koalas tests
* finish writing initial test suite for koalas
* fix regressions
* configure koalas
* fix regressions
* update pylint dep
* update deps
* update black
* fix lint
* use context manager for koalas ops_on_diff_frames
* updates
* update pre-commit
* mypy typing ignore
* fix docs
* install hypothesis for koalas ci
* don't cover modin import check
* better handling of timestamp
* fix koalas
* wip
* wip
* wip
* coverage
* hypothesis health check
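Koalas guards operations that combine different dataframes behind the `compute.ops_on_diff_frames` option, and koalas ships its own `option_context` for setting options temporarily. The generic pattern behind a commit like "use context manager for koalas ops_on_diff_frames" is sketched below with a hypothetical dict-backed option store (`OPTIONS`), so it runs without koalas installed; the point is that the previous value is restored even if the body raises.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator

# Hypothetical global option store standing in for a library config registry.
OPTIONS: Dict[str, Any] = {"compute.ops_on_diff_frames": False}


@contextmanager
def option_context(key: str, value: Any) -> Iterator[None]:
    """Temporarily set OPTIONS[key] to value, restoring the old value on exit."""
    previous = OPTIONS[key]
    OPTIONS[key] = value
    try:
        yield
    finally:
        # Runs on normal exit and when the body raises.
        OPTIONS[key] = previous


with option_context("compute.ops_on_diff_frames", True):
    assert OPTIONS["compute.ops_on_diff_frames"] is True
assert OPTIONS["compute.ops_on_diff_frames"] is False
```

Scoping the option this way, rather than enabling it globally, keeps potentially expensive cross-frame joins opt-in.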
Force-pushed from 75e0c99 to 9ed4496.
Addresses one part of #601
This PR introduces support for koalas object validation (`DataFrame`, `Series`, and `Index`), so pandera can be used like so:

Install
Then validate away!
Notes:
* The `check_utils` module adds a bunch of methods for checking the type of an object-to-validate. This should probably be moved somewhere else.
* … `compute.ops_on_diff_frames` config, which allows for computations across dataframes (which can involve expensive join operations). Clearing this tech debt would require someone more familiar with the koalas project.
* … `pandera` accessor class. Need to find a way around this for modin, which doesn't currently support the accessor extension utility. Koalas does, though, so that extension will need to be implemented in a future PR.

This feature is in beta, so many bugs are expected.
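A type-checking helper in the spirit of the `check_utils` note can be sketched as below, assuming each backend is optional: types are collected only from modules that import successfully, so pandas-only installs never need koalas. The names `optional_types`, `TABLE_TYPES`, and `is_table` are illustrative and may not match the module's actual contents.

```python
import importlib
from typing import Any, Tuple


def optional_types(module_name: str, *attr_names: str) -> Tuple[type, ...]:
    """Return the named classes from a module, or () if it isn't installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return ()
    return tuple(getattr(module, name) for name in attr_names)


# Dataframe-like types from whichever backends happen to be installed.
TABLE_TYPES: Tuple[type, ...] = optional_types("pandas", "DataFrame") + optional_types(
    "databricks.koalas", "DataFrame"
)


def is_table(obj: Any) -> bool:
    """True if obj is a dataframe from any installed backend."""
    # isinstance(obj, ()) is simply False, so an empty tuple is safe here.
    return isinstance(obj, TABLE_TYPES)
```

Resolving the type tuple once at import time keeps each `is_table` call to a single `isinstance` check.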