-
Notifications
You must be signed in to change notification settings - Fork 616
Support for numpy.ndarray and pandas.Series with any python object as entry #4444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Support for numpy.ndarray and pandas.Series with any python object as entry #4444
Conversation
- Use `.iat` instead of `.iloc` to set values in pandas strategies
…rage since we now actually cover all types and this line is now not covered
Some form of timeout error in CI |
@tybug (I've hit retry, should be OK soon 🤞) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for your PR, Shaun!
This is looking good, and I'm excited to ship it soon! Small comments below about testing and code-comments; and I can always push something to the changelog when I work out what I wanted for that.
raise InvalidArgument(f"No strategy inference for {dtype}") | ||
raise InvalidArgument(f"No strategy inference for {dtype}") # pragma: no cover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can still hit this with void-dtype; can we add a covering test? (e.g. replacing the deleted test case?)
data[c.name].iloc[i] = value | ||
data[c.name].iat[i] = value # noqa: PD009 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense in the context of this PR, but it'd be great to write a one-or-two-sentence comment above the line explaining why we use .iat over .iloc here, for the benefit of future maintainers.
@@ -40,7 +40,6 @@ def e(a, **kwargs): | |||
e(nps.array_shapes, min_dims=33), | |||
e(nps.array_shapes, max_dims=33), | |||
e(nps.arrays, dtype=float, shape=(0.5,)), | |||
e(nps.arrays, dtype=object, shape=1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider a void-dtype test case here to maintain coverage?
pdst.series(elements=st.just(anything), dtype=object).filter( | ||
lambda x: not x.empty | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than a filter, I think we could pass index=range_indexes(min_size=1)
, right?
hypothesis-python/RELEASE.rst
Outdated
This version adds support for generating numpy.ndarray and pandas.Series with any python object as an element. | ||
Effectively, hypothesis can now generate ``np.array([MyObject()], dtype=object)``. | ||
The first use-case for this is with Pandas and Pandera where it is possible and sometimes required to have columns which themselves contain structured datatypes. | ||
Pandera seems to be waiting for this change to support ``PythonDict, PythonTypedDict, PythonNamedTuple`` etc. | ||
|
||
--- | ||
|
||
- Accept ``dtype.kind = 'O'`` in ``from_dtype`` | ||
- Use ``.iat`` instead of ``.iloc`` to set values in pandas strategies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(apologies for this comment, it's late at night & I don't really know what I want to do instead, but thought it better to send a review now than wait until later)
I'd like to rework this note, to focus more tightly on the specific changes - as prose, not dot-points - and then afterwards note why this is valuable, with pandera only mentioned as one possible case for structured data within a pandas series. I'd also include cross-references to each class you mention, and (optional but encouraged) a thank-you note to yourself at the end of the changelog ("Thanks to Shaun Read for identifying and fixing these issues!" or similar).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, ok I've reworded the release notes and implemented all the suggestions
Some interesting error is occurring outside of the changes in this PR... |
sorry for dropping the requested review here, I'd want to be confident I understand the pandas interactions first and I don't have that requisite knowledge at the moment 😅 That failure might be a real crosshair failure, but I'm not sure it's worth pursuing with such a non-reproducer. |
As far as I understand
Seems to be vaguely related. But the important points are:
From the docstrings:
Demonstration:
prints out:
|
This change would add support for generating numpy.ndarray and pandas.Series with any python object as an element.
Effectively, hypothesis can now generate
np.array([MyObject()], dtype=object)
.The first use-case for this is with Pandas and Pandera where it is possible and sometimes required to have columns which themselves contain structured datatypes.
Pandera seems to be waiting for this change to support PythonDict, PythonTypedDict, PythonNamedTuple etc.
from_dtype
.iat
instead of.iloc
to set values in pandas strategies (this allows setting of dictionaries as elements etc)