Introduced `dtype_enum` to hold additional type metadata #18494
base: branch-25.06
/okay to test 5f811ba
/okay to test e18b431
@@ -783,9 +785,9 @@ def to_pandas(
nullable: bool = False,
arrow_type: bool = False,
) -> pd.Index:
if nullable:
if arrow_type or self.dtype_enum in {2}:
We would only want `self.dtype_enum` to influence `to_pandas` in pandas-compatible mode, correct? Otherwise, a non-`cudf.pandas` user who calls e.g. `to_pandas(arrow_type=False)` may still get an Arrow-dtype pandas object back.
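A minimal sketch of the gating this comment asks for, assuming the compatibility flag is passed in explicitly (in cudf it would presumably come from the `mode.pandas_compatible` option). The helper name `wants_arrow_result` and its parameters are hypothetical and not part of the PR; only the value `2` for Arrow is taken from the diff.

```python
ARROW_TAG = 2  # value mapped to ArrowDtype in the diff's DTYPE_ENUM_MAP


def wants_arrow_result(
    arrow_type: bool, dtype_enum: int, pandas_compatible: bool
) -> bool:
    """Hypothetical helper: should to_pandas return an Arrow-backed object?"""
    if arrow_type:
        # An explicit request from the caller always wins.
        return True
    # The stored tag may only influence the result in pandas-compatible mode.
    return pandas_compatible and dtype_enum == ARROW_TAG


# Outside compat mode, arrow_type=False is honoured even if the tag says Arrow.
print(wants_arrow_result(False, ARROW_TAG, pandas_compatible=False))
# Inside compat mode, the tag restores the original Arrow backend.
print(wants_arrow_result(False, ARROW_TAG, pandas_compatible=True))
```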
}


def get_dtype_enum(dtype: Dtype) -> int:
It would be nice to use an `enum.Enum` to represent this so at least in the code we could check e.g. `PandasTypeEnum.ARROW` or something.
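One way this suggestion could look, as a sketch only. `PandasTypeEnum.ARROW` is the name floated in the comment; the other member names are assumptions, and the integer values mirror those in the diff's `DTYPE_ENUM_MAP`. `enum.IntEnum` is used here instead of plain `enum.Enum` so that existing integer comparisons such as `self.dtype_enum in {2}` would keep working.

```python
import enum


class PandasTypeEnum(enum.IntEnum):
    """Hypothetical named replacement for the bare integers 1, 2, 3."""

    NUMPY = 1      # assumed name for the PANDAS_NUMPY_DTYPE entry
    ARROW = 2      # name suggested in the review comment
    EXTENSION = 3  # assumed name for the ExtensionDtype entry


tag = PandasTypeEnum(2)  # look up a member from a stored integer
if tag is PandasTypeEnum.ARROW:
    print("restore an Arrow-backed pandas object")

# IntEnum members compare and hash like ints, so old-style checks still pass.
print(2 in {PandasTypeEnum.ARROW})
```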
DTYPE_ENUM_MAP = {
    PANDAS_NUMPY_DTYPE: 1,
    pd.core.dtypes.dtypes.ArrowDtype: 2,
    pd.core.dtypes.dtypes.ExtensionDtype: 3,
IMO I don't think we should be supporting any arbitrary `ExtensionDtype` at this point - this would include 3rd party custom `ExtensionDtype`s.
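To illustrate the concern: a lookup that matches on a base class with `isinstance` also matches every subclass, including ones defined outside pandas. The sketch below uses stand-in classes rather than the real pandas types, and both tagging functions are hypothetical; it contrasts a base-class check with an explicit allow-list keyed on exact types.

```python
# Stand-ins for the pandas classes; NOT the real pandas types.
class ExtensionDtype: ...
class ArrowDtype(ExtensionDtype): ...
class Int64Dtype(ExtensionDtype): ...       # a built-in nullable dtype
class ThirdPartyDtype(ExtensionDtype): ...  # a custom dtype from another library


def tag_by_isinstance(dtype):
    """Base-class matching: any ExtensionDtype subclass gets a tag."""
    # ArrowDtype must be checked first because it is itself an ExtensionDtype.
    for cls, tag in ((ArrowDtype, 2), (ExtensionDtype, 3)):
        if isinstance(dtype, cls):
            return tag
    return None


# Explicit allow-list: only the listed concrete types are tagged.
SUPPORTED = {ArrowDtype: 2, Int64Dtype: 3}


def tag_by_exact_type(dtype):
    """Exact-type matching: unknown subclasses are left untagged."""
    return SUPPORTED.get(type(dtype))


print(tag_by_isinstance(ThirdPartyDtype()))  # tagged like a built-in extension dtype
print(tag_by_exact_type(ThirdPartyDtype()))  # left untagged
```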
I know Ashwin (and partially myself) had some reservations about introducing another attribute that we would need to keep in sync. From the review:
Did you happen to explore if using |
Co-authored-by: Matthew Roeschke <[email protected]>
Description
This PR introduces `dtype_enum`, a type enum that represents the true pandas dtype backend. Pandas currently supports 3 dtype backends. If a pandas Series with a dtype from any of these backends is passed to `cudf`, the `dtype_enum` will be set accordingly, and during the `to_pandas` conversion the original dtype will be restored using the `dtype_enum`. `dtype_enum` is only functional in pandas-compatibility mode.

Fixes: #14149
I plan on opening separate PRs to fix the failures that are newly unlocked in the pandas test suite after this PR is merged.
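The tag-and-restore idea in the description can be sketched with pandas alone, with no GPU involved. This is an illustration under stated assumptions, not the PR's implementation: `classify` and `restore` are hypothetical helpers, the integer values mirror the diff's `DTYPE_ENUM_MAP`, a plain NumPy-backed Series stands in for the cudf column, and `convert_dtypes(dtype_backend=...)` assumes pandas 2.0 or later.

```python
import pandas as pd

NUMPY, ARROW, EXTENSION = 1, 2, 3  # values mirror DTYPE_ENUM_MAP in the diff


def classify(dtype) -> int:
    """Hypothetical: record which pandas backend a dtype belongs to."""
    # ArrowDtype subclasses ExtensionDtype, so it has to be tested first.
    if isinstance(dtype, pd.ArrowDtype):
        return ARROW
    if isinstance(dtype, pd.api.extensions.ExtensionDtype):
        return EXTENSION
    return NUMPY


def restore(ser: pd.Series, tag: int) -> pd.Series:
    """Hypothetical: bring a NumPy-backed Series back to the tagged backend."""
    if tag == EXTENSION:
        return ser.convert_dtypes(dtype_backend="numpy_nullable")
    if tag == ARROW:
        return ser.convert_dtypes(dtype_backend="pyarrow")  # needs pyarrow installed
    return ser


original = pd.Series([1, 2, 3], dtype="Int64")  # nullable extension dtype
tag = classify(original.dtype)                  # remembered alongside the column
plain = original.astype("int64")                # stand-in for the cudf column
restored = restore(plain, tag)
print(tag, plain.dtype, restored.dtype)
```

Without the tag, `plain` alone carries no record that the data started out as `Int64` rather than `int64`, which is the information the description says `dtype_enum` preserves.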
Checklist