Pandas nullable String dtype is not recognized as a Pandera String #1054
-
Describe the bug
A column containing the

I attempted to search for this issue, but might have missed it. Sorry if I did!

Code Sample, a copy-pastable example

import pandas as pd
import pandera as pa
schema = pa.DataFrameSchema(
    {'color': pa.Column(pa.dtypes.String)}
)
data = pd.Series(['red', 'green', 'blue'], dtype='string').to_frame('color')
schema.validate(data)

Expected behavior
I expected this schema to pass validation successfully.

Desktop (please complete the following information):
Replies: 5 comments 1 reply
-
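hi @gwerbin-tive, you'll need to use `pandera.STRING` here, since you want to use the pandera-native string type. See here for all the dtype aliases defined by pandera (`pandera.String` is the numpy string type). In general the recommended way of doing this is to use `pd.StringDtype()` directly or use the string alias `"string"`. If you want to use the pandera datatype, use `pandera.STRING`, which is just an alias of this. Need to work on better datatype docs!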
-
Thank you for clarifying, @cosmicBboy. I was hoping to support either one of Also, I am somewhat surprised that
-
so This is simply by definition in the pandera API. Similarly,
-
In pandera one needs to be precise about the types. Pandera schemas follow whatever conventions are set by the underlying framework (pandas in this case). So the You'll have to create your own custom data type if you want to define a string type that can either be a pandas-native
-
Converting this issue to a discussion. @gwerbin-tive, would you mind marking the appropriate response as the answer?
hi @gwerbin-tive, you'll need to use `pandera.STRING` here, since you want to use the pandera-native string type. See here for all the dtype aliases defined by pandera (`pandera.String` is the numpy string type).

In general the recommended way of doing this is to use `pd.StringDtype()` directly or use the string alias `"string"`. If you want to use the pandera datatype, use `pandera.STRING`, which is just an alias of this.

Need to work on better datatype docs!