Dedicated attribute/method access to list of fields of a DataFrameModel
#1286
Labels
enhancement
New feature or request
Is your feature request related to a problem? Please describe.
It is often helpful to have a list of all fields/columns associated with a `DataFrameModel`.

For example, to easily set a consistent order of the columns of a DataFrame (especially if we want to validate this with `ordered = True`):

Other cases that are coming up for me are using multiple columns to aggregate results or perform joins:

A final case is for column checks:

It would be nice not to have to manually enumerate all the fields of `MyDFModel`. Instead, to obtain this information automatically, it seems like the most convenient way is currently as follows:

This is quite verbose.
Describe the solution you'd like
I propose to allow iteration over the `DataFrameModel` directly to give each column, e.g. as `list(MyDFModel)`, or as in `for col in MyDFModel`. Currently, `DataFrameModel` subclasses are not iterable.

Some intuition behind this is that since `pd.DataFrame.abc` returns the column itself, `df["abc"]`, and `MyDFModel.abc` returns the column name `"abc"`, by analogy we might expect that the unmodified dataframe `df` (all columns) should correspond to the unmodified `MyDFModel` (an iterable of all column names).

Describe alternatives you've considered
Another possibility is a function, e.g. maybe `pa.columns`:

I think naming this function would need some thought (I don't like the idea of having a function called `columns` in the namespace because I think it's likely to be used as a variable name often enough). This is fairly clear:

Another similar possibility is a `DataFrameModel.columns` property which exposes a list of the fields associated with the `DataFrameModel`. I think this would be a good solution too. Since `pd.DataFrame` already has a `columns` attribute, it should not cause particularly problematic name collisions.

A workaround hinted at by jeffzi in #364 (comment) is to run `MyDFModel.to_schema()` to set the cache (immediately after definition of `MyDFModel`?) and then use `list(MyDFModel.__fields__)`. But this is still fairly verbose, and having to remember to set the cache like this feels a bit fragile.

Similarly, in some cases a workaround is to access `list(DataFrameModel.__annotations__)`. This is still fairly verbose, and moreover it does not include any fields inherited from parent classes if the `DataFrameModel` in question is a subclass of another one.
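
To make the two points above concrete (iterating over the model class itself, and `__annotations__` missing inherited fields), here is a rough sketch using plain Python classes only. It does not use pandera and is not how pandera is implemented: `ModelMeta`, `Model`, `ParentModel` and `ChildModel` are hypothetical stand-ins, and whether the same approach would carry over cleanly to pandera's own metaclass and its annotation types is an assumption on my part.

```python
from typing import get_type_hints


class ModelMeta(type):
    """Hypothetical metaclass that makes a model *class* iterable."""

    def __iter__(cls):
        # get_type_hints() merges annotations along the MRO, base classes
        # first, so fields inherited from parent models are included.
        # Names starting with an underscore are skipped in this sketch.
        return iter(
            [name for name in get_type_hints(cls) if not name.startswith("_")]
        )


class Model(metaclass=ModelMeta):
    """Stand-in for a DataFrameModel-like base class (not pandera's)."""


class ParentModel(Model):
    id: int
    name: str


class ChildModel(ParentModel):
    score: float


# __annotations__ only holds the fields declared on ChildModel itself.
own_only = list(ChildModel.__annotations__)

# Iterating over the class picks up the inherited fields as well.
all_fields = list(ChildModel)

for col in ChildModel:
    print(col)
```

With something like this, `list(ChildModel)` and `for col in ChildModel` both work on the class directly, while `list(ChildModel.__annotations__)` only sees `score`. The same lookup could just as well back a `columns` property or a standalone helper function instead of `__iter__`.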