Verify special case conversions for parquet physical to logical types #7506

alamb opened this issue May 14, 2025 · 1 comment

Comments

@alamb
Contributor

alamb commented May 14, 2025

> I did try signed->unsigned for 32 and 64 bit ints and there was no difference.

Ahh, the reason for this is that I32/64->U32/64 is handled above (around L171). I would expect anything that falls through and relies on `arrow_cast::cast` to be potentially slow because it uses `unary_opt`, but from a quick glance the decimal code appears to work out which casts are infallible and use `unary` for those instead. Perhaps other conversions do a similar optimization.
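To make the fallible vs. infallible distinction concrete, here is a minimal stand-alone sketch. It does not use the arrow crates; `cast_checked` and `cast_infallible` are hypothetical names that only mimic the shape of a `unary_opt`-style and a `unary`-style kernel over plain slices.

```rust
/// Fallible path (in the spirit of `unary_opt`): every element is checked,
/// and values that do not fit become `None`, so a new validity mask is needed.
fn cast_checked(values: &[i32]) -> Vec<Option<u8>> {
    values.iter().map(|&v| u8::try_from(v).ok()).collect()
}

/// Infallible path (in the spirit of `unary`): a same-width signed to
/// unsigned conversion can never fail, so there is no per-element check
/// and existing nulls could be carried over unchanged.
fn cast_infallible(values: &[i32]) -> Vec<u32> {
    // `as` between same-width integers reinterprets the bits.
    values.iter().map(|&v| v as u32).collect()
}
```

For example, `cast_infallible(&[-1, 0, 1])` gives `[u32::MAX, 0, 1]`, while `cast_checked(&[255, 256, -1])` gives `[Some(255), None, None]`.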

It might be worth enumerating all of the allowed Parquet physical-to-logical type mappings and handling them here, rather than relying on the `arrow_cast` machinery.
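As a rough sketch of what such an enumeration could look like, the table below covers only a subset of the integer and temporal annotations that the Parquet format allows on INT32 and INT64. All names here (`Physical`, `Logical`, `Plan`, `plan`) are hypothetical and are not taken from the arrow-rs reader.

```rust
/// Parquet physical types covered by this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Physical {
    Int32,
    Int64,
}

/// A subset of logical annotations (integer and temporal only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Logical {
    Int8,
    Int16,
    Int32,
    Int64,
    UInt8,
    UInt16,
    UInt32,
    UInt64,
    Date,
    TimeMillis,
    TimeMicros,
}

/// Hypothetical description of how to produce the output values.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Plan {
    /// Same width and bit layout: use the decoded values as they are.
    Reuse,
    /// Same width, signed to unsigned: reinterpret the bits.
    Reinterpret,
    /// 32-bit storage for an 8- or 16-bit type: truncate each value.
    Narrow,
}

/// Returns a dedicated plan for each enumerated pair, or `None` for
/// pairs outside this table (e.g. INT32 annotated as a 64-bit integer).
fn plan(physical: Physical, logical: Logical) -> Option<Plan> {
    match (physical, logical) {
        (Physical::Int32, Logical::Int32)
        | (Physical::Int32, Logical::Date)
        | (Physical::Int32, Logical::TimeMillis)
        | (Physical::Int64, Logical::Int64)
        | (Physical::Int64, Logical::TimeMicros) => Some(Plan::Reuse),
        (Physical::Int32, Logical::UInt32)
        | (Physical::Int64, Logical::UInt64) => Some(Plan::Reinterpret),
        (Physical::Int32, Logical::Int8)
        | (Physical::Int32, Logical::Int16)
        | (Physical::Int32, Logical::UInt8)
        | (Physical::Int32, Logical::UInt16) => Some(Plan::Narrow),
        _ => None,
    }
}
```

With an explicit table like this, every permitted pair gets a known-infallible path, and nothing falls through to a generic cast by accident.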

Originally posted by @etseidl in #7055 (comment)

@Dandandan
Contributor

I wonder if it is actually possible to avoid the cast (or some of the conversions) by doing the conversion while building the value buffer in the reader? @etseidl
This would avoid a separate copy/conversion pass (and, for smaller types, would also reduce memory usage somewhat).
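A minimal sketch of that idea, assuming PLAIN-encoded INT32 values (4 bytes each, little-endian) annotated as 8-bit signed integers. The function names are hypothetical and this is not the arrow-rs reader code; it only contrasts the two buffer strategies.

```rust
/// Two-pass approach: decode into a full-width buffer, then cast into
/// a second buffer. Both buffers are alive at the same time.
fn decode_then_cast(plain: &[u8]) -> Vec<i8> {
    let wide: Vec<i32> = plain
        .chunks_exact(4)
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    wide.iter().map(|&v| v as i8).collect()
}

/// One-pass approach: convert each value as it is decoded, so only the
/// narrow output buffer is ever allocated.
fn decode_into_target(plain: &[u8]) -> Vec<i8> {
    let mut out = Vec::with_capacity(plain.len() / 4);
    for c in plain.chunks_exact(4) {
        let v = i32::from_le_bytes([c[0], c[1], c[2], c[3]]);
        // Assumption: the writer kept values within the i8 range, as the
        // annotation promises; out-of-range values would wrap here.
        out.push(v as i8);
    }
    out
}
```

Both functions return the same values; the second skips the intermediate `Vec<i32>`, which is four times the size of the final `Vec<i8>`.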
