feat: allow specify the length of string column #9382

Closed
1 task done
ruiyang2015 opened this issue Jun 13, 2024 · 6 comments · Fixed by #11045
Labels
feature Features or general enhancements snowflake The Snowflake backend

Comments

@ruiyang2015

Is your feature request related to a problem?

We are using the Snowflake backend and trying to load a Parquet file into Snowflake. The Parquet file has columns with a fixed-length string type, but after read_parquet is called, every column becomes VARCHAR(16777216).
We tried other backends, such as DuckDB and MySQL, and all of them seem to widen the string type to the maximum string length instead of keeping the length from the source.
Ideally, the source length should be preserved.

What is the motivation behind your request?

Our data governance requires strict data type checking, so preserving the source string length would be very valuable.

Describe the solution you'd like

I would like to extend the ibis schema to allow specifying the maximum length of a string column, and also make read_parquet honor the fixed-length physical type.
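
To make the request concrete, here is a minimal sketch of the behavior being asked for: a string type that carries an optional maximum length and falls back to the backend's unbounded type only when no length is given. `StringType`, `to_sql`, `create_table_sql`, and the `UNBOUNDED_STRING` table are hypothetical names for illustration and are not the ibis API; the two fallback types are the ones reported in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical illustration only: none of these names are part of ibis.
# Fallback types for a string with no declared length, as reported in this issue.
UNBOUNDED_STRING: Dict[str, str] = {
    "snowflake": "VARCHAR(16777216)",
    "mssql": "VARCHAR(max)",
}


@dataclass(frozen=True)
class StringType:
    """A string type with an optional maximum length."""

    length: Optional[int] = None  # None means "no declared maximum"

    def to_sql(self, backend: str) -> str:
        # Keep the declared length when there is one; otherwise use the
        # backend's unbounded string type.
        if self.length is None:
            return UNBOUNDED_STRING[backend]
        return f"VARCHAR({self.length})"


def create_table_sql(name: str, schema: Dict[str, StringType], backend: str) -> str:
    """Render a CREATE TABLE statement from a column-name -> type mapping."""
    cols = ", ".join(f"{col} {typ.to_sql(backend)}" for col, typ in schema.items())
    return f"CREATE TABLE {name} ({cols})"


schema = {"country_code": StringType(length=2), "notes": StringType()}
print(create_table_sql("customers", schema, "snowflake"))
```

With a schema like this, a fixed-length source column keeps its bound (here `VARCHAR(2)`), while a column with no declared length still gets the backend default.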

What version of ibis are you running?

9.0.0

What backend(s) are you using, if any?

MySQL, MSSQL, Snowflake

Code of Conduct

  • I agree to follow this project's Code of Conduct
ruiyang2015 added the feature (Features or general enhancements) label Jun 13, 2024
@cpcloud
Member

cpcloud commented Jun 13, 2024

Thanks for opening an issue about this.

We have deliberately never added this, because the length of the string has no effect on the user-facing expression API.

Can you give some more specific details about how you would use this feature if we implemented it?

@ruiyang2015
Author

ruiyang2015 commented Jun 13, 2024 via email

@cpcloud
Member

cpcloud commented Jun 13, 2024

Fair enough, just wanted to understand the use case a bit more before implementing anything.

It might take a couple of iterations to complete this if we do it. There are almost certainly some places where we are being pretty loose with string types, and some lurking bugs around them will be uncovered.

@mvanwyk

mvanwyk commented Mar 21, 2025

Hey @cpcloud! Are string lengths still planned to be added?

I have an issue with MS SQL Server. Ibis converts a string into varchar(max), which SQL Server optimizes differently. The end result is that my query with varchar(max) takes 26 minutes, while with varchar(50), it takes 1m 50s.

@cpcloud
Member

cpcloud commented Mar 21, 2025

Yes they are still planned! Hopefully they will land in 10.4, I've already done a good chunk of the work in a branch.

@mvanwyk

mvanwyk commented Mar 21, 2025

> Yes they are still planned! Hopefully they will land in 10.4, I've already done a good chunk of the work in a branch.

Amazing thank you @cpcloud for all the great work!

Projects
Archived in project
4 participants