-
Notifications
You must be signed in to change notification settings - Fork 639
feat: allow specify the length of string column #9382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for opening an issue about this. We have explicitly never added this, because the length of the string has no effect on the user facing expression API. Can you give some more specific details about how you would use this feature if we implemented it? |
It is more for the data governance requirement, we have source data with
type varchar(255), or fixed length type in parquet of 255, and our tools at
the end of data pineline is cross validating all types are exact matched.
Furthermore, we have data pipeline at the end to add indexes on certain
columns, when ibis change the type to max char, a lot of system will not be
able to add index.
We do not want to add extra steps after ibis create the table to alter
table again, that will add another layer of complexity, and harder to
control from data governance perspective.
…On Thu, Jun 13, 2024 at 9:53 AM Phillip Cloud ***@***.***> wrote:
Thanks for opening an issue about this.
We have explicitly never added this, because the length of the string has
no effect on the user facing expression API.
Can you give some more specific details about how you would use this
feature if we implemented it?
—
Reply to this email directly, view it on GitHub
<#9382 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADU6JF4L7PHAGBP6I7QRDEDZHHFCPAVCNFSM6AAAAABJIY4B6WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRWGI2DGMZWHA>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Fair enough, just wanted to understand the use case a bit more before implementing anything. It might take a couple iterations to complete this if we do it. There's almost certainly some places where we are being pretty loose with string types and some lurking bugs around it that will be uncovered. |
Hey @cpcloud! Are string lengths still planned to be added? I have an issue with MS SQL Server. Ibis converts a string into varchar(max), which SQL Server optimizes differently. The end result is that my query with varchar(max) takes 26 minutes, while with varchar(50), it takes 1m 50s. |
Yes they are still planned! Hopefully they will land in 10.4, I've already done a good chunk of the work in a branch. |
Amazing thank you @cpcloud for all the great work! |
Is your feature request related to a problem?
We are using Snowflake backend, and trying to load parquet file into the snowflake, the parquet has columns with fixed length string type, but after read_parquet called, all column became VARCHAR(16777216),
tried other backend, like duckdb/mysql, all seems to upscale the string type to max string length instead of maintaining the length from source.
ideally this should be presered.
What is the motivation behind your request?
Our data governance require some strict data type checking, and preserving the source string length will be greatly valuable.
Describe the solution you'd like
I would like to extend ibis schema to allow specify max length of a string column, and also make read_parquet to honor the fixed length physical type.
What version of ibis are you running?
9.0.0
What backend(s) are you using, if any?
MySQL, MSSQL, Snowflake
Code of Conduct
The text was updated successfully, but these errors were encountered: