INSERT INTO FILES fail to parse row/column delimiter like '0x11' #57125

Closed
wxl24life opened this issue Mar 20, 2025 · 5 comments · Fixed by #57126
Labels
type/bug Something isn't working

Comments

@wxl24life
Contributor

Steps to reproduce the behavior (Required)

create database unload;
USE unload;
-- SELECT * FROM sales_records;
CREATE TABLE sales_records(
    record_id     BIGINT,
    seller        STRING,
    store_id      INT,
    sales_time    DATETIME,
    sales_amt     DOUBLE
)
DUPLICATE KEY(record_id)
PARTITION BY date_trunc('day', sales_time)
DISTRIBUTED BY HASH(record_id);

INSERT INTO sales_records
VALUES
    (220313001,"Amy",1,"2022-03-13 12:00:00",8573.25),
    (220314002,"Bob",2,"2022-03-14 12:00:00",6948.99),
    (220314003,"Amy",1,"2022-03-14 12:00:00",4319.01),
    (220315004,"Carl",3,"2022-03-15 12:00:00",8734.26),
    (220316005,"Carl",3,"2022-03-16 12:00:00",4212.69),
    (220317006,"Bob",2,"2022-03-17 12:00:00",9515.88);


INSERT INTO FILES(
    "path" = "oss://cdp-hangzhou/unload/test_03/",
    "format" = "CSV",
    'csv.column_separator' = '0X11'
    -- 'csv.row_delimiter' = '\\X12'
) 
SELECT * FROM unload.sales_records;

Expected behavior (Required)

Image

Real behavior (Required)

Image

StarRocks version (Required)

  • main branch
@kevincai
Contributor

What is the \X11? Is it a single non-printable character or the literal string "\X11"?

@wxl24life
Contributor Author

wxl24life commented Mar 21, 2025

What is the \X11? Is it a single non-printable character or the literal string "\X11"?

@kevincai It's the hexadecimal representation of a special ASCII control character. Such characters are already supported in other features like load and export. We should do our best to make INSERT INTO FILES align with the behavior there.
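
To illustrate the distinction being discussed (one control byte versus the four-character string), here is a minimal Python sketch. It is not the StarRocks implementation: `parse_delimiter` is a hypothetical helper, and the rule it applies (a `0x`/`0X`/`\x`/`\X` prefix followed by an even number of hex digits means raw bytes, anything else is taken literally) is an assumption made for illustration only.

```python
import string


def parse_delimiter(spec):
    """Hypothetical helper: map a delimiter spec to the delimiter string.

    Assumed rule (not taken from StarRocks): a '0x', '0X', '\\x' or '\\X'
    prefix followed by an even, non-empty run of hex digits denotes raw
    bytes; every other spec is used literally.
    """
    for prefix in ("0x", "0X", "\\x", "\\X"):
        digits = spec[len(prefix):]
        if (spec.startswith(prefix) and digits and len(digits) % 2 == 0
                and all(c in string.hexdigits for c in digits)):
            # Each pair of hex digits becomes one character (0x11 is ASCII DC1).
            return bytes.fromhex(digits).decode("latin-1")
    return spec


# First sample row from the reproduction steps, as plain strings.
row = ["220313001", "Amy", "1", "2022-03-13 12:00:00", "8573.25"]

sep = parse_delimiter("0X11")   # one control character, not the text "0X11"
line = sep.join(row)            # fields separated by the single byte 0x11
```

Under that assumed rule, the separator is one character long, so splitting `line` on `"\x11"` recovers the five original fields.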

@kevincai
Contributor

@EsoragotoSpirit please take a look: does the related doc need an update to describe this row_delimiter/col_delimiter escape capability?

@EsoragotoSpirit
Contributor

And in which minor version will this be supported?

@kevincai
Contributor

And in which minor version will this be supported?

I was able to find the doc describing the column separator here: https://docs.starrocks.io/docs/sql-reference/sql-functions/table-functions/files/#csvcolumn_separator

The rule may not be explicit or clear enough, and it is documented only under FILES. I'm not sure whether it could become a separate page that is cross-referenced from the other places where the column separator is used, so that users can jump to the specific chart to learn how to set the separator and what the limitations are.

The same applies to the row_delimiter. As for the minor version, I think we can first make the rules clear. This fix may be treated as a bugfix and applied to all versions.


3 participants