Skip to content

Expand wildcard to actual expressions in prepare_select_exprs #15090

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 11, 2025

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented Mar 8, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sql SQL Planner optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Mar 8, 2025
TestStatementWithDialect {
sql: "select * from (select * from j1 limit 10);",
expected:
"SELECT * FROM (SELECT * FROM `j1` LIMIT 10) AS `derived_limit`",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we expand * to all the columns, should we convert back to * for unparser?

I'm not sure should we support this, I delete related test and consider not supported

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's hard to convert back to * after removing Expr::Wildcard. It's okay for me to show all the column names for unparsing. They are equivalent.

@@ -635,7 +642,15 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
planner_context,
options,
)?;
Ok(wildcard_with_options(planned_options))

let expanded =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There two parts are the real change, others are test adjustment

@@ -54,15 +54,6 @@ use sqlparser::dialect::{Dialect, GenericDialect, HiveDialect, MySqlDialect};
mod cases;
mod common;

#[test]
fn test_schema_support() {
Copy link
Contributor Author

@jayzhan211 jayzhan211 Mar 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I move all the test including select * (those tests fail because they are expanded) to slt files. Add explain statement for them

@jayzhan211 jayzhan211 marked this pull request as ready for review March 8, 2025 10:00
# expect this query to run successfully, not error
query III
# b is not selected in subquery
query error DataFusion error: Schema error: No field named b\. Valid fields are t1\.c, t1\.a, t1\.a0\.
select * from (select c, a, NULL::int as a0 from t order by a, c) t1
union all
select * from (select c, NULL::int as a, a0 from t order by a0, c) t2
order by c, a, a0, b
Copy link
Contributor Author

@jayzhan211 jayzhan211 Mar 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query is executable in main, but it is not in this PR because b is not in the subquery and I think this is reasonable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. This SQL isn't valid. However, it's added by #12721, which shows that issue #12446 has been solved. I do not understand the details of #12446. Maybe @alamb knows why we expect it works?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check and get back to you

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the query is invalid (and postgres agrees). I made a PR to fix it here:

@goldmedal goldmedal self-requested a review March 10, 2025 01:31
TestStatementWithDialect {
sql: "select * from (select * from j1 limit 10);",
expected:
"SELECT * FROM (SELECT * FROM `j1` LIMIT 10) AS `derived_limit`",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's hard to convert back to * after removing Expr::Wildcard. It's okay for me to show all the column names for unparsing. They are equivalent.

Comment on lines 519 to 532
},
TestStatementWithDialect {
sql: "SELECT * FROM (SELECT j1_id + 1 FROM j1) AS temp_j(id2)",
expected: r#"SELECT * FROM (SELECT (`j1`.`j1_id` + 1) AS `id2` FROM `j1`) AS `temp_j`"#,
parser_dialect: Box::new(GenericDialect {}),
unparser_dialect: Box::new(SqliteDialect {}),
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can't remove them. They aren't related to the wildcard expansion. It's better to change the expected result to match the expanded result instead.

SELECT `temp_j`.`id2` FROM (SELECT (`j1`.`j1_id` + 1) AS `id2` FROM `j1`) AS `temp_j

# expect this query to run successfully, not error
query III
# b is not selected in subquery
query error DataFusion error: Schema error: No field named b\. Valid fields are t1\.c, t1\.a, t1\.a0\.
select * from (select c, a, NULL::int as a0 from t order by a, c) t1
union all
select * from (select c, NULL::int as a, a0 from t order by a0, c) t2
order by c, a, a0, b
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. This SQL isn't valid. However, it's added by #12721, which shows that issue #12446 has been solved. I do not understand the details of #12446. Maybe @alamb knows why we expect it works?

@@ -1231,44 +1238,20 @@ physical_plan


# Test: inputs into union with different orderings
query TT
query error DataFusion error: Schema error: No field named d\. Valid fields are t1\.b, t1\.c, t1\.a, t1\.a0\.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. I'm worried about the test related to #12446 🤔

07)----DataSourceExec: partitions=1, partition_sizes=[0]

query TT
explain SELECT * FROM person a NATURAL JOIN lineitem b;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is very nice to port these tests to sqllogictests

# expect this query to run successfully, not error
query III
# b is not selected in subquery
query error DataFusion error: Schema error: No field named b\. Valid fields are t1\.c, t1\.a, t1\.a0\.
select * from (select c, a, NULL::int as a0 from t order by a, c) t1
union all
select * from (select c, NULL::int as a, a0 from t order by a0, c) t2
order by c, a, a0, b
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the query is invalid (and postgres agrees). I made a PR to fix it here:

@jayzhan211 jayzhan211 marked this pull request as draft March 11, 2025 00:43
@jayzhan211 jayzhan211 marked this pull request as ready for review March 11, 2025 01:57
@jayzhan211 jayzhan211 requested review from alamb and goldmedal March 11, 2025 03:08
Copy link
Contributor

@goldmedal goldmedal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jayzhan211. It looks good to me 👍

parser_dialect: Box::new(GenericDialect {}),
unparser_dialect: Box::new(UnparserDefaultDialect {}),
},
TestStatementWithDialect {
sql: "SELECT * FROM UNNEST([1,2,3])",
expected: r#"SELECT * FROM UNNEST([1, 2, 3])"#,
expected: r#"SELECT UNNEST(make_array(Int64(1),Int64(2),Int64(3))) FROM UNNEST([1, 2, 3])"#,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm.. The result is weird because the column name isn't quoted and is uncommon, but it's not related to this PR. I think we should have another PR for it. 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can change the name display for unnest array to unnest([..])

@jayzhan211 jayzhan211 merged commit 359735a into apache:main Mar 11, 2025
24 checks passed
@jayzhan211 jayzhan211 deleted the mv-wildcard branch March 11, 2025 23:49
@jayzhan211
Copy link
Contributor Author

Thanks @goldmedal @alamb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants