-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Enforce JOIN plan to require condition #15334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -861,167 +849,6 @@ mod test { | |||
assert_optimized_plan_equal(outer_query, expected) | |||
} | |||
|
|||
#[test] | |||
fn limit_should_push_down_join_without_condition() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case has never occurred now.
@@ -65,7 +65,7 @@ logical_plan | |||
07)------------Projection: customer.c_phone, customer.c_acctbal | |||
08)--------------LeftAnti Join: customer.c_custkey = __correlated_sq_1.o_custkey | |||
09)----------------Filter: substr(customer.c_phone, Int64(1), Int64(2)) IN ([Utf8View("13"), Utf8View("31"), Utf8View("23"), Utf8View("29"), Utf8View("30"), Utf8View("18"), Utf8View("17")]) | |||
10)------------------TableScan: customer projection=[c_custkey, c_phone, c_acctbal], partial_filters=[substr(customer.c_phone, Int64(1), Int64(2)) IN ([Utf8View("13"), Utf8View("31"), Utf8View("23"), Utf8View("29"), Utf8View("30"), Utf8View("18"), Utf8View("17")])] | |||
10)------------------TableScan: customer projection=[c_custkey, c_phone, c_acctbal], partial_filters=[substr(customer.c_phone, Int64(1), Int64(2)) IN ([Utf8View("13"), Utf8View("31"), Utf8View("23"), Utf8View("29"), Utf8View("30"), Utf8View("18"), Utf8View("17")]), Boolean(true)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I benchmarked to ensure the plan change won't impact the performance.
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query ┃ main ┃ fix_join-check ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1 │ 68.97ms │ 70.14ms │ no change │
│ QQuery 2 │ 18.72ms │ 18.55ms │ no change │
│ QQuery 3 │ 30.47ms │ 29.46ms │ no change │
│ QQuery 4 │ 21.45ms │ 21.77ms │ no change │
│ QQuery 5 │ 43.53ms │ 41.56ms │ no change │
│ QQuery 6 │ 14.39ms │ 14.88ms │ no change │
│ QQuery 7 │ 55.03ms │ 55.30ms │ no change │
│ QQuery 8 │ 41.24ms │ 39.45ms │ no change │
│ QQuery 9 │ 50.84ms │ 53.19ms │ no change │
│ QQuery 10 │ 44.85ms │ 44.83ms │ no change │
│ QQuery 11 │ 13.42ms │ 13.22ms │ no change │
│ QQuery 12 │ 28.49ms │ 28.20ms │ no change │
│ QQuery 13 │ 29.56ms │ 29.38ms │ no change │
│ QQuery 14 │ 23.03ms │ 24.40ms │ 1.06x slower │
│ QQuery 15 │ 33.59ms │ 34.98ms │ no change │
│ QQuery 16 │ 13.07ms │ 13.08ms │ no change │
│ QQuery 17 │ 58.00ms │ 58.67ms │ no change │
│ QQuery 18 │ 76.00ms │ 75.06ms │ no change │
│ QQuery 19 │ 39.99ms │ 38.40ms │ no change │
│ QQuery 20 │ 29.04ms │ 29.79ms │ no change │
│ QQuery 21 │ 62.14ms │ 64.14ms │ no change │
│ QQuery 22 │ 13.82ms │ 14.01ms │ no change │
└──────────────┴─────────┴────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main) │ 809.65ms │
│ Total Time (fix_join-check) │ 812.45ms │
│ Average Time (main) │ 36.80ms │
│ Average Time (fix_join-check) │ 36.93ms │
│ Queries Faster │ 0 │
│ Queries Slower │ 1 │
│ Queries with No Change │ 21 │
└───────────────────────────────┴──────────┘
@@ -723,14 +723,14 @@ statement ok | |||
create table testSubQueryLimit (a int, b int) as values (1,2), (2,3), (3,4); | |||
|
|||
query IIII | |||
select * from testSubQueryLimit as t1 join (select * from testSubQueryLimit limit 1) limit 10; | |||
select * from testSubQueryLimit as t1, (select * from testSubQueryLimit limit 1) limit 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
# join condition is required | ||
query error join condition should not be empty | ||
SELECT * FROM t1 FULL JOIN t2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include also a cross join without filters like
SELECT * FROM t1 CROSS JOIN t2
1f0a00e
to
7765900
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @goldmedal
* add check for missing join condition * modify the tests * handle the cross join case * remove invald sql case * fix sqllogictests * fix for cross join case * revert the change for limit.slt * add cross join test
@goldmedal I don't understand this change.
which actually corresponds to the following:
The generated plan has a Left Join with a |
The I think the result is expected. |
* add check for missing join condition * modify the tests * handle the cross join case * remove invald sql case * fix sqllogictests * fix for cross join case * revert the change for limit.slt * add cross join test
Which issue does this PR close?
JOIN
should requireON
condition #13486Rationale for this change
When working on unparsing the plan optimized by
ScalarSubqueryToJoin
, I noticed it could generate a LEFT JOIN without the join condition.The unparsing result will look like
Most databases don't allow the join without the condition besides the cross join.
DuckDB
Postgres
DataFusion
I prefer to align this behavior with others. It makes the logical plan created by DataFusion reliable.
#13486 also mentioned that
join
without condition isn't a valid SQL.Because the cross-join has the same IR as the inner join without the condition, #13486 should be prevented from the SQL layer. Maybe we can consider to reopen apache/datafusion-sqlparser-rs#1552
What changes are included in this PR?
LEFT JOIN
plan with the conditionTrue
.Are these changes tested?
yes
Are there any user-facing changes?
Besides
INNER JOIN
, the join syntax without condition is not allowed anymore.