[SPARK-49561][SQL] Add SQL pipe syntax for the PIVOT and UNPIVOT operators
### What changes were proposed in this pull request?
This PR adds SQL pipe syntax support for the PIVOT and UNPIVOT operators.
For example:
```
-- Sample data: one row per (course, year, earnings).
CREATE TEMPORARY VIEW courseSales AS SELECT * FROM VALUES
  ("dotNET", 2012, 10000),
  ("Java", 2012, 20000),
  ("dotNET", 2012, 5000),
  ("dotNET", 2013, 48000),
  ("Java", 2013, 30000)
  AS courseSales(course, year, earnings);

-- Total earnings per year, with one output column per course.
TABLE courseSales
|> SELECT `year`, course, earnings
|> PIVOT (
     SUM(earnings)
     FOR course IN ('dotNET', 'Java')
   );

-- Result (year, dotNET, Java):
-- 2012  15000  20000
-- 2013  48000  30000
```
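To make the semantics of the example above concrete, here is a minimal plain-Python sketch of what the PIVOT step computes. It is an illustration only, not Spark code: `pivot_sum` is a hypothetical helper, and its three stages only loosely mirror the two Aggregate nodes and the final Project that appear in the analyzer plans in the golden file output.

```python
from collections import defaultdict

# Rows of the courseSales view from the example: (course, year, earnings).
course_sales = [
    ("dotNET", 2012, 10000),
    ("Java", 2012, 20000),
    ("dotNET", 2012, 5000),
    ("dotNET", 2013, 48000),
    ("Java", 2013, 30000),
]


def pivot_sum(rows, pivot_values):
    """Hypothetical helper: SUM(earnings) FOR course IN pivot_values, grouped by year."""
    # Stage 1: sum earnings per (year, course) pair.
    partial = defaultdict(int)
    for course, year, earnings in rows:
        partial[(year, course)] += earnings

    # Stage 2: group by year and put each sum into the slot of its pivot value.
    # A year whose courses are all outside pivot_values keeps a row of None values.
    slots = {}
    for (year, course), total in partial.items():
        row = slots.setdefault(year, [None] * len(pivot_values))
        if course in pivot_values:
            row[pivot_values.index(course)] = total

    # Stage 3: emit one output column per pivot value, ordered by year.
    return [(year, *slots[year]) for year in sorted(slots)]


result = pivot_sum(course_sales, ["dotNET", "Java"])
for row in result:
    print(*row)
```

Run against the sample rows, the sketch prints the same two rows as the SQL example: `2012 15000 20000` and `2013 48000 30000`.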
### Why are the changes needed?
The SQL pipe operator syntax will let users compose queries in a more flexible fashion.
### Does this PR introduce _any_ user-facing change?
Yes, see above.
### How was this patch tested?
This PR adds a few unit test cases but relies mostly on golden file test coverage. The golden files check that query results stay correct as the feature is implemented, and they record the analyzer output plans so we can confirm that those look right as well.
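As a rough illustration of the golden file idea, the sketch below splits golden-file text into (query, analysis) pairs using the `-- !query` and `-- !query analysis` markers visible in this diff, then reports the queries whose freshly produced analysis differs from the recorded one. This is not Spark's actual test harness: `parse_golden`, `check`, the `analyze` callable, and the sample plan text are all hypothetical, and real golden files may contain sections this sketch does not handle.

```python
def parse_golden(text):
    """Split golden-file text into (query, recorded_analysis) pairs.

    Assumes only the two markers seen in this diff: "-- !query" and
    "-- !query analysis".
    """
    cases, section, query, analysis = [], None, [], []

    def flush():
        # Store the case collected so far, if any.
        if query:
            cases.append(("\n".join(query).strip(), "\n".join(analysis).strip()))

    for line in text.splitlines():
        if line == "-- !query":
            flush()
            section, query, analysis = "query", [], []
        elif line == "-- !query analysis":
            section = "analysis"
        elif section == "query":
            query.append(line)
        elif section == "analysis":
            analysis.append(line)
    flush()
    return cases


def check(cases, analyze):
    """Return the queries whose current analysis differs from the recorded one."""
    return [q for q, expected in cases if analyze(q).strip() != expected]


# Made-up golden text for illustration; the plans are placeholders, not real output.
sample = """-- !query
table t
|> select x
-- !query analysis
Project [x]
+- Relation t


-- !query
table t
-- !query analysis
Relation t
"""

cases = parse_golden(sample)
# A stand-in analyzer that always answers "Relation t" fails only the first case.
failures = check(cases, lambda query: "Relation t")
print(len(cases), failures)
```

Recording the analyzer plan next to each query means a change in plan shape shows up as a reviewable diff in the golden file rather than as a silent behavior change.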
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #48093 from dtenedor/pipe-pivot.
Authored-by: Daniel Tenedorio <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Project [year#x, __pivot_sum(coursesales.earnings) AS `sum(coursesales.earnings)`#x[0] AS dotNET#xL, __pivot_sum(coursesales.earnings) AS `sum(coursesales.earnings)`#x[1] AS Java#xL]
+- Aggregate [year#x], [year#x, pivotfirst(course#x, sum(coursesales.earnings)#xL, dotNET, Java, 0, 0) AS __pivot_sum(coursesales.earnings) AS `sum(coursesales.earnings)`#x]
+- Aggregate [year#x, course#x], [year#x, course#x, sum(earnings#x) AS sum(coursesales.earnings)#xL]
+- Project [cast(course#x as string) AS course#x, cast(year#x as int) AS year#x, cast(earnings#x as int) AS earnings#x]
+- Project [course#x, year#x, earnings#x]
+- SubqueryAlias courseSales
+- LocalRelation [course#x, year#x, earnings#x]
-- !query
table courseSales
|> select `year` as y, course as c, earnings as e
|> pivot (
sum(e) as s, avg(e) as a
for y in (2012 as firstYear, 2013 as secondYear)
)
-- !query analysis
Project [c#x, __pivot_sum(e) AS s AS `sum(e) AS s`#x[0] AS firstYear_s#xL, __pivot_avg(e) AS a AS `avg(e) AS a`#x[0] AS firstYear_a#x, __pivot_sum(e) AS s AS `sum(e) AS s`#x[1] AS secondYear_s#xL, __pivot_avg(e) AS a AS `avg(e) AS a`#x[1] AS secondYear_a#x]
+- Aggregate [c#x], [c#x, pivotfirst(y#x, sum(e) AS s#xL, 2012, 2013, 0, 0) AS __pivot_sum(e) AS s AS `sum(e) AS s`#x, pivotfirst(y#x, avg(e) AS a#x, 2012, 2013, 0, 0) AS __pivot_avg(e) AS a AS `avg(e) AS a`#x]
+- Aggregate [c#x, y#x], [c#x, y#x, sum(e#x) AS sum(e) AS s#xL, avg(e#x) AS avg(e) AS a#x]
+- Project [pipeselect(year#x) AS y#x, pipeselect(course#x) AS c#x, pipeselect(earnings#x) AS e#x]
+- Project [cast(y#x as int) AS y#x, cast(a#x as array<int>) AS a#x, cast(m#x as map<string,int>) AS m#x, cast(s#x as struct<col1:int,col2:string>) AS s#x]
+- Project [y#x, a#x, m#x, s#x]
+- SubqueryAlias yearsWithComplexTypes
+- LocalRelation [y#x, a#x, m#x, s#x]
-- !query
select earnings, `year`, s
from courseSales
join yearsWithComplexTypes on `year` = y
|> pivot (
sum(earnings)
for s in ((1, 'a'), (2, 'b'))
)
-- !query analysis
Project [year#x, __pivot_sum(coursesales.earnings) AS `sum(coursesales.earnings)`#x[0] AS {1, a}#xL, __pivot_sum(coursesales.earnings) AS `sum(coursesales.earnings)`#x[1] AS {2, b}#xL]
+- Aggregate [year#x], [year#x, pivotfirst(s#x, sum(coursesales.earnings)#xL, [1,a], [2,b], 0, 0) AS __pivot_sum(coursesales.earnings) AS `sum(coursesales.earnings)`#x]
+- Aggregate [year#x, s#x], [year#x, s#x, sum(earnings#x) AS sum(coursesales.earnings)#xL]
+- Project [cast(y#x as int) AS y#x, cast(a#x as array<int>) AS a#x, cast(m#x as map<string,int>) AS m#x, cast(s#x as struct<col1:int,col2:string>) AS s#x]
+- Project [y#x, a#x, m#x, s#x]
+- SubqueryAlias yearsWithComplexTypes
+- LocalRelation [y#x, a#x, m#x, s#x]
-- !query
table courseEarnings
|> unpivot (
earningsYear for `year` in (`2012`, `2013`, `2014`)
+- Project [cast(course#x as string) AS course#x, cast(earnings2012#x as int) AS earnings2012#x, cast(sales2012#x as int) AS sales2012#x, cast(earnings2013#x as int) AS earnings2013#x, cast(sales2013#x as int) AS sales2013#x, cast(earnings2014#x as int) AS earnings2014#x, cast(sales2014#x as int) AS sales2014#x]