feat(optimizer): index accelerating `TopN` #7726

y-wei · 2023-02-06T16:38:52Z

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Optimize an applicable LogicalTopN-LogicalScan into LogicalLimit-LogicalScan with a new rule named top_n_on_index that matches the above structure.

Please explain IN DETAIL what the changes are in this PR and why they are needed:

Summarize your change (mandatory)
LogicalTopN will be replaced with LogicalLimit if there is an applicable index or primary key.

On index:

dev=> explain select v1 from t order by v1 limit 1;
                       QUERY PLAN                       
--------------------------------------------------------
 BatchLimit { limit: 1, offset: 0 }
 └─BatchExchange { order: [idx1.v1 ASC], dist: Single }
   └─BatchLimit { limit: 1, offset: 0 }
     └─BatchScan { table: idx1, columns: [v1] }
(4 rows)

On primary key:

dev=> explain select * from t order by k limit 1;
                     QUERY PLAN                     
----------------------------------------------------
 BatchLimit { limit: 1, offset: 0 }
 └─BatchExchange { order: [t.k ASC], dist: Single }
   └─BatchLimit { limit: 1, offset: 0 }
     └─BatchScan { table: t, columns: [k, cnt] }
(4 rows)
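
The rewrite summarized above can be illustrated with a minimal sketch. The `Plan`, `FieldOrder`, and `Direction` types below are toy stand-ins invented for this example, not RisingWave's plan nodes, and the sketch only covers the logical TopN-to-Limit step (the two-phase BatchLimit and BatchExchange in the plans above are not modeled). It assumes the scan advertises the order its index or primary key provides.

```rust
// Toy stand-ins for planner types; none of these are RisingWave's real definitions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Direction {
    Asc,
    Desc,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FieldOrder {
    column: usize,
    direction: Direction,
}

#[derive(Clone, Debug, PartialEq, Eq)]
enum Plan {
    // `sorted_by` is the order the chosen index or primary key provides.
    Scan { table: String, sorted_by: Vec<FieldOrder> },
    TopN { order: Vec<FieldOrder>, limit: u64, offset: u64, input: Box<Plan> },
    Limit { limit: u64, offset: u64, input: Box<Plan> },
}

/// True when `required` is a prefix of `provided`.
fn satisfies(provided: &[FieldOrder], required: &[FieldOrder]) -> bool {
    provided.starts_with(required)
}

/// Rewrites TopN(Scan) into Limit(Scan) when the scan already yields the
/// required order; returns None when the pattern or the order does not match.
fn top_n_on_sorted_scan(plan: &Plan) -> Option<Plan> {
    let Plan::TopN { order, limit, offset, input } = plan else {
        return None;
    };
    let Plan::Scan { sorted_by, .. } = &**input else {
        return None;
    };
    if !satisfies(sorted_by, order) {
        return None;
    }
    Some(Plan::Limit { limit: *limit, offset: *offset, input: input.clone() })
}
```

Returning `Option` mirrors the shape of the `apply` method quoted later in this thread: `None` means the rule does not fire and the plan is left unchanged.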

Checklist

I have written necessary rustdoc comments
I have added necessary unit tests and integration tests
~~- [ ] I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features).~~
I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

#7656

Signed-off-by: Eurekaaw <[email protected]>

y-wei · 2023-02-06T18:09:27Z

~~Actually the query result does not make sense currently 🥲. Is it caused by a missing MergeSort as mentioned in #7656 ?~~ Seems solved


y-wei · 2023-02-06T22:24:25Z

Some planner tests which seem quite unrelated fail 🤨

chenzl25

Could you please add a planner test and an e2e test? Rest LGTM!

chenzl25 · 2023-02-07T04:02:39Z

src/frontend/src/optimizer/plan_node/logical_topn.rs

+    // For Optimizing TopN
+    pub fn get_child(&self) -> PlanRef {
+        self.core.input.clone()
+    }
+


We can just use the existing method input().

chenzl25 · 2023-02-07T04:21:36Z

src/frontend/src/optimizer/rule/agg_on_index_rule.rs

+impl Rule for AggOnIndexRule {
+    fn apply(&self, plan: PlanRef) -> Option<PlanRef> {
+        let logical_topn: &LogicalTopN = plan.as_logical_top_n()?;
+        let logical_scan: LogicalScan = logical_topn.get_child().as_logical_scan()?.to_owned();


This rule name is inconsistent with its match pattern. Maybe TopNOnIndexRule is more appropriate.


BugenZhao · 2023-02-07T05:34:48Z

src/frontend/src/optimizer/rule/top_n_on_index_rule.rs

+                    .map(|idx_item| FieldOrder {
+                        index: idx_item.index,
+                        direct: Asc,
+                    })


I'm not sure whether we support it yet, but in Postgres one can specify DESC order when creating an index (which is often used along with NULLS FIRST/LAST). So maybe we should not hard-code an Asc here, but use index_table.pk instead.

cc @chenzl25

Agree. Use index_table.pk instead of hard-coding Asc, even though we do not currently support creating indexes with descending ordering.

Since index_items and index_table.pk.items() do not quite match, I'm using index_table.pk.items() instead, and it doesn't seem to affect the result. Actually, this part of the code is derived from

risingwave/src/frontend/src/optimizer/plan_node/logical_scan.rs

Line 551 in ed27ecd

let index = self.indexes().iter().find(|idx| {

. Shall I also change the logic there?
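
To illustrate why the direction should come from the index's primary key rather than a hard-coded `Asc`, here is a small sketch. `PkItem` and both helper functions are hypothetical names made up for this example; only the `index` and `direct` field names are taken from the diff above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Direction {
    Asc,
    Desc,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FieldOrder {
    index: usize,
    direct: Direction,
}

/// Hypothetical stand-in for one entry of an index table's primary key.
#[derive(Clone, Copy, Debug)]
struct PkItem {
    index: usize,
    direct: Direction,
}

/// The variant under review: ignores the stored direction and always claims Asc.
fn order_hard_coded(pk: &[PkItem]) -> Vec<FieldOrder> {
    pk.iter()
        .map(|item| FieldOrder { index: item.index, direct: Direction::Asc })
        .collect()
}

/// The suggested variant: carries over the direction recorded in the key.
fn order_from_pk(pk: &[PkItem]) -> Vec<FieldOrder> {
    pk.iter()
        .map(|item| FieldOrder { index: item.index, direct: item.direct })
        .collect()
}
```

With a descending key such as `[PkItem { index: 0, direct: Direction::Desc }]`, the hard-coded variant would report an ascending order and could wrongly satisfy an `ORDER BY ... ASC` request, while the second variant would not.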

BugenZhao · 2023-02-07T05:44:19Z

> Done by a new rule named agg_on_index that arbitrarily match the above structure.

Please also update the PR description as it will be used as the commit message.

> Are there any handlers/executors that I need to take care of?

The batch scan will output at least one chunk to downstream, so we still have to scan 1024(default chunk size) * parallelism records. To achieve a better effect, we may also need to tell the scan to reduce the chunk size. 🤔

BugenZhao · 2023-02-07T12:02:08Z

> Primary key is not supported.

Are we going to support it in next PRs?


y-wei · 2023-02-07T14:41:57Z

> The batch scan will output at least one chunk to downstream, so we still have to scan 1024(default chunk size) * parallelism records. To achieve a better effect, we may also need to tell the scan to reduce the chunk size. 🤔

Are you talking about

risingwave/src/batch/src/executor/row_seq_scan.rs

Line 291 in ed27ecd

chunk_size,

? Actually I'm interested in how point_get works. Can we leverage this?

risingwave/src/batch/src/executor/row_seq_scan.rs

Line 322 in ed27ecd

let (point_gets, range_scans): (Vec<ScanRange>, Vec<ScanRange>) = scan_ranges


BugenZhao · 2023-02-07T15:03:00Z

> Are you talking about
>
> risingwave/src/batch/src/executor/row_seq_scan.rs
>
> Line 291 in ed27ecd
>
> chunk_size,
>
> ? Actually I'm interested in how point_get works. Can we leverage this?
>
> risingwave/src/batch/src/executor/row_seq_scan.rs
>
> Line 322 in ed27ecd
>
> let (point_gets, range_scans): (Vec<ScanRange>, Vec<ScanRange>) = scan_ranges

Exactly. I guess this is not related to point_get, which uses StateStore::get instead of scan under the hood. Also, we can make this more general: for queries with ORDER BY .. LIMIT 10, we can also set the chunk size to 10 since it's much smaller than 1024.
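
The chunk-size idea discussed here can be made concrete with a little arithmetic. This is only a sketch: the constant and both function names are made up for illustration, and clamping to at least 1 is an assumption of the example rather than something stated in the thread.

```rust
/// Default batch chunk size mentioned in the discussion.
const DEFAULT_CHUNK_SIZE: u64 = 1024;

/// Picks a chunk size no larger than the LIMIT, capped at the default.
/// The lower bound of 1 is an assumption of this sketch.
fn chunk_size_for_limit(limit: u64) -> u64 {
    limit.clamp(1, DEFAULT_CHUNK_SIZE)
}

/// Rows read before every parallel scan has emitted its first chunk.
fn min_rows_scanned(chunk_size: u64, parallelism: u64) -> u64 {
    chunk_size * parallelism
}
```

For example, with a parallelism of 4, `min_rows_scanned(DEFAULT_CHUNK_SIZE, 4)` is 4096 rows, whereas `min_rows_scanned(chunk_size_for_limit(1), 4)` is 4 rows for a `LIMIT 1` query.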


codecov · 2023-02-08T04:02:00Z

Codecov Report

Merging #7726 (8265643) into main (dffc2f1) will increase coverage by 0.00%.
The diff coverage is 89.91%.

❗ Current head 8265643 differs from pull request most recent head cb87843. Consider uploading reports for the commit cb87843 to get more accurate results

@@           Coverage Diff            @@
##             main    #7726    +/-   ##
========================================
  Coverage   71.70%   71.71%            
========================================
  Files        1096     1097     +1     
  Lines      174608   174724   +116     
========================================
+ Hits       125208   125303    +95     
- Misses      49400    49421    +21

Flag	Coverage Δ
rust	`71.71% <89.91%> (+<0.01%)`	⬆️


Impacted Files	Coverage Δ
src/batch/src/executor/join/local_lookup_join.rs	`54.51% <0.00%> (-0.10%)`	⬇️
src/batch/src/executor/row_seq_scan.rs	`21.90% <0.00%> (-0.54%)`	⬇️
...c/frontend/src/optimizer/plan_node/generic/scan.rs	`88.00% <ø> (ø)`
src/frontend/src/optimizer/rule/mod.rs	`100.00% <ø> (ø)`
...rc/frontend/src/optimizer/plan_node/batch_limit.rs	`80.00% <85.71%> (ø)`
...frontend/src/optimizer/rule/top_n_on_index_rule.rs	`95.40% <95.40%> (ø)`
src/frontend/src/optimizer/mod.rs	`95.24% <100.00%> (+0.05%)`	⬆️
...frontend/src/optimizer/plan_node/batch_seq_scan.rs	`87.89% <100.00%> (+0.06%)`	⬆️
...c/frontend/src/optimizer/plan_node/logical_scan.rs	`94.44% <100.00%> (+0.10%)`	⬆️
src/source/src/row_id.rs	`90.90% <0.00%> (-1.14%)`	⬇️
... and 5 more


BugenZhao · 2023-02-08T05:22:09Z

src/frontend/src/optimizer/plan_node/logical_scan.rs

@@ -86,6 +86,7 @@ impl LogicalScan {
            table_desc,
            indexes,
            predicate,
+            chunk_size: 1024,


What about using an Option here? If it's None, then the batch executor will follow the value from the config, so that we don't need to hard code a 1024 here.

Tip: wrapping a primitive in a new message makes it optional in proto3, like this:

risingwave/proto/stream_plan.proto

Lines 632 to 634 in 9e1ff93

message Parallelism {
  uint64 parallelism = 1;
}
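
On the Rust side, the `Option` suggestion might look like the following sketch. `BatchConfig`, `ScanNode`, and their fields are hypothetical names chosen for illustration; they are not the actual RisingWave structs.

```rust
/// Hypothetical batch configuration; the field name is an assumption.
struct BatchConfig {
    default_chunk_size: u32,
}

/// Hypothetical scan description carrying an optional per-plan override.
struct ScanNode {
    chunk_size: Option<u32>,
}

/// Uses the plan's override when present, otherwise the configured default,
/// so the planner never needs to hard-code 1024.
fn effective_chunk_size(node: &ScanNode, config: &BatchConfig) -> u32 {
    node.chunk_size.unwrap_or(config.default_chunk_size)
}
```

The design point is that `None` means "no opinion from the optimizer", which keeps the default in one place (the config) instead of duplicating it in the plan node.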

src/frontend/src/optimizer/plan_node/logical_scan.rs

src/frontend/src/optimizer/rule/top_n_on_index_rule.rs

BugenZhao · 2023-02-08T05:26:18Z

src/frontend/src/optimizer/rule/top_n_on_index_rule.rs

+                    direct: if op.order_type == OrderType::Ascending {
+                        Direction::Asc
+                    } else {
+                        Direction::Desc
+                    },


😄 I guess @richardchien is working on unifying these types.

:doge:

BTW there's no conflict for now. I'll unify these types after finishing distinct aggregator, which may need some days.
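
Until the types are unified, one way to keep the conversion in a single place is a `From` impl with an exhaustive `match`. Both enums below are simplified stand-ins written for this sketch, not the real definitions.

```rust
/// Stand-in for the catalog-side ordering type (not the real definition).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OrderType {
    Ascending,
    Descending,
}

/// Stand-in for the optimizer-side ordering type (not the real definition).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Direction {
    Asc,
    Desc,
}

impl From<OrderType> for Direction {
    fn from(order_type: OrderType) -> Self {
        // An exhaustive match forces a compile error if a variant is added later,
        // unlike an if/else that silently maps everything else to Desc.
        match order_type {
            OrderType::Ascending => Direction::Asc,
            OrderType::Descending => Direction::Desc,
        }
    }
}
```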

chenzl25 · 2023-02-08T06:38:07Z

src/frontend/src/optimizer/rule/top_n_on_index_rule.rs

+        let index = logical_scan.indexes().iter().find(|idx| {
+            Order {
+                field_order: idx
+                    .index_table
+                    .pk()
+                    .iter()
+                    .map(|idx_item| FieldOrder {
+                        index: idx_item.index,
+                        direct: idx_item.direct,
+                    })
+                    .collect(),
+            }
+            .satisfies(order)


We can't compare index order directly with the topn order, because they refer to different columns. We need a map between them. The following case can reproduce this bug.

create table t (a int primary key, b int);
create index idx on t(b);
explain select * from t order by b limit 1;

Then should we test if the index order is a superset of the topn order?

We need it, and the satisfies method already handles the superset problem. But considering the above case, my point is that we need to compare the index order with the topn order at the same level, that is, both must refer to either primary table columns or index columns.

got it, I'll try to solve it in the next commit

seems solved 😁

dev=> create table t (v1 int, v2 int, v3 int);
CREATE_TABLE
dev=> create index idx on t (v2) include (v2, v3);
CREATE_INDEX
dev=> explain select v2, v3 from t order by v2, v3 limit 1;
                              QUERY PLAN
----------------------------------------------------------------------
 BatchTopN { order: "[t.v2 ASC, t.v3 ASC]", limit: 1, offset: 0 }
 └─BatchExchange { order: [], dist: Single }
   └─BatchTopN { order: "[t.v2 ASC, t.v3 ASC]", limit: 1, offset: 0 }
     └─BatchScan { table: t, columns: [v2, v3] }
(4 rows)

dev=> explain select v2, v3 from t order by v2 limit 1;
                      QUERY PLAN
-------------------------------------------------------
 BatchLimit { limit: 1, offset: 0 }
 └─BatchExchange { order: [idx.v2 ASC], dist: Single }
   └─BatchLimit { limit: 1, offset: 0 }
     └─BatchScan { table: idx, columns: [v2, v3] }
(4 rows)
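
The column-mapping issue raised in this thread can be sketched as follows. This is a simplified illustration, not the code from the PR: orders are plain column-index lists (directions are left out for brevity), and the `index_to_primary` layout is an assumption made for the example.

```rust
/// Translates an order expressed in index-table column indices into
/// primary-table column indices. `index_to_primary[i]` is the primary-table
/// column stored at index-table column `i` (a hypothetical mapping layout).
/// Returns None if the order mentions a column the mapping does not cover.
fn to_primary_order(index_order: &[usize], index_to_primary: &[usize]) -> Option<Vec<usize>> {
    index_order
        .iter()
        .map(|&i| index_to_primary.get(i).copied())
        .collect()
}

/// Compares the index order with the TopN order at the same level:
/// both expressed in primary-table column indices.
fn index_satisfies(index_order: &[usize], index_to_primary: &[usize], top_n_order: &[usize]) -> bool {
    match to_primary_order(index_order, index_to_primary) {
        Some(mapped) => mapped.starts_with(top_n_order),
        None => false,
    }
}
```

Taking the reviewer's case `t (a int primary key, b int)` with `idx on t(b)`, and assuming the index table stores `[b, a]`, the mapping is `[1, 0]` and the index order is `[0, 1]`. Comparing `[0, 1]` directly against `ORDER BY a` (primary column 0) would match by accident, while `index_satisfies(&[0, 1], &[1, 0], &[0])` is false and `index_satisfies(&[0, 1], &[1, 0], &[1])` (that is, `ORDER BY b`) is true.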


y-wei · 2023-02-08T17:01:54Z

I have no idea why the misc-check fails:


Creating buildkite018631e5b10349f0b1e7fa248ec0e85f_rw-build-env_run ... done
--
  | Check protobuf code format && Lint protobuf | 15s
  | batch_plan.proto.orig 2023-02-08 16:40:37.650298442 +0000 | 0s
  | batch_plan.proto 2023-02-08 16:40:37.650298442 +0000 | 1s
  | @@ -25,9 +25,9 @@
  | // Whether the order on output columns should be preserved.
  | bool ordered = 5;
  |  
  | -  message ChunkSize {
  | -    uint32 chunk_size = 1;
  | -  }
  | +  message ChunkSize {
  | +    uint32 chunk_size = 1;
  | +  }
  | // If along with `batch_limit`, `chunk_size` will be set.
  | ChunkSize chunk_size = 6;
  | }
  | ERROR: 100


chenzl25 · 2023-02-09T04:02:59Z

One more case needs to be handled: a table scan whose filter could be served by a primary key lookup, followed by a topn. We should ensure that we still choose the primary table scan instead of the index.

create table t (a int primary key, b int);
create index idx on t(b);
explain select * from t where a = 1 order by b limit 1;

y-wei · 2023-02-09T05:02:35Z

> One more case needs to be handled: a table scan whose filter could be served by a primary key lookup, followed by a topn. We should ensure that we still choose the primary table scan instead of the index.

Could you explain a bit about the reason for doing a primary lookup?

BugenZhao · 2023-02-09T05:04:09Z

proto/batch_plan.proto

@@ -24,6 +24,12 @@ message RowSeqScanNode {
  common.Buffer vnode_bitmap = 4;
  // Whether the order on output columns should be preserved.
  bool ordered = 5;
+
+  message ChunkSize { 


Trailing whitespace here?

Nice catch!

chenzl25 · 2023-02-09T05:13:48Z

> One more case needs to be handled: a table scan whose filter could be served by a primary key lookup, followed by a topn. We should ensure that we still choose the primary table scan instead of the index.
>
> create table t (a int primary key, b int);
> create index idx on t(b);
> explain select * from t where a = 1 order by b limit 1;

Because a primary point lookup is more efficient than an index range scan in this case. Maybe we can apply this rule only when there are no predicates in the table scan. Ideally this decision would be made by a cost-based optimizer, but we don't have one yet, so let's optimize only the cases that are guaranteed to improve.
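
The conservative guard proposed here could be sketched like this. `ScanInfo` and its fields are hypothetical simplifications (predicates are kept as plain strings), used only to show the shape of the check.

```rust
/// Hypothetical, simplified view of a scan: its filter conjunctions as text
/// and whether some index (or the primary key) provides the TopN order.
struct ScanInfo {
    predicates: Vec<String>,
    order_provided: bool,
}

/// Fires only when the scan has no predicates at all, so a possible primary
/// point lookup is never traded for an index range scan.
fn can_replace_top_n_with_limit(scan: &ScanInfo) -> bool {
    scan.predicates.is_empty() && scan.order_provided
}
```

For the query `select * from t where a = 1 order by b limit 1`, the scan carries the predicate `a = 1`, so the guard returns false and the primary table scan is kept.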


chenzl25

LGTM! Great work, thank you.

stdrc · 2023-02-09T05:38:47Z

proto/batch_plan.proto

+    uint32 chunk_size = 1;
+  }
+  // If along with `batch_limit`, `chunk_size` will be set.
+  ChunkSize chunk_size = 6;


Why not optional uint32 chunk_size?

I remember that long ago we forbade optional in proto3 for some reason, but it seems to work now. 😂 Both LGTM.

I propose we leave it for a future task that refactors all those optional workarounds 😂


y-wei added 2 commits February 6, 2023 11:11

mimic logical_scan

398d938

Signed-off-by: Eurekaaw <[email protected]>

let topn using index, first trial

4bf7616

Signed-off-by: Eurekaaw <[email protected]>

y-wei requested review from lmatz, chenzl25 and BugenZhao February 6, 2023 16:38

y-wei changed the title ~~feat (optimizer): index accelerating TopN~~ feat(optimizer): index accelerating TopN Feb 6, 2023

y-wei added 4 commits February 6, 2023 13:12

remove useless code

ff9d519

Signed-off-by: Eurekaaw <[email protected]>

remove useless code

4a9efc6

Signed-off-by: Eurekaaw <[email protected]>

solve the order problem

93ef95c

Signed-off-by: Eurekaaw <[email protected]>

have default any order

3d6798d

Signed-off-by: Eurekaaw <[email protected]>

y-wei marked this pull request as ready for review February 6, 2023 23:13

chenzl25 reviewed Feb 7, 2023


rename and replace with input()

f9fcc0a

Signed-off-by: Eurekaaw <[email protected]>

lmatz reviewed Feb 7, 2023


src/frontend/src/optimizer/rule/top_n_on_index_rule.rs

add license

baf8e6c

Co-authored-by: lmatz <[email protected]>

BugenZhao reviewed Feb 7, 2023


chenzl25 mentioned this pull request Feb 7, 2023

Support creating indexes with descending ordering. #7740

Closed

y-wei added 2 commits February 7, 2023 09:27

using index_table.pk rather than index_items

495fa8e

Signed-off-by: Eurekaaw <[email protected]>

Merge branch 'topn_acc' of github.com:ClearloveWei/risingwave_e into …

978a480

…topn_acc

update planner test

616dc8a

Signed-off-by: Eurekaaw <[email protected]>

add e2e tests

fbd1b3d

Signed-off-by: Eurekaaw <[email protected]>

y-wei added A-optimizer Area: SQL optimizer. type/feature Type: New feature. type/perf Type: Performance. labels Feb 7, 2023

y-wei added 2 commits February 7, 2023 22:31

support primary key

5e51615

Signed-off-by: Eurekaaw <[email protected]>

format

8265643

Signed-off-by: Eurekaaw <[email protected]>

BugenZhao reviewed Feb 8, 2023


BugenZhao requested a review from chenzl25 February 8, 2023 05:27

chenzl25 requested changes Feb 8, 2023


y-wei added 4 commits February 8, 2023 09:31

make chunk_size optional, replace satisfies with supersets

e277ee8

Signed-off-by: Eurekaaw <[email protected]>

apply s2p mapping before comparing index and topn order

37ce43c

Signed-off-by: Eurekaaw <[email protected]>

translate index column_idx to topn_col_idx

fc94781

Signed-off-by: Eurekaaw <[email protected]>

workaround for optional in proto

cb87843

Signed-off-by: Eurekaaw <[email protected]>

chenzl25 reviewed Feb 8, 2023


src/frontend/src/optimizer/rule/top_n_on_index_rule.rs (outdated)

check

8f352b3

Signed-off-by: Eurekaaw <[email protected]>

replace required to output and include it in the primary key part

aeb41e4

Signed-off-by: Eurekaaw <[email protected]>

BugenZhao reviewed Feb 9, 2023


apply optimization iff predicate is always true

e7cb28d

Signed-off-by: Eurekaaw <[email protected]>

chenzl25 approved these changes Feb 9, 2023


stdrc reviewed Feb 9, 2023


BugenZhao approved these changes Feb 9, 2023


Merge branch 'main' into topn_acc

f537a4f

y-wei added the mergify/can-merge label Feb 9, 2023

y-wei and others added 2 commits February 9, 2023 10:23

update planner test

cd261db

Signed-off-by: Eurekaaw <[email protected]>

Merge branch 'main' into topn_acc

260834c

mergify bot merged commit 2d79894 into risingwavelabs:main Feb 9, 2023

y-wei deleted the topn_acc branch February 11, 2023 17:18

This was referenced Feb 16, 2023

feat (optimizer): optimizing TopN on index together with where filter clause #7991

Closed

bug (optimizer): optimized TopN causing fails in create mv #7992

Closed
