Fix execution errors caused by plan gap #3350

qianheng-aws · 2025-02-26T08:49:18Z

Description

This PR fixes several errors caused by Calcite optimization. These issues arise because there are gaps between our OpenSearch plan and the Calcite plan, and because Calcite's optimization process is specific to SQL and its own semantic requirements.

This PR includes the following changes:

Add an OpenSearch setting for Calcite plan push down, disabled by default. The default will change to enabled once all tests work with push down.
Fix an error for single-column rows. Calcite has an optimization that transforms such a row into a scalar object, and all enumerable operators follow that convention. To combine our scan operators with them, we need to follow it as well.
Fix an error in the sort (collation) behavior of the final results. Calcite doesn't guarantee the collation of a query if its plan doesn't end with a sort operator. For example, it has an optimization that removes a sort inside a subquery, since that sort doesn't guarantee collation outside the subquery. Under PPL semantics, however, we expect collation to be preserved even when a query has other operators (e.g. project) after the sort command.
We don't plan to hack Calcite's prepare process, so the only thing we can do is transform our plan to conform to Calcite's conventions. In this case, before optimization, we add a sort operator as the root of our plan if the original root has a collation for its output data. During optimization, we rely on the Calcite optimizer to remove redundant sort operators.
Fix an error in push down by lazily constructing OpenSearchRequestBuilder. In the previous implementation, scan operators in different equivalent plans could share the same objects inside OpenSearchRequestBuilder during optimization, since OpenSearchRequestBuilder cannot be deep-copied. As a result, a transformation on one plan could affect other plans, which must never happen.
To fix this, we lazily construct OpenSearchRequestBuilder and keep the push down actions in a queue until the final implementation stage. Since the queue supports deep copy and each push down action is a self-contained closure, we avoid the issue above.
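
To illustrate the lazy push-down idea, here is a minimal, self-contained sketch. It is not the code in this PR: `RequestBuilderSketch`, `PushDownActionSketch` and `ScanSketch` are hypothetical stand-ins for OpenSearchRequestBuilder, PushDownAction and CalciteOpenSearchIndexScan, and the filter is modelled as a plain string. The point it tries to show is that each scan owns its own copy of the action queue, and the mutable builder is only created when the request is finally built.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a mutable request builder that cannot be deep-copied. */
final class RequestBuilderSketch {
  final List<String> filters = new ArrayList<>();
}

/** A push-down step that is recorded now and applied to a builder later. */
interface PushDownActionSketch {
  void apply(RequestBuilderSketch builder);
}

/** Hypothetical scan: every push down returns a new scan that owns its own queue. */
final class ScanSketch {
  private final ArrayDeque<PushDownActionSketch> pushDowns;

  ScanSketch() {
    this(new ArrayDeque<PushDownActionSketch>());
  }

  private ScanSketch(ArrayDeque<PushDownActionSketch> pushDowns) {
    this.pushDowns = pushDowns;
  }

  /** Records the filter in a copied queue; the original scan is left untouched. */
  ScanSketch pushDownFilter(String condition) {
    ArrayDeque<PushDownActionSketch> copy = new ArrayDeque<PushDownActionSketch>(pushDowns);
    // The action captures only the immutable condition, so sharing it between queues is safe.
    copy.add(builder -> builder.filters.add(condition));
    return new ScanSketch(copy);
  }

  /** Builds a fresh request only at the end, replaying the recorded actions in order. */
  RequestBuilderSketch buildRequest() {
    RequestBuilderSketch builder = new RequestBuilderSketch();
    for (PushDownActionSketch action : pushDowns) {
      action.apply(builder);
    }
    return builder;
  }
}
```

With this shape, two alternative plans derived from the same base scan (for example `base.pushDownFilter("age > 30")` and `base.pushDownFilter("state = 'WA'")`) each build a request containing only their own filter, and the base scan still builds an unfiltered request.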

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Heng Qian <[email protected]>

…ine' into feature/calcite-engine-sort-fix

LantaoJin · 2025-02-26T09:06:02Z

core/src/main/java/org/opensearch/sql/executor/QueryService.java

+     */
+    RelCollation collation = osPlan.getTraitSet().getCollation();
+    if (!(osPlan instanceof Sort) && collation != RelCollations.EMPTY) {
+      calcitePlan = LogicalSort.create(osPlan, collation, null, null);


Any reason for adding this NotNull annotation? Would it be better to add this instead?

if (calcitePlan == null) return osPlan;

Nothing special, it's auto-generated by IDEA. We can remove it.

While we expect that the collation can be preserved through the pipes over PPL

Could you elaborate more on this? I do not think PPL commands should preserve order.

@qianheng-aws @LantaoJin
It seems some PPL commands should preserve order, for instance fields and take, while others, such as stats, should not.
In ANSI SQL, for select a from tbl order by b, order is preserved because the logical processing order of a select statement is actually from -> select -> order.

LantaoJin · 2025-02-26T09:08:45Z

.../main/java/org/opensearch/sql/opensearch/planner/physical/OpenSearchFilterIndexScanRule.java

+    CalciteOpenSearchIndexScan newScan = scan.pushDownFilter(filter);
+    if (newScan != null) {


newScan won't be used after this check, and the name newScan confused me.
Could you change it to

if (scan.pushDownFilter(filter) != null) {

Sorry, it's a typo. The current logic happens to work as well.

call.transformTo(scan); -> call.transformTo(newScan);

LantaoJin · 2025-02-26T09:13:11Z

...rch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteOpenSearchIndexScan.java

@@ -56,24 +59,31 @@ public class CalciteOpenSearchIndexScan extends OpenSearchTableScan {
   */
  public CalciteOpenSearchIndexScan(
      RelOptCluster cluster, RelOptTable table, OpenSearchIndex index) {
-    this(cluster, table, index, index.createRequestBuilder(), table.getRowType());
+    this(cluster, table, index, table.getRowType(), null);


this(cluster, table, index, table.getRowType(), new PushDownContext());
and you can remove L74

LantaoJin · 2025-02-26T09:20:17Z

...rch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteOpenSearchIndexScan.java

@@ -118,17 +138,18 @@ public Enumerator<Object> enumerator() {
    };
  }

-  public boolean pushDownFilter(Filter filter) {
+  public CalciteOpenSearchIndexScan pushDownFilter(Filter filter) {


Why change the return type? It seems the return value is never used apart from the null check in OpenSearchFilterIndexScanRule.

LantaoJin · 2025-02-26T09:22:09Z

...rch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteOpenSearchIndexScan.java

    return newScan;
  }
+
+  static class PushDownContext extends ArrayDeque<PushDownAction> {


Could you add some comments about this context, such as its purpose and usage?

LantaoJin · 2025-02-26T09:24:57Z

...arch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java

+    if (fields.size() == 1) {
+      return current.tupleValue().get(fields.getFirst()).valueForCalcite();
+    }
+    return fields.stream().map(k -> current.tupleValue().get(k).valueForCalcite()).toArray();


The number of fields here has been reduced to the actual number of outputs, right? So get(k) won't return null anymore.

I mistakenly pushed the workaround code to the dev branch. Could you merge with the latest branch code to resolve the conflicts, @qianheng-aws?
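
For readers following the single-column change above, a small sketch of the row-shape convention may help. This is an illustration under assumptions, not the enumerator in this PR: `RowShapeSketch` and `toCalciteRow` are hypothetical names, and a document is modelled as a plain `Map` instead of an ExprValue tuple. It shows a one-field row being returned as the bare value, and a multi-field row as an `Object[]`.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper mirroring the row-shape convention discussed above. */
final class RowShapeSketch {
  /**
   * Returns the bare field value when exactly one field is requested,
   * and an Object[] in field order when several are requested.
   */
  static Object toCalciteRow(Map<String, Object> doc, List<String> fields) {
    if (fields.size() == 1) {
      // Single-column row: hand back the scalar itself, not a one-element array.
      return doc.get(fields.get(0));
    }
    // Multi-column row: one array slot per requested field.
    return fields.stream().map(doc::get).toArray();
  }
}
```

A downstream operator that follows the same convention would cast the result to the column type for one field and to `Object[]` otherwise, so the scan and the operators above it need to agree on this shape.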

penghuo · 2025-02-26T18:25:42Z

core/src/main/java/org/opensearch/sql/executor/QueryService.java

+     */
+    RelCollation collation = osPlan.getTraitSet().getCollation();
+    if (!(osPlan instanceof Sort) && collation != RelCollations.EMPTY) {
+      calcitePlan = LogicalSort.create(osPlan, collation, null, null);


While we expect that the collation can be preserved through the pipes over PPL

Could you elaborate more on this? I do not think PPL commands should preserve order.

penghuo · 2025-02-26T18:30:06Z

...rch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteOpenSearchIndexScan.java

@@ -85,8 +95,10 @@ public RelNode copy(RelTraitSet traitSet, List<RelNode> inputs) {
  @Override
  public void register(RelOptPlanner planner) {
    super.register(planner);
-    for (RelOptRule rule : OpenSearchIndexRules.OPEN_SEARCH_INDEX_SCAN_RULES) {
-      planner.addRule(rule);
+    if (osIndex.getSettings().getSettingValue(Settings.Key.CALCITE_PUSHDOWN_ENABLED)) {


Is it for testing purposes? I do not think end users have any reason to disable optimization.

For example, the PPL query source=table | sort a | fields a has the plan scan -> sort -> project, and we expect the final result to be collated on column a.
However, Calcite won't guarantee that. If we translate the above PPL or plan into SQL, it would be select a from (select * from table order by a). Calcite doesn't guarantee collation inside a subquery and has several optimizations that remove the sort operator.

[remove sort in subquery when converting to RelNode] https://github.com/apache/calcite/blob/a94927e9b80f9f5bf639e31c2636536cb6aebc1a/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L6454C36-L6454C58

[set root collation to be empty if the top operator is not a sort; the optimizer then won't choose a plan with a sort operator, since it costs more and there is no collation requirement for the root] https://github.com/apache/calcite/blob/a94927e9b80f9f5bf639e31c2636536cb6aebc1a/core/src/main/java/org/apache/calcite/prepare/CalcitePrepareImpl.java#L1055

If we do want Calcite to guarantee the collation of the final result, the PPL would have to be source=table | fields a | sort a, which translates to the SQL select a from table order by a. That's why we need to add a sort operator to guarantee collation and, for simplicity, rely on the optimizer to eliminate the redundant sort.
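
The root-sort rewrite described here can be sketched on a toy plan model. This is only an illustration: `PlanNodeSketch` and `RootSortSketch` are hypothetical classes, collation is modelled as a list of column names, and none of this uses Calcite's RelNode or LogicalSort API. It shows that a plan such as scan -> sort -> project, whose root carries a collation but is not a sort, gets a sort on top, while plans that already end in a sort or carry no collation are returned unchanged.

```java
import java.util.List;

/** Toy plan node: operator name, the collation its output carries, and one optional input. */
final class PlanNodeSketch {
  final String op;
  final List<String> collation; // sort keys of the output; empty means no guaranteed order
  final PlanNodeSketch input;

  PlanNodeSketch(String op, List<String> collation, PlanNodeSketch input) {
    this.op = op;
    this.collation = collation;
    this.input = input;
  }
}

final class RootSortSketch {
  /** Adds a sort as the new root when the current root is collated but is not itself a sort. */
  static PlanNodeSketch ensureRootSort(PlanNodeSketch root) {
    boolean rootIsSort = root.op.equals("Sort");
    if (!rootIsSort && !root.collation.isEmpty()) {
      // Make the ordering requirement explicit at the root so an optimizer cannot drop it.
      return new PlanNodeSketch("Sort", root.collation, root);
    }
    return root;
  }
}
```

In this model the inner sort becomes redundant once the root sort exists, which matches the idea of leaving the cleanup of duplicate sorts to the optimizer.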

Yeah, I think so. It's per @LantaoJin's suggestion.

And it allows us to disable the push down feature quickly without rollback if there are some unexpected issues.

@penghuo Adding this configuration was my suggestion. It is an advanced configuration aimed at developers, similar to how databases typically have switches for optimization rules. Moreover, this configuration is currently very useful for our development: in some scenarios, enabling push-down might cause issues, and having this switch helps us determine whether problems are caused by push-down or by other factors.

The optimal solution would be to provide a method that allows free selection of optimization rules. However, considering there aren't many custom optimization rules in the short term, adding a push-down config seems to be the most cost-effective approach.

Got it, makes sense. If the default value of Settings.Key.CALCITE_PUSHDOWN_ENABLED is enabled, it should be fine. Let's consider removing it when we are confident.

Signed-off-by: Heng Qian <[email protected]>

…ine' into feature/calcite-engine-sort-fix # Conflicts: # opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java

qianheng-aws · 2025-02-27T11:41:33Z

./gradlew :integ-test:integTest --tests '*Calcite*IT' passes on my local machine. The push down feature has some issues, so I disabled it to unblock our tests. I will enable it once all tests work with it.

* Transform to calcite plan before executing
Signed-off-by: Heng Qian <[email protected]>
* Fix bug for single column row
Signed-off-by: Heng Qian <[email protected]>
* Add settings for calcite pushdown
Signed-off-by: Heng Qian <[email protected]>
* Lazily construct OpenSearchRequestBuilder and do push down
Signed-off-by: Heng Qian <[email protected]>
* Address comments and disable push down
Signed-off-by: Heng Qian <[email protected]>
---------
Signed-off-by: Heng Qian <[email protected]>

* Transform to calcite plan before executing
Signed-off-by: Heng Qian <[email protected]>
* Fix bug for single column row
Signed-off-by: Heng Qian <[email protected]>
* Add settings for calcite pushdown
Signed-off-by: Heng Qian <[email protected]>
* Lazily construct OpenSearchRequestBuilder and do push down
Signed-off-by: Heng Qian <[email protected]>
* Address comments and disable push down
Signed-off-by: Heng Qian <[email protected]>
---------
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: xinyual <[email protected]>

qianheng-aws added 4 commits February 26, 2025 14:12

Transform to calcite plan before executing

19732b3

Signed-off-by: Heng Qian <[email protected]>

Fix bug for single column row

378d165

Signed-off-by: Heng Qian <[email protected]>

Add settings for calcite pushdown

4698600

Signed-off-by: Heng Qian <[email protected]>

Lazily construct OpenSearchRequestBuilder and do push down

efc8cd8

Signed-off-by: Heng Qian <[email protected]>

qianheng-aws requested review from ps48, kavithacm, derek-ho, joshuali925, dai-chen, YANG-DB, mengweieric, Swiddis, penghuo, seankao-az, MaxKsyunz, Yury-Fridlyand, anirudha, forestmvey, acarbonetto, GumpacG, ykmr1224, LantaoJin and noCharger as code owners February 26, 2025 08:49

Merge remote-tracking branch 'refs/remotes/origin/feature/calcite-eng…

2de4f5c

…ine' into feature/calcite-engine-sort-fix

LantaoJin reviewed Feb 26, 2025

penghuo added the calcite calcite migration releated label Feb 26, 2025

penghuo reviewed Feb 26, 2025

qianheng-aws added 2 commits February 27, 2025 19:30

Address comments and disable push down

6573ab7

Signed-off-by: Heng Qian <[email protected]>

Merge remote-tracking branch 'refs/remotes/origin/feature/calcite-eng…

71f853e

…ine' into feature/calcite-engine-sort-fix # Conflicts: # opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java

LantaoJin merged commit 2971eae into opensearch-project:feature/calcite-engine Feb 27, 2025
4 of 13 checks passed

qianheng-aws mentioned this pull request Mar 4, 2025

[BugFix] support push down text field correctly. #3376

Merged

7 tasks
