Skip to content

Fix execution errors caused by plan gap #3350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

qianheng-aws
Copy link
Collaborator

@qianheng-aws qianheng-aws commented Feb 26, 2025

Description

This PR fixes several errors caused by Calcite optimization. These issues arise because there are several gaps between our OpenSearch Plan and the Calcite Plan, and the Calcite optimization process is specific to SQL and its own semantic requirements.

This PR includes changes:

  1. Add OpenSearch settings for calcite plan push down and make it disabled by default. Will set default to enabled once all tests works with push down.

  2. Fix error for single column row. Calcite has optimization that transform such row into scalar object and every enumerable operators all follow that convention. In order to combine our scan operators with them, we should adjust to follow it as well.

  3. Fix error for sort or collation behavior of the final results. Calcite doesn't ensure collation of a query if its plan doesn't end with sort operator. For example, it has optimization that will remove the sort inner a subquery since it doesn't ensure the collation outside of that subquery. While for the semantic of our PPL, we expect to ensure collation even a query has other operators(i.e. project) after sort command.
    However, we don't plan to hack the prepare process of Calcite, the only thing we can do is transform our plan to conform to Calcite conventions. For this case, before optimization, we should add a sort operator as the root of our plan if the original root has collation for its output data. And in optimization, we rely on calcite optimizer to remove redundant sort operators.

  4. Fix error for pushdown by lazily construct OpenSearchRequestBuilder. The previous implementation has a problem in optimization process that scan operators in different equivalent plans may share the same objects in OpenSearchRequestBuilder, since OpenSearchRequestBuilder cannot do deep copy. It has issue that transformation on one plan may affect other plans while it's never allowed.
    To fix this, we should lazily construct OpenSearchRequestBuilder and maintain these push down actions in a queue, until the final implementation stage. Since queue supports deep copy and each push down actions are all enclosure, we can avoid the above issue.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

*/
RelCollation collation = osPlan.getTraitSet().getCollation();
if (!(osPlan instanceof Sort) && collation != RelCollations.EMPTY) {
calcitePlan = LogicalSort.create(osPlan, collation, null, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for adding this NotNull annotation? Would be better to add this?

if (calcitePlan = null) return osPlan;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing special, it's auto-generated by IDEA. We can remove it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we expect that the collation can be preserved through the pipes over PPL

could u elberate more on this? i do not think PPL command should preserved order.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qianheng-aws @LantaoJin
I seem, some PPL command should preserved order, for instance, fields, take, but others not, stats.
In ANSI SQL, select a from tbl order by b, order is perserved becuase of logically processing order of select statement is acutall from -> select -> order.

Comment on lines +45 to +46
CalciteOpenSearchIndexScan newScan = scan.pushDownFilter(filter);
if (newScan != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newScan won't be used after this check. the name newScan confused me.
could you change it to

if (scan.pushDownFilter(filter) != null) {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it's typo. It happens the current logic works as well.

call.transformTo(scan);
  
->

call.transformTo(newScan);

@@ -56,24 +59,31 @@ public class CalciteOpenSearchIndexScan extends OpenSearchTableScan {
*/
public CalciteOpenSearchIndexScan(
RelOptCluster cluster, RelOptTable table, OpenSearchIndex index) {
this(cluster, table, index, index.createRequestBuilder(), table.getRowType());
this(cluster, table, index, table.getRowType(), null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this(cluster, table, index, table.getRowType(), new PushDownContext());
and you can remove L74

@@ -118,17 +138,18 @@ public Enumerator<Object> enumerator() {
};
}

public boolean pushDownFilter(Filter filter) {
public CalciteOpenSearchIndexScan pushDownFilter(Filter filter) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change the return type? seems the return value is never in use besides the null-checker in OpenSearchFilterIndexScanRule

return newScan;
}

static class PushDownContext extends ArrayDeque<PushDownAction> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some comments about this context. such as purpose and usage

if (fields.size() == 1) {
return current.tupleValue().get(fields.getFirst()).valueForCalcite();
}
return fields.stream().map(k -> current.tupleValue().get(k).valueForCalcite()).toArray();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of fields here has been reduced to the actual number of outputs, right? So get(k) won't return null anymore

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed the workaround code to dev branch mistakenly. could you merge with the latest branch code to resolve conflicts @qianheng-aws

@penghuo penghuo added the calcite calcite migration releated label Feb 26, 2025
*/
RelCollation collation = osPlan.getTraitSet().getCollation();
if (!(osPlan instanceof Sort) && collation != RelCollations.EMPTY) {
calcitePlan = LogicalSort.create(osPlan, collation, null, null);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we expect that the collation can be preserved through the pipes over PPL

could u elberate more on this? i do not think PPL command should preserved order.

@@ -85,8 +95,10 @@ public RelNode copy(RelTraitSet traitSet, List<RelNode> inputs) {
@Override
public void register(RelOptPlanner planner) {
super.register(planner);
for (RelOptRule rule : OpenSearchIndexRules.OPEN_SEARCH_INDEX_SCAN_RULES) {
planner.addRule(rule);
if (osIndex.getSettings().getSettingValue(Settings.Key.CALCITE_PUSHDOWN_ENABLED)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it for test purpose, right? I do not think end-user has any reasons to disable optimization.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. For example, PPL source=table | sort a | field a will have plan scan -> sort -> project, we expect the final result should have collation on column a.
    However, calcite won't ensure that. If we translate the above PPL or plan into SQL, it should be select a from (select * from table order by a). In Calcite, it doesn't ensure collation inner a subquery and will have several optimization to remove the sort operator.

If we do want to ensure collation of the final result, the PPL should be source=table | field a | sort a, which can be translated to SQL select a from table order by a. That's why we need add sort operator to ensure collation and rely on optimizer to eliminate the redundant sort for the consideration of simplicity.

  1. Yeah, I think so. It's per @LantaoJin's suggestion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. And it allows us to disable the push down feature quickly without rollback if there are some unexpected issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@penghuo Adding this configuration is by my suggestion. This is an advanced configuration aimed at developers. Similar to how databases typically have switches for optimization rules. Moreover, this configuration is currently very useful for our development - in some scenarios, enabling push-down might cause issues, and having this switch helps us determine whether problems are caused by push-down or other factors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimal solution would be to provide a method that allows free selection of optimization rules. However, considering there aren't many custom optimization rules in the short term, adding a push-down config seems to be the most cost-effective approach.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, make sense, if Settings.Key.CALCITE_PUSHDOWN_ENABLED default value is enabled, should be fine. let's consider remove it when we are confident.

…ine' into feature/calcite-engine-sort-fix

# Conflicts:
#	opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java
@qianheng-aws
Copy link
Collaborator Author

./gradlew :integ-test:integTest --tests '*Calcite*IT' successfully on my local. Push down feature has some issues, disabled it to unblock our tests. Will enable once all tests work with it.

@LantaoJin LantaoJin merged commit 2971eae into opensearch-project:feature/calcite-engine Feb 27, 2025
4 of 13 checks passed
penghuo pushed a commit to penghuo/os-sql that referenced this pull request Mar 12, 2025
* Transform to calcite plan before executing

Signed-off-by: Heng Qian <[email protected]>

* Fix bug for single column row

Signed-off-by: Heng Qian <[email protected]>

* Add settings for calcite pushdown

Signed-off-by: Heng Qian <[email protected]>

* Lazily construct OpenSearchRequestBuilder and do push down

Signed-off-by: Heng Qian <[email protected]>

* Address comments and disable push down

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: Heng Qian <[email protected]>
penghuo pushed a commit that referenced this pull request Jun 16, 2025
* Transform to calcite plan before executing

Signed-off-by: Heng Qian <[email protected]>

* Fix bug for single column row

Signed-off-by: Heng Qian <[email protected]>

* Add settings for calcite pushdown

Signed-off-by: Heng Qian <[email protected]>

* Lazily construct OpenSearchRequestBuilder and do push down

Signed-off-by: Heng Qian <[email protected]>

* Address comments and disable push down

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: xinyual <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
calcite calcite migration releated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants