Skip to content

[POC]Support type checker For PPL functions #3562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

qianheng-aws
Copy link
Collaborator

@qianheng-aws qianheng-aws commented Apr 18, 2025

Description

Calcite has its type checker mechanism to do validation for its function calls. However, this process happens in SqlNode level and its API only accepts SqlNode level objects like SqlCallBinding, SqlNode and etc. Since we use RelNode directly, this validation step has been ignored by our current process.

In order to reuse the definition of Calcite's builtin operators, we have to do validation by ourself with an approach to reuse the definition of Calcite's operators.

Luckily, in Calcite, almost all operators use FamilyOperandTypeChecker as their type checker with API getOperandSqlTypeFamily exposed to us to make use of. We can use this API to do validation conveniently.

Note

This is a draft PR with only one test cases to verify the functionality.

Before this PR, the test CalcitePPLFunctionTypeTest::testLowerWithIntegerType will return a incorrect plan for its PPL:

source=EMP | eval lower_name = lower(EMPNO) | fields lower_name


LogicalProject.NONE.[](input=LogicalTableScan#0,exprs=[LOWER($0)])
     LogicalTableScan.NONE.[0](table=[scott, EMP])

The above plan is incorrect because $0(i.e. EMPNO) has type of integer, so it shouldn't be passed into function LOWER which only allows SqlTypeFamily.CHARACTOR. The error will only be thrown until execution step.

java.lang.RuntimeException: java.sql.SQLException: Error while preparing plan [LogicalProject(lower_name=[LOWER($0)])
  BindableTableScan(table=[[scott, EMP]])
]

With this PR, the ppl will throw exception in building logical plan because of unsupported function.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@qianheng-aws qianheng-aws marked this pull request as draft April 18, 2025 06:54
@qianheng-aws qianheng-aws changed the title Support type checker For PPL functions [POC]Support type checker For PPL functions Apr 18, 2025
import java.util.List;
import org.apache.calcite.rel.type.RelDataType;

/** Function signature is composed by function name and arguments list. */
public record CalciteFuncSignature(FunctionName functionName, List<RelDataType> funcArgTypes) {
public record CalciteFuncSignature(FunctionName functionName, PPLTypeChecker typeChecker) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PPLTypeChecker -> FunctionParamsTypeChecker could be more readable.

private static boolean validateOperands(
List<SqlTypeFamily> funcTypeFamilies, List<RelDataType> operandTypes) {
if (funcTypeFamilies.size() != operandTypes.size()) {
return true; // Skip checking if sizes do not match because some arguments may be optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to handle the mismatch case with optional argument

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't do check for the case that has optional arguments, which has the same behavior as the type checker in Calcite. It should throw exception until execution as before.

}
}

class PPLFamilyTypeCheckerWrapper implements PPLTypeChecker {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this PPLFamilyTypeCheckerWrapper used for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It supports build a PPLTypeChecker from a PPLFamilyTypeCheckerWrapper of Calcite. The latter one only supports SqlNode level check, and we use this wrapper to make it support RelNode level checking per our requirements

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe sqlOperandTypeChecker.isOptional(int i) combined with sqlOperandTypeChecker.or(SqlOperandTypeChecker checker) can be used to check functions with multiple signatures?

}
}

public interface FunctionImp2 extends FunctionImp {
List<RelDataType> ANY_TYPE_2 = List.of(ANY_TYPE, ANY_TYPE);
PPLTypeChecker IGNORE_2 = family(IGNORE, IGNORE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the checker name to IGNORE_N?

Copy link
Collaborator Author

@qianheng-aws qianheng-aws Apr 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Calcite, IGNORE is a type like ANY but don't even validate the operand, while ANY accepts all types except CURSOR.

So IGNORE is a more suitable type here than ANY.

return true; // Skip checking if sizes do not match because some arguments may be optional
}
for (int i = 0; i < operandTypes.size(); i++) {
SqlTypeName paramType = operandTypes.get(i).getSqlTypeName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relDataType.getSqlTypeName() does not handle UDT. A function like the following may help:

public static SqlTypeName convertRelDataTypeToSqlTypeName(RelDataType type) {
    if (type instanceof ExprSqlType exprSqlType) {
      return switch (exprSqlType.getUdt()) {
        case EXPR_DATE -> SqlTypeName.DATE;
        case EXPR_TIME -> SqlTypeName.TIME;
        case EXPR_TIMESTAMP -> SqlTypeName.TIMESTAMP;
        default -> type.getSqlTypeName();
      };
    }
    return type.getSqlTypeName();
  }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These UDT should maps to SqlTypeName.VARCHAR actually. And we have the API to return the related SqlTypeName of our UDT here:

@LantaoJin LantaoJin added the calcite calcite migration releated label Apr 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
calcite calcite migration releated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants