Skip to content

Initializes PType #1488

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 20, 2024
Merged

Initializes PType #1488

merged 6 commits into from
Jun 20, 2024

Conversation

johnedquinn
Copy link
Member

@johnedquinn johnedquinn commented Jun 15, 2024

Description

  • Initializes PType, the representation of types of PartiQL values (to be used by UDFs, DDL, Shape, function resolution, static analysis, and gradual typing).
  • Taking our learnings from PartiQL Shape POC, Apache Calcite's RelDataType, Ion's AnyElement, PostgreSQL's Datum, and PartiQL's Datum, this document proposes furthering our own control of our type system by modeling our types in similar manner.
  • The guiding principle that distinguishes PType from PartiQLValueType and StaticType is a notion of what I'll call Core Type Kinds. The "Kind" of a type shall dictate what methods are available to it.
  • PType is now used as UDF function parameters as well as the return type. This allows for parameterization of our types. Now, we can write functions like:
CREATE FUNCTION ABS(arg DECIMAL(20, 4)) RETURNS DECIMAL(20, 4);

Other Changes

  • What else has changed?
    • Coercions are now dictated by Coercions.kt. This uses the rules directly from SQL:1999 Section 4.12 and is extended to include PartiQL structs, bags, lists, and s-expressions.
  • I've simplified the function resolution to use PostgreSQL-style function resolution. This still passes all of our existing tests.
    • I decided to not implement this using the "preferred types". This, in my opinion, led to some strange results.
  • Is there a reversion of functionality from V1? From main?
    • We don't support BINARY, BYTE, and INTERVAL. This is not supported in main and it is not in SQL:1999. Therefore, it has been left out of this PR.
    • There is no NULL or MISSING type. Please see the applicable design document for guidance on this topic.
    • For typing the plan, I use a CompilerType. This is an IR that wraps the underlying PType while retaining information such as isNullValue and isMissingValue to denote that these expressions will always return null/missing.
  • With the introduction of the UNKNOWN type, we can now defer typing of literal null and missing values. You will see this with DynamicTyper which now returns a list of both coercions and replacement expressions. The replacements are specifically to convert "unknown" typed null and missing literals to typed null and missing literals.
    • Now, for example, when users compiled 1 + NULL this used to get compiled to <int32> + <any> = <any> resulting in a dynamic call (with all possible plus implementations to be dynamically invoked). This is not only computationally expensive, but it also cedes the right to type our literal nulls and resolve to a single static function. Hence, our plan would look like:
      • Call.Dynamic(Candidates = [ Call.Static("PLUS", Lit.Int32(1), Cast(Lit.Null(), INT32)), ... ] )
    • Now, this initially gets typed as <int32> + <unknown>. Using PostgreSQL's function resolution logic, this would get further typed to: <int32> + <int32> = <int32>. The plan would look something like:
      • Call.Static("PLUS", Lit.Int32(1), Lit.Int32.Null())
    • Notice that the literal null is replaced with a null of a specific type. This leads to faster execution and less overhead.

Reviewing

  • For reviewers, I'd recommend focusing on the API changes in:
    • org.partiql.types.PType (Java)
    • org.partiql.eval.value.Datum (Java)
    • org.partiql.spi.connector.ConnectorObject (Kotlin)

In general, I'd look at the .api files to understand how these interfaces will be used by the public.

I'd also take a keen eye to the testing. See the Testing section below.

Testing

  • For full transparency, reviewers should take a look at PType.fromStaticType which is used extensively in testing. For AnyOf types, one should be aware that this is typed as DYNAMIC. "You can't escape dynamic."
  • On the same note, anything interacting with the dynamic type results in dynamic (unless specially handled for). As such, 1 + <dynamic> = <dynamic>. This is different from our previous implementation which would compute all possible permutations of the types and union the resulting type. Reviewers will notice, however, that the tests haven't changed very much. This is due to the above bullet point. A union of two types is implicitly the dynamic type and therefore the tests have not needed to be modified much.

User Experience Impact

  • This implementation should provide performance benefits. Instead of checking JVM types and casting where necessary, all methods are exposed via the PType fat interface. This enforces the use of our own type semantics without using the overhead of Java types.
  • That being said, users must be aware of the APIs consumed. See the Javadocs for PType for user-guidance.

Future Changes

The following items should be addressed further before release of V1.0:

  • The Cast Table should be updated. There is no need for safeness to be represented in the plan.
  • In my opinion, we should rely on PType to create a literal collection in evaluation. This should be an enum, or a type constructor reference.
  • The Problems surfaced by the ProblemGenerator should hold PTypes, not StaticType. It should be an easy change.
  • All plan nodes with type information should rename the type attribute to be type rather than static_type.

Other Information

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (v1@ed882f7). Learn more about missing BASE report.

Additional details and impacted files
@@          Coverage Diff          @@
##             v1    #1488   +/-   ##
=====================================
  Coverage      ?   80.07%           
  Complexity    ?       47           
=====================================
  Files         ?       19           
  Lines         ?      507           
  Branches      ?       23           
=====================================
  Hits          ?      406           
  Misses        ?       88           
  Partials      ?       13           
Flag Coverage Δ
EXAMPLES 80.07% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@johnedquinn johnedquinn marked this pull request as ready for review June 17, 2024 00:30
@@ -130,7 +130,7 @@ internal class ExprCallDynamic(

init {
val lookupsMutable = mutableListOf<CandidateIndex>()
val accumulator = mutableListOf<Pair<List<PartiQLValueType>, Candidate>>()
val accumulator = mutableListOf<Pair<List<PType>, Candidate>>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, why not the enum? We now could have different accumulators for different types even though this is not how the candidate resolution works.

char(3) and char(4) would not be different accumulators, but they might be backed by a candidate with signature foo(<string>)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK -- currently the function definitions have PType as the parameter types. Once this is changed (I know you're actively working on this), this can be changed accordingly.

@@ -14,7 +13,7 @@ internal class ExprCoalesce(
override fun eval(env: Environment): Datum {
for (arg in args) {
val result = arg.eval(env)
if (!result.isNull && result.type != PartiQLValueType.MISSING) {
if (!result.isNull && !result.isMissing) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we have an isAbsent now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this PR is specific to PType, I'll defer this addition until later if you agree.

Adds experimental, deprecation notices

Updates Datum implementations
Updates factory method to creat PType.Kind.CHAR
@johnedquinn
Copy link
Member Author

I've made a few updates since the first revision including:

  • Updating all PType.Kind Javadocs
  • Adding deprecation/experimental notices on SEXP, CLOB, BLOB, SYMBOL, the arbitrary int and decimal
  • Adding VARCHAR
  • Removed the redundant type assertions (factory is sufficient).

I've also created an issue to track items discussed offline: #1489. This issue should not block this merge.

Comment on lines +49 to 59
// 2. Discard functions that cannot be matched (via implicit coercion or exact matches)
var matches = match(candidates, args).ifEmpty { return null }
if (matches.size == 1) {
return matches.first().match
}

// Order based on original candidate function ordering
val orderedUniqueMatches = matches.toSet().toList()
val orderedCandidates = candidates.flatMap { candidate ->
orderedUniqueMatches.filter { it.signature == candidate }
// 3. Run through all candidates and keep those with the most exact matches on input types.
matches = matchOn(matches) { it.numberOfExactInputTypes }
if (matches.size == 1) {
return matches.first().match
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, but this should be one loop. out-of-scope for the PR

Comment on lines +103 to +111
when (toCompare(candidate).compareTo(mostExactMatches)) {
-1 -> continue
0 -> matches.add(candidate)
1 -> {
mostExactMatches = toCompare(candidate)
matches.clear()
matches.add(candidate)
}
else -> error("CompareTo should never return outside of range [-1, 1]")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, but we could just have

val n = toCompare(candidate).compareTo(mostExactMatches)
when {
  n < 0 -> continue
  n == 0 -> ...
  n > 0 -> ...
}

to eliminate the need for an else

Comment on lines +6 to +9
/**
* This is largely just to show that the planner does not need to use [_delegate] ([PType]) directly. Using an
* internal representation, we can leverage the APIs of [PType] while carrying some additional information such
* as [isMissingValue].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I was trying to get at with a PartiQLType (public) -> PType (internal) thing with a higher class hierarchy, but a fat internal. To be experimented with..

@johnedquinn johnedquinn merged commit 4dd0972 into partiql:v1 Jun 20, 2024
7 checks passed
@johnedquinn johnedquinn deleted the v1-ptype branch July 5, 2024 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants