Skip to content

[SPARK-52738][SQL] Support aggregating the TIME type with a UDAF when the underlying buffer is an UnsafeRow #51430

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

bersprockets
Copy link
Contributor

What changes were proposed in this pull request?

  • Change BufferSetterGetterUtils to use InternalRow.setLong for setting TIME values rather then InternalRow.update.
  • Change BufferSetterGetterUtils to use InternalRow.getLong for getting TIME values.
  • Update the test "udaf with all data types" in AggregationQuerySuite so that it checks aggregation with both an unsafe and safe aggregation buffer. Since SPARK-41359, that test has been testing with only a safe aggregation buffer.

Why are the changes needed?

When a query uses a UDAF to aggregate a TIME column , and all other columns are "mutable" (as determined by UnsafeRow#isMutable), the aggregator creates an UnsafeRow for the low-level aggregation buffer.

However, the wrapper of that buffer (MutableAggregationBufferImpl) fails to properly set up a field setter function for the TIME column, so it attempts to call UnsafeRow.update on the underlying buffer. The UnsafeRow instance throws org.apache.spark.SparkUnsupportedOperationException:

Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkUnsupportedOperationException: [UNSUPPORTED_CALL.WITHOUT_SUGGESTION] Cannot call the method "update" of the class "org.apache.spark.sql.catalyst.expressions.UnsafeRow".  SQLSTATE: 0A000

See SPARK-52738 for a reproduction example.

Does this PR introduce any user-facing change?

No. The TIME type is not released yet.

How was this patch tested?

Updated a unit test.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jul 9, 2025
// Below we want to test with *both* UnsafeRow and safe row as the underlying
// buffer.
val mutableDataTypes = dataTypes.filter(UnsafeRow.isMutable)
Seq(dataTypes, mutableDataTypes).foreach { dataTypes =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that even without the change to BufferSetterGetterUtils, the unsafe buffer test will sometimes succeed. That's because the inappropriate setter function works fine for nulls and RandomDataGenerator will occasionally give us a null TIME value for the row where id == 50.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this test was succeeding even with TIME types because SPARK-41359 accidentally removed the unsafe buffer part of the test.

@HyukjinKwon
Copy link
Member

cc @MaxGekk

@MaxGekk
Copy link
Member

MaxGekk commented Jul 10, 2025

+1, LGTM. Merging to master.
Thank you, @bersprockets and @HyukjinKwon for the ping.

@MaxGekk MaxGekk closed this in d88298a Jul 10, 2025
@bersprockets bersprockets deleted the time_udaf branch July 11, 2025 01:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants