Iceberg unit tests, support Iceberg + nonhive catalogs, Iceberg Kryo Serializer #993

Open: abbywh wants to merge 35 commits into main

@abbywh (Contributor) commented May 15, 2025

Summary

We want to unit test Iceberg via CI and improve Iceberg support in the Chronon OSS package.

Why / Goal

Follow Ups

Test Plan

  • [x] Added Unit Tests
  • [x] Covered by existing CI
  • [x] Integration tested

Added CircleCI check

Checklist

  • [N/A] Documentation update

Reviewers

@abbywh mentioned this pull request May 15, 2025
4 tasks
@abbywh changed the title from "Iceberg unit tests, support Iceberg + nonhive catalogs" to "Iceberg unit tests, support Iceberg + nonhive catalogs, Iceberg Kryo Serializer" May 16, 2025
@abbywh marked this pull request as ready for review May 16, 2025 18:41
@nikhil-zlai (Collaborator) left a comment

very clean!

@@ -10,6 +10,11 @@
*.logs
*.iml
.idea/
*.jvmopts
.bloop*
.metals*
Collaborator

how is working with metals relative to intellij? does the debugger work as well?

Contributor Author

It's really good actually. The debugger worked out of the box, and I found it comparable to IntelliJ overall.

I'd recommend it to anyone who has remote dev boxes, since VSCode's remote integration is far better in my experience. All the tests ran a lot faster and I got in way more dev cycles. I'd probably only recommend it over IntelliJ if you're on a dev box, though.

@@ -239,9 +239,15 @@ case object Iceberg extends Format {
override def partitions(tableName: String, partitionColumns: Seq[String])(implicit
sparkSession: SparkSession): Seq[Map[String, String]] = {
sparkSession.sqlContext
-.sql(s"SHOW PARTITIONS $tableName")
+.sql(s"SELECT partition FROM $tableName" ++ ".partitions")

ooc does this work for regular hive tables?

This is for Iceberg; Hive support is here

Contributor Author

It should work for Hive tables; the internal targets I'm hitting are more or less "regular Hive tables". Iceberg abstracts itself from the catalog implementation, so as long as your Iceberg setup has an interface to your catalog implementation, it will work.
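
For readers following the diff, here is a minimal sketch of the two query shapes discussed in this thread. The object and method names (`PartitionQueries`, `showPartitions`, `icebergPartitions`, `parseHivePartition`) and the table name `db.events` are hypothetical and are not part of Chronon. Only the two SQL strings are taken from the diff. The parser assumes the usual `key=value/key=value` string form returned by Hive-style `SHOW PARTITIONS`. Nothing here touches Spark; the strings would be passed to `sparkSession.sql(...)` in a session where the relevant catalog is configured.

```scala
// Hypothetical helpers for illustration only; not Chronon's actual API.
object PartitionQueries {

  // Hive-style partition listing (the line removed in the diff).
  def showPartitions(tableName: String): String =
    s"SHOW PARTITIONS ${tableName}"

  // Iceberg metadata-table listing (the line added in the diff).
  // Iceberg exposes a `partitions` metadata table under the table name.
  def icebergPartitions(tableName: String): String =
    s"SELECT partition FROM ${tableName}.partitions"

  // Assumed Hive output form: one string per partition, e.g. "ds=2025-05-15/hr=00".
  // Segments without an '=' are skipped.
  def parseHivePartition(spec: String): Map[String, String] =
    spec
      .split("/")
      .map(_.split("=", 2))
      .collect { case Array(key, value) => key -> value }
      .toMap
}

// Usage (table name is made up):
// PartitionQueries.icebergPartitions("db.events")
//   returns "SELECT partition FROM db.events.partitions"
// PartitionQueries.parseHivePartition("ds=2025-05-15/hr=00")
//   returns Map("ds" -> "2025-05-15", "hr" -> "00")
```

The metadata-table form goes through whatever catalog Iceberg is wired to, which is why it does not depend on the Hive metastore specifically.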

4 participants