
feat: basic hudi support #437


Merged
merged 2 commits into from
Feb 27, 2025

tchow-zlai
Collaborator

@tchow-zlai tchow-zlai commented Feb 26, 2025

Summary

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • Chores

    • Consolidated and streamlined build dependencies for improved integration with AWS services and data processing libraries.
    • Expanded the set of supported third-party libraries, including new artifacts for enhanced performance and compatibility.
    • Added new dependencies for Hudi, Jackson, and Zookeeper to enhance functionality.
    • Introduced additional Hudi artifacts for Scala 2.12 and 2.13 to broaden available functionalities.
  • Tests

    • Added a new test class to verify reliable write/read operations on Hudi tables using a Spark session.
  • Refactor

    • Enhanced serialization registration to support a broader range of data types, improving overall processing stability.
    • Introduced a new variable for shared library dependencies to simplify dependency management.

@tchow-zlai tchow-zlai changed the title from wip to feat: basic hudi support Feb 26, 2025

coderabbitai bot commented Feb 26, 2025

Walkthrough

The changes consolidate dependency declarations in the cloud_aws/BUILD.bazel file by introducing a new variable shared_libs and updating test configurations. A new test class, GlueCatalogTest, is added to validate Hudi table write/read operations. Additionally, the Kryo registrator is enhanced with new Hudi and Scala class registrations. New Maven artifacts have been incorporated into build rule files for both Scala 2.12 and 2.13 repositories.

Changes

| File(s) | Change Summary |
|---|---|
| cloud_aws/BUILD.bazel | Introduced shared_libs; updated deps for cloud_aws_lib and scala_test_suite; modified test_deps (removed several deps, added DynamoDBLocal and Mockito); removed test_lib. |
| cloud_aws/src/test/scala/ai/chronon/integrations/aws/GlueCatalogTest.scala | Added a new test class, GlueCatalogTest, to verify Hudi table write/read operations via Spark. |
| spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala | Extended the registerClasses method with additional Hudi and Scala classes to improve Kryo serialization. |
| tools/build_rules/dependencies/maven_repository.bzl | Added new Maven artifacts (Jackson Afterburner, Hudi AWS bundle, Parquet Avro, Zookeeper). |
| tools/build_rules/dependencies/scala_2_12_repository.bzl | Added a new Hudi Spark bundle for Scala 2.12. |
| tools/build_rules/dependencies/scala_2_13_repository.bzl | Added a new Hudi Spark bundle for Scala 2.13. |

Suggested reviewers

  • nikhil-zlai
  • piyush-zlai

Poem

In the realm of code, a change takes flight,
Shared dependencies now shining bright.
Kryo registers new friends along the way,
And tests confirm that Hudi’s here to stay.
A cheerful push with bytes so neat, 🚀
Celebrating progress in every commit!

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between f809912 and 9fb837c.

📒 Files selected for processing (1)
  • spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#51
File: spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala:192-200
Timestamp: 2024-11-26T19:47:53.900Z
Learning: Only suggest registering Delta Lake action classes for serialization if they are actually used in the codebase.
⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: streaming_tests
  • GitHub Check: streaming_tests
  • GitHub Check: groupby_tests
  • GitHub Check: spark_tests
  • GitHub Check: join_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: groupby_tests
  • GitHub Check: spark_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: join_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: bazel_config_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (2)

172-174: Added new Scala type for serialization.

Added ArraySeq$ofInt to support Hudi operations.


195-218: New Hudi Kryo registrator looks good.

Follows same pattern as DeltaLake registrator. Registers essential Hudi classes for serialization.
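A name-based registrator of this kind typically resolves each class name at runtime before handing it to Kryo. The sketch below is a minimal, stdlib-only illustration under that assumption; `NameBasedRegistration`, `resolve`, and `arrayName` are hypothetical names, not Chronon's code. It drops repeated names, skips names that are missing from the classpath (useful when a name such as an `ArraySeq$ofInt` variant may only exist on some Scala versions), and builds the JVM binary name for an object array of a class.

```scala
import scala.collection.mutable

// Hypothetical helper, not Chronon's registrator: resolves class names before
// they would be handed to Kryo's register(Class).
object NameBasedRegistration {

  final case class Result(resolved: List[Class[_]], missing: List[String])

  def resolve(names: Seq[String], loader: ClassLoader = getClass.getClassLoader): Result = {
    val resolved = mutable.ListBuffer.empty[Class[_]]
    val missing = mutable.ListBuffer.empty[String]
    // distinct keeps the first occurrence of each name, so no class is resolved twice.
    names.distinct.foreach { name =>
      try {
        // initialize = false looks the class up without running its static initializer.
        resolved += Class.forName(name, false, loader)
      } catch {
        // Names absent from this classpath (e.g. a class that only exists on one
        // Scala version) are recorded and skipped instead of failing everything.
        case _: ClassNotFoundException => missing += name
      }
    }
    Result(resolved.toList, missing.toList)
  }

  // JVM binary name of a one-dimensional array of the named class, e.g. "[Ljava.lang.String;".
  def arrayName(name: String): String = s"[L$name;"
}
```

In an actual registrator, each class in `resolved` would then be passed to Kryo's `register(Class)`.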



@tchow-zlai tchow-zlai marked this pull request as draft February 26, 2025 02:07
Contributor

@piyush-zlai piyush-zlai left a comment

Do we need any updates to the table format providers and friends?

@tchow-zlai
Collaborator Author

> Do we need any updates to the table format providers and friends?

Yep, absolutely - will do that in a follow-up. This is just to get the jars here so we can run unit tests.

@tchow-zlai tchow-zlai changed the base branch from tchow/iceberg-reads-writes to main February 27, 2025 05:40
@tchow-zlai tchow-zlai force-pushed the tchow/hudi-reads-writes branch 2 times, most recently from 6adc9ee to de5cf0e Compare February 27, 2025 05:41
@tchow-zlai tchow-zlai marked this pull request as ready for review February 27, 2025 05:44

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
cloud_aws/src/test/scala/ai/chronon/integrations/aws/GlueCatalogTest.scala (1)

10-36: Test structure looks good but consider adding cleanup.

The test correctly verifies basic Hudi functionality with local metastore, but doesn't clean up the test table after execution.

Consider adding cleanup code to remove the test table after verification:

  "basic round trip hudi table" should "work with local metastore" in {
    import spark.implicits._

    val input = Set(1,2,3,4)
    val sourceDF = spark.sparkContext.parallelize(input.toSeq).toDF("id")

    sourceDF.write.format("hudi")
      .mode(SaveMode.Overwrite)
      .saveAsTable("test_hudi_table")

    val back = spark.table("test_hudi_table").select("id").as[Int].collect()
    assertEquals(input, back.toSet)

+   // Clean up
+   spark.sql("DROP TABLE IF EXISTS test_hudi_table")
  }
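As suggested, the `DROP TABLE` only runs when the assertion passes; if `assertEquals` throws, the table is left behind. A loan-pattern helper built on `try`/`finally` guarantees cleanup either way. The sketch below is stdlib-only and hypothetical (`TableFixture` and `withTable` are illustrative names); in the Spark test, the body would hold the write, read, and `assertEquals`, and `drop` would be something like `t => spark.sql(s"DROP TABLE IF EXISTS $t")`.

```scala
// Hypothetical loan-pattern helper: runs `body` against a table name and always
// calls `drop` afterwards, whether `body` returns normally or throws.
object TableFixture {
  def withTable[A](name: String)(drop: String => Unit)(body: String => A): A =
    try body(name)
    finally drop(name)
}
```

If `drop` itself throws, that exception replaces any failure from `body`, which is the usual `finally` trade-off.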
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 1aa8118 and c1c6b44.

📒 Files selected for processing (6)
  • cloud_aws/BUILD.bazel (2 hunks)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/GlueCatalogTest.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1 hunks)
  • tools/build_rules/dependencies/maven_repository.bzl (3 hunks)
  • tools/build_rules/dependencies/scala_2_12_repository.bzl (1 hunks)
  • tools/build_rules/dependencies/scala_2_13_repository.bzl (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#51
File: spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala:192-200
Timestamp: 2024-11-26T19:47:53.900Z
Learning: Only suggest registering Delta Lake action classes for serialization if they are actually used in the codebase.
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: streaming_tests
  • GitHub Check: spark_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
🔇 Additional comments (11)
tools/build_rules/dependencies/scala_2_12_repository.bzl (1)

41-41: Adding Hudi dependency looks good.

The added dependency for Hudi Spark bundle is properly placed and will enable Hudi support for Scala 2.12.

tools/build_rules/dependencies/scala_2_13_repository.bzl (1)

41-41: Adding Hudi dependency looks good.

The added dependency for Hudi Spark bundle is properly placed and will enable Hudi support for Scala 2.13.

spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)

173-188: Kryo registrations for Hudi classes look good.

All necessary Hudi model and utility classes are registered for proper serialization. Well organized.

cloud_aws/BUILD.bazel (5)

1-20: Good refactoring of dependencies.

Centralizing shared dependencies improves maintainability.


30-30: Clean dependency reference.

Using shared_libs variable simplifies maintenance.


49-49: Simplified test dependencies structure.

Combining shared_libs with test-specific dependencies provides clear organization.


13-13:

❓ Verification inconclusive

Confirm Hudi Spark bundle compatibility.

Verify Spark version compatibility with this Hudi bundle.


🏁 Script executed:

#!/bin/bash
# Look for Spark version declarations
rg -A 1 "spark.*version" --type bazel

Length of output: 39


Action: Manually verify Spark version compatibility.
The search didn't reveal any Spark version declarations. Please confirm that the Hudi Spark3 bundle in cloud_aws/BUILD.bazel is compatible with your project's Spark version by checking other config files or documentation.
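One way to make that check mechanical is to compare the versions encoded in the bundle name with the runtime versions. The `shared_libs` listing later in this thread names the artifact `hudi-spark3.5-bundle`, and `scala_artifact_with_suffix` presumably appends a Scala binary suffix such as `_2.12`. The helper below is a hypothetical, stdlib-only sketch under those assumptions, not part of Hudi or Chronon. At runtime, the inputs could come from `org.apache.spark.SPARK_VERSION` and `scala.util.Properties.versionNumberString`.

```scala
// Hypothetical check, not part of Hudi or Chronon: compares the Spark and Scala
// versions encoded in a Hudi Spark bundle name (e.g. "hudi-spark3.5-bundle_2.12")
// with the versions actually in use.
object HudiBundleCompat {
  // Full-string match: captures the Spark "major.minor" and the Scala binary version.
  private val BundleName = """hudi-spark(\d+\.\d+)-bundle_(\d+\.\d+)""".r

  // "3.5.1" -> "3.5", "2.12.18" -> "2.12"
  private def majorMinor(version: String): String =
    version.split('.').take(2).mkString(".")

  def isCompatible(artifact: String, sparkVersion: String, scalaVersion: String): Boolean =
    artifact match {
      case BundleName(bundleSpark, bundleScala) =>
        bundleSpark == majorMinor(sparkVersion) && bundleScala == majorMinor(scalaVersion)
      case _ => false // not a recognizable Hudi Spark bundle name
    }
}
```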


9-9:

❓ Verification inconclusive

Verify Hudi AWS bundle compatibility.

Ensure this version is compatible with your AWS environment.


🏁 Script executed:

#!/bin/bash
# Check for Hudi compatibility issues
rg -A 2 -B 2 "hudi" --type scala

Length of output: 4518


Action: Validate the Hudi AWS bundle version

  • In cloud_aws/BUILD.bazel (line 9), the Maven artifact for org.apache.hudi:hudi-aws-bundle is declared.
  • The tests in GlueCatalogTest.scala and the registrations in ChrononKryoRegistrator.scala confirm Hudi usage.
  • Please verify that the specified bundle version is fully compatible with our AWS integrations.
tools/build_rules/dependencies/maven_repository.bzl (3)

27-27: Performance enhancement with jackson-module-afterburner.

This module speeds up Jackson serialization and deserialization by using bytecode generation in place of most reflection-based property access.


47-47: Core dependency for Hudi AWS support.

This bundle is essential for the PR objective.


131-132:

❓ Verification inconclusive

Dependency exclusions may need documentation.

Clarify why parquet-avro and zookeeper are excluded.


🏁 Script executed:

#!/bin/bash
# Check for potential transitive dependencies that might include these
rg -A 5 "parquet|zookeeper" --type scala

Length of output: 25547


Clarify exclusion rationale in maven_repository.bzl
Please add a brief comment near lines 131–132 in tools/build_rules/dependencies/maven_repository.bzl explaining why these two exclusions—org.apache.parquet:parquet-avro and org.apache.zookeeper:zookeeper—are necessary (e.g. to avoid transitive dependency conflicts or reduce unnecessary overhead). The grep results indicate that "parquet" is widely used in our codebase, so noting the specific conflict or legacy issue would be very helpful.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)

197-214: Rename variable to match content type

Variable name refers to Delta but contains Hudi classes.

-    val additionalDeltaNames = Seq(
+    val additionalHudiNames = Seq(
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between c1c6b44 and ad75bca.

📒 Files selected for processing (2)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/GlueCatalogTest.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/GlueCatalogTest.scala
🧰 Additional context used
🧠 Learnings (1)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#51
File: spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala:192-200
Timestamp: 2024-11-26T19:47:53.900Z
Learning: Only suggest registering Delta Lake action classes for serialization if they are actually used in the codebase.
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: streaming_tests
  • GitHub Check: spark_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: non_spark_tests
🔇 Additional comments (2)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (2)

172-173: LGTM!

Added LongManifest for serialization.


194-217: LGTM!

Clean implementation of Hudi Kryo registrator.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)

194-215: Rename variable for clarity.

Variable name should reflect Hudi context.

-    val additionalDeltaNames = Seq(
+    val additionalHudiNames = Seq(
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between ad75bca and 8610dec.

📒 Files selected for processing (2)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/GlueCatalogTest.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/GlueCatalogTest.scala
🧰 Additional context used
🧠 Learnings (1)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#51
File: spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala:192-200
Timestamp: 2024-11-26T19:47:53.900Z
Learning: Only suggest registering Delta Lake action classes for serialization if they are actually used in the codebase.
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: streaming_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: spark_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: join_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: bazel_config_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (2)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (2)

172-173: LGTM: Addition of LongManifest class.

Adding this class ensures complete Kryo serialization coverage.


194-217: LGTM: Well-structured Hudi serialization support.

Follows established pattern from DeltaLake implementation.

@tchow-zlai tchow-zlai force-pushed the tchow/hudi-reads-writes branch from 6d7a096 to 5fe3159 Compare February 27, 2025 16:41

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)

172-173: Duplicate registration detected.

scala.reflect.ManifestFactory$LongManifest is already registered at line 159.

 "org.apache.datasketches.kll.KllHeapFloatsSketch",
 "org.apache.datasketches.kll.KllSketch$SketchStructure",
 "org.apache.datasketches.kll.KllSketch$SketchType",
-"scala.reflect.ManifestFactory$LongManifest"
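A repeated entry like this is easy to catch mechanically, which also makes it clearer which names in a diff are genuinely new. The sketch below is a hypothetical, stdlib-only helper (not part of Chronon) that reports names appearing more than once in a registration list, in order of first appearance; it could back a unit test over the registrator's name list.

```scala
// Hypothetical lint, not part of Chronon: returns every name that occurs more than
// once in a registration list, in order of first appearance.
object DuplicateRegistrations {
  def duplicates(names: Seq[String]): Seq[String] = {
    val counts: Map[String, Int] =
      names.groupBy(identity).map { case (name, occurrences) => name -> occurrences.size }
    names.distinct.filter(name => counts(name) > 1)
  }
}
```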
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 6d7a096 and 5fe3159.

📒 Files selected for processing (6)
  • cloud_aws/BUILD.bazel (2 hunks)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/GlueCatalogTest.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (2 hunks)
  • tools/build_rules/dependencies/maven_repository.bzl (3 hunks)
  • tools/build_rules/dependencies/scala_2_12_repository.bzl (1 hunks)
  • tools/build_rules/dependencies/scala_2_13_repository.bzl (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/GlueCatalogTest.scala
  • tools/build_rules/dependencies/maven_repository.bzl
  • tools/build_rules/dependencies/scala_2_12_repository.bzl
  • tools/build_rules/dependencies/scala_2_13_repository.bzl
🧰 Additional context used
🧠 Learnings (1)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#51
File: spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala:192-200
Timestamp: 2024-11-26T19:47:53.900Z
Learning: Only suggest registering Delta Lake action classes for serialization if they are actually used in the codebase.
⏰ Context from checks skipped due to timeout of 90000ms (14)
  • GitHub Check: non_spark_tests
  • GitHub Check: streaming_tests
  • GitHub Check: streaming_tests
  • GitHub Check: spark_tests
  • GitHub Check: join_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: bazel_config_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: spark_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (6)
cloud_aws/BUILD.bazel (5)

1-20: New shared_libs variable improves dependency management.

Consolidates common dependencies for reuse across targets.


9-9: Hudi dependencies added to support basic Hudi functionality.

Consistent with PR objective to add Hudi support.

Also applies to: 13-13


30-30: Simplified cloud_aws_lib deps using shared_libs.

Good refactoring to improve maintenance.


33-41: Test dependencies updated for Hudi testing.

Added mockito and DynamoDBLocal for testing.


49-49: Test suite now uses shared_libs directly.

Eliminates the need for separate test_lib.

spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)

194-217: Implementation looks good.

New Hudi Kryo registrator properly extends base class and registers essential Hudi classes.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
cloud_aws/BUILD.bazel (1)

1-20: Move test dependencies from shared_libs to test_deps.

JUnit dependencies at lines 10-11 should be in test_deps rather than shared_libs.

 shared_libs = [
     maven_artifact("software.amazon.awssdk:dynamodb"),
     maven_artifact("software.amazon.awssdk:regions"),
     maven_artifact("software.amazon.awssdk:aws-core"),
     maven_artifact("software.amazon.awssdk:sdk-core"),
     maven_artifact("software.amazon.awssdk:utils"),
     maven_artifact("com.google.guava:guava"),
     maven_artifact("org.slf4j:slf4j-api"),
     maven_artifact("org.apache.hudi:hudi-aws-bundle"),
-    maven_artifact("junit:junit"),
-    maven_artifact("com.novocode:junit-interface"),
     maven_artifact("com.fasterxml.jackson.module:jackson-module-afterburner"),
     scala_artifact_with_suffix("org.apache.hudi:hudi-spark3.5-bundle"),
     scala_artifact_with_suffix("org.scala-lang.modules:scala-collection-compat"),
     "//api:lib",
     "//api:thrift_java",
     "//online:lib",
     "//spark:lib",
     "//tools/build_rules/spark:spark-exec",
 ]

Then add to test_deps:

 test_deps = [
     maven_artifact("com.amazonaws:DynamoDBLocal"),
     maven_artifact("org.mockito:mockito-core"),
     maven_artifact("software.amazon.awssdk:auth"),
     maven_artifact("software.amazon.awssdk:identity-spi"),
     scala_artifact_with_suffix("com.chuusai:shapeless"),
     scala_artifact_with_suffix("org.typelevel:cats-core"),
     scala_artifact_with_suffix("org.mockito:mockito-scala"),
+    maven_artifact("junit:junit"),
+    maven_artifact("com.novocode:junit-interface"),
 ] + _CIRCE_DEPS + _SCALA_TEST_DEPS
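To keep test-only artifacts from drifting back into `shared_libs`, the coordinate list could be checked against a set of test-only prefixes. This is a hypothetical Scala sketch over plain `group:artifact` strings; it does not read BUILD files, is not a Bazel rule, and the prefix list is an assumption to adjust per project.

```scala
// Hypothetical guard, not a Bazel rule: given the coordinates in a shared
// (production) dependency list, returns those matching a test-only prefix.
object TestDepGuard {
  def leaked(shared: Seq[String], testOnlyPrefixes: Seq[String]): Seq[String] =
    shared.filter(coord => testOnlyPrefixes.exists(prefix => coord.startsWith(prefix)))
}
```

A unit test could then assert that `leaked(...)` returns an empty list for the production coordinates.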
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 75545f8 and af45447.

📒 Files selected for processing (1)
  • cloud_aws/BUILD.bazel (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (16)
  • GitHub Check: non_spark_tests
  • GitHub Check: streaming_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: join_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: groupby_tests
  • GitHub Check: spark_tests
  • GitHub Check: streaming_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: spark_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: groupby_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: bazel_config_tests
  • GitHub Check: analyzer_tests
🔇 Additional comments (3)
cloud_aws/BUILD.bazel (3)

30-30: Good refactoring!

Using the shared_libs variable improves maintainability.


33-41: Appropriate test dependencies added.

Mockito dependencies are suitable for the test environment.


49-49: Proper dependency management.

The updated dependency structure correctly uses both shared_libs and test_deps.

Co-authored-by: Thomas Chow <[email protected]>
@tchow-zlai tchow-zlai force-pushed the tchow/hudi-reads-writes branch from af45447 to f809912 Compare February 27, 2025 17:02
Co-authored-by: Thomas Chow <[email protected]>
@tchow-zlai tchow-zlai merged commit 773e6e7 into main Feb 27, 2025
20 checks passed
@tchow-zlai tchow-zlai deleted the tchow/hudi-reads-writes branch February 27, 2025 17:35
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Consolidated and streamlined build dependencies for improved
integration with AWS services and data processing libraries.
- Expanded the set of supported third-party libraries, including new
artifacts for enhanced performance and compatibility.
- Added new dependencies for Hudi, Jackson, and Zookeeper to enhance
functionality.
- Introduced additional Hudi artifacts for Scala 2.12 and 2.13 to
broaden available functionalities.

- **Tests**
- Added a new test class to verify reliable write/read operations on
Hudi tables using a Spark session.

- **Refactor**
- Enhanced serialization registration to support a broader range of data
types, improving overall processing stability.
- Introduced a new variable for shared library dependencies to simplify
dependency management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 16, 2025
3 participants