[wip] use native annotation tags #75

tchow-zlai · 2024-11-21T20:50:56Z

Summary

Switching to using native tag annotations, as this allows us the full flexibility of the scala test CLI grammar.

For example, now we can do tag exclusions:

sbt sbt "spark/testOnly -- -l ai.chronon.spark.test.FetcherTestTag"

note the -l.

Checklist

Added Unit Tests
Covered by existing CI
Integration tested
Documentation update

Summary by CodeRabbit

New Features
- Introduced new test tags: JoinTestTag, FetcherTestTag, and MutationsTestTag for improved test categorization.
- Updated test classes to utilize the new tagging mechanism, enhancing clarity and maintainability.
Bug Fixes
- Corrected comments in the FeatureWithLabelJoinTest class for accuracy.
Chores
- Removed the TaggedFilterSuite trait to streamline testing processes.

coderabbitai · 2024-11-21T20:51:04Z

Caution

Review failed

The pull request is closed.

Walkthrough

This pull request modifies the GitHub Actions workflow for testing Scala and Python, focusing on specific test execution commands in several jobs. It introduces a new Java file defining three annotations for tagging tests, updates existing test classes to use these annotations instead of a previous tagging method, and corrects minor comments in the test files. Additionally, a trait used for conditional test execution has been removed.

Changes

File Path	Change Summary
.github/workflows/test_scala_and_python.yaml	Updated test execution commands in `join_spark_tests`, `mutation_spark_tests`, and `fetcher_spark_tests` jobs to use new test tags.
spark/src/test/java/ai/chronon/spark/test/Tags.java	Introduced `Tags.java` with annotations: `JoinTestTag`, `FetcherTestTag`, and `MutationsTestTag` for categorizing tests in ScalaTest.
spark/src/test/scala/ai/chronon/spark/test/FeatureWithLabelJoinTest.scala	Corrected comment regarding label in `FeatureWithLabelJoinTest`.
spark/src/test/scala/ai/chronon/spark/test/FetcherTest.scala	Added `@FetcherTestTag` annotation to `FetcherTest` class, replacing the `tagName` method, and updated comments for running tests.
spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala	Added `@JoinTestTag` annotation to `JoinTest` class, replacing the `tagName` method, and included necessary imports.
spark/src/test/scala/ai/chronon/spark/test/MutationsTest.scala	Added `@MutationsTestTag` annotation to `MutationsTest` class, replacing the `tagName` method, and updated comments for running tests.
spark/src/test/scala/ai/chronon/spark/test/TaggedFilterSuite.scala	Removed the `TaggedFilterSuite` trait, which was used for conditional test execution based on a specified tag.

Possibly related PRs

Tweak spark test setup to tags and run tests appropriately #56: Modifies the GitHub Actions workflow for testing Scala and Python, focusing on Spark tests.
fix: Address flaky spark tests #72: Addresses flaky Spark tests related to FeatureWithLabelJoinTest, potentially involving similar testing mechanisms.

Suggested reviewers

piyush-zlai
nikhil-zlai
varant-zlai

Poem

🐰 In the meadow where tests do play,
New tags hop in, brightening the day.
Fetcher and Join, with mutations in tow,
Testing's a breeze, watch our coverage grow!
With a flick of a command, the errors we chase,
In the world of code, we find our place! 🌼

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

tchow-zlai · 2024-11-21T20:54:54Z

spark/src/test/java/ai/chronon/spark/test/Tags.java

+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.ElementType;
+
+@org.scalatest.TagAnnotation


@piyush-zlai wdyt about this? Reason for the change is that this more natively leverages the scalatest TagAnnotation framework and allows us to do exclusions on the commandline without additional code ie:

sbt "spark/testOnly -- -l ai.chronon.spark.test.JoinTestTag"

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)

.github/workflows/test_scala_and_python.yaml (1)
86-87: Update step names to reflect specific test categories.

The step names in these jobs say "Run other spark tests" but they are actually running specific tagged tests (join, mutations, and fetcher tests respectively). Consider updating the names for clarity:
- name: Run other spark tests
+ name: Run join spark tests
- name: Run other spark tests
+ name: Run mutations spark tests
- name: Run other spark tests
+ name: Run fetcher spark tests
Also applies to: 105-106, 124-125
spark/src/test/scala/ai/chronon/spark/test/FetcherTest.scala (1)

Line range hint 141-144: Consider enhancing method documentation.

The test helper methods (generateMutationData, generateRandomData, generateEventOnlyData) would benefit from additional documentation explaining:

Input parameters and their impact

Expected test data patterns

Use cases for different test scenarios

Also applies to: 234-237, 467-470

spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (1)

43-43: Consider removing unused imports if flagged by static analysis.

The import statement includes Ignore which might not be used in this file.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between fc7f408 and eb480b5.

📒 Files selected for processing (7)

.github/workflows/test_scala_and_python.yaml (3 hunks)
spark/src/test/java/ai/chronon/spark/test/Tags.java (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/FeatureWithLabelJoinTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/FetcherTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/MutationsTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/TaggedFilterSuite.scala (0 hunks)

💤 Files with no reviewable changes (1)

spark/src/test/scala/ai/chronon/spark/test/TaggedFilterSuite.scala

✅ Files skipped from review due to trivial changes (2)

spark/src/test/java/ai/chronon/spark/test/Tags.java
spark/src/test/scala/ai/chronon/spark/test/FeatureWithLabelJoinTest.scala

🔇 Additional comments (8)

.github/workflows/test_scala_and_python.yaml (2)

87-87: LGTM! Consistent update to native annotation tags.

The changes correctly update the test tags to use fully qualified annotation class names, which aligns with the transition to native tag annotations. This change improves clarity and maintainability by explicitly referencing the annotation classes.

Also applies to: 106-106, 125-125

86-87: Verify resource allocation strategy.

There are notable differences in resource allocation between test jobs:

other_spark_tests uses ubuntu-8_cores-32_gb runner with 24G heap
These specific test jobs use ubuntu-latest with 8G heap

Let's verify if this allocation is intentional and optimal.

Also applies to: 105-106, 124-125

✅ Verification successful

Let me gather more information about the test characteristics and memory usage patterns.

Based on the gathered information, I can now provide a final response:

Memory allocation differences are justified by test characteristics

The different memory allocations between test jobs are appropriate because:

JoinTest primarily works with in-memory data transformations and Spark operations
The other Spark tests (using 24G heap) include more comprehensive test suites that likely process larger datasets and run multiple operations in parallel

The current allocation of 8G heap for JoinTest is sufficient as the test focuses on join operations with in-memory data structures and doesn't show evidence of heavy resource requirements in the implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check test execution times and memory usage patterns

# Search for test execution patterns and memory requirements
rg -A 5 "testOnly.*JoinTestTag|testOnly.*MutationsTestTag|testOnly.*FetcherTestTag" --glob '!.github'

# Look for documentation about resource requirements
fd -t f "README|CONTRIBUTING" -X cat {} 2>/dev/null | grep -i "memory\|resource\|requirement"

Length of output: 2190

Script:

#!/bin/bash
# Check the test files for memory-intensive operations and data generation
ast-grep --pattern 'class JoinTest {
  $$$
}'

# Look for other test configurations and memory settings
rg -l "SBT_OPTS.*-Xmx" --glob '!.github/workflows/test_scala_and_python.yaml'

# Check for any performance or resource-related comments in the test files
rg -i "memory|performance|resource|intensive" spark/src/test/scala/

Length of output: 13181

spark/src/test/scala/ai/chronon/spark/test/FetcherTest.scala (2)

67-69: LGTM! Class annotation changes align with native tag transition.

The transition to using @FetcherTestTag annotation and removal of TaggedFilterSuite extension successfully implements the native tag annotation approach.

Line range hint 825-844: Add test coverage for tag-based filtering.

Given the transition to native tag annotations, consider adding test cases that specifically verify:

Tag-based test filtering functionality
Tag exclusion using the -l option
Multiple tag combinations

spark/src/test/scala/ai/chronon/spark/test/MutationsTest.scala (2)

49-49: LGTM: Test execution command updated correctly.

The comment has been updated to use the fully qualified annotation name, which aligns with the new native tag annotations approach.

52-53: LGTM: Test class properly annotated.

The @MutationsTestTag annotation has been correctly added to the test class, replacing the previous tagging mechanism.

Let's verify the annotation definition:

✅ Verification successful

Annotation is properly defined and used as expected

The verification confirms that:

The @MutationsTestTag is correctly defined as a Java annotation interface in spark/src/test/java/ai/chronon/spark/test/Tags.java
It's properly used in MutationsTest.scala with correct documentation for test execution
No unexpected or incorrect usages were found in the codebase

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the MutationsTestTag annotation definition and usage

# Check the annotation definition
ast-grep --pattern 'class MutationsTestTag {
  $$$
}'

# Check for any other usages of this tag
rg "MutationsTestTag" -A 2

Length of output: 764

spark/src/test/scala/ai/chronon/spark/test/JoinTest.scala (2)

49-49: Verify consistent annotation usage across test files.

Since this is a WIP PR transitioning to native tag annotations, ensure that:

All test files follow the same pattern of using native annotations
The running instructions are consistently documented

Let's check other test files for annotation usage:

✅ Verification successful

No consistency issues found with test annotations

Based on the search results, there are only a few test files using custom test tags (@JoinTestTag, @MutationsTestTag, @FetcherTestTag), while most test files use standard JUnit annotations (@Test, @Before, @After). The change to use @JoinTestTag in JoinTest.scala appears to be consistent with the existing pattern of using custom test tags where specific test categorization is needed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check annotation usage consistency across test files
# Look for files with @*TestTag annotations
ast-grep --pattern '@$_TestTag'

# Look for files still using the old tagging method
rg 'override def tagName'

Length of output: 61

Script:

#!/bin/bash
# Let's try a broader search for test files and their annotations
# First, find all test files
fd "Test.scala$" --type f

# Then look for any annotations in these files
rg "^[[:space:]]*@\w+" --type scala

# Also check for any imports of test-related annotations
rg "import.*Test.*Tag" --type scala

Length of output: 42368

48-50: LGTM! Clear documentation and proper annotation usage.

The comment clearly documents how to run the test with the new tag syntax, and the @JoinTestTag annotation is correctly applied at the class level.

Let's verify the annotation class exists:

tchow-zlai added 2 commits November 21, 2024 12:51

[wip] shard unit tests based on native annotation tags

2baadf4

remove tagged filter suite

e86d571

tchow-zlai force-pushed the tchow/shard-tests branch from f53cb2c to e86d571 Compare November 21, 2024 20:51

tchow-zlai requested a review from piyush-zlai November 21, 2024 20:53

tchow-zlai commented Nov 21, 2024

View reviewed changes

comment

eb480b5

coderabbitai bot reviewed Nov 21, 2024

View reviewed changes

update

49041f2

tchow-zlai closed this Nov 21, 2024

tchow-zlai deleted the tchow/shard-tests branch November 21, 2024 21:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[wip] use native annotation tags #75

[wip] use native annotation tags #75

Uh oh!

tchow-zlai commented Nov 21, 2024 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 21, 2024 •

edited

Loading

Review failed

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

tchow-zlai Nov 21, 2024

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

[wip] use native annotation tags #75

[wip] use native annotation tags #75

Uh oh!

Conversation

tchow-zlai commented Nov 21, 2024 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

tchow-zlai Nov 21, 2024

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tchow-zlai commented Nov 21, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 21, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)