Skip to content

fix: propagate kvstore multi get exception properly #955

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 2, 2025

Conversation

hzding621
Copy link
Collaborator

Summary

Fix KVStore.getString such that the cases where KVStore.multiGet returns a Failure can be properly handled.

Why / Goal

Fetcher should handle join parts failures and return partial results. When KVStore throws exceptions for a certain GroupBy, the exception should be caught and swallowed as an exception string the responseMap instead of failing the whole request.

Test Plan

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested

Checklist

  • Documentation update

Reviewers

@airbnb/airbnb-chronon-maintainers @pengyu-hou @yuli-han

@hzding621 hzding621 force-pushed the haozhen--mussel-partial-failure branch from 9cca538 to 4f69380 Compare March 28, 2025 23:11
@yuli-han
Copy link
Collaborator

yuli-han commented Apr 1, 2025

When KVStore throws exceptions for a certain GroupBy, the exception should be caught and swallowed as an exception string the responseMap instead of failing the whole request.

I remembered this is already the current behavior right? Did we see any exception case?

@hzding621
Copy link
Collaborator Author

@yuli-han this is the expected behavior, but some scenario is not handled correctly previously and is addressed in the PR

Copy link
Collaborator

@pengyu-hou pengyu-hou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Haozhen for fixing this corner case!

@@ -670,7 +670,7 @@ class FetcherBase(kvStore: KVStore,
if (debug || Math.random() < 0.001) {
logger.error(s"Failed to fetch $groupByRequest", ex)
}
Map(groupByRequest.name + "_exception" -> ex.traceString)
Map(prefix + "_exception" -> ex.traceString)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does prefix have the group by's request name?

Copy link
Collaborator Author

@hzding621 hzding621 Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, prefix here actually stores joinPart.fullPrefix. It's better to use joinPart.fullPrefix than groupByRequest.name because fullPrefix also contains joinPart.prefix.

There are edge cases where two join_part may share the same underlying GroupBy, but different prefix in join part, e.g. prefix = host vs guest, but both user-level GroupBy

Failure(buildException(e))
case Success(resp) =>
if (resp.values.isFailure) {
Failure(buildException(resp.values.failed.get))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh so the failure was nested within the response value😑

Good catch.

@hzding621 hzding621 merged commit b45d1c2 into main Apr 2, 2025
7 checks passed
@hzding621 hzding621 deleted the haozhen--mussel-partial-failure branch April 2, 2025 02:21
@hzding621 hzding621 mentioned this pull request Apr 9, 2025
4 tasks
hzding621 pushed a commit that referenced this pull request Apr 10, 2025
hzding621 added a commit that referenced this pull request Apr 10, 2025
hzding621 pushed a commit that referenced this pull request Apr 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants