Reuse Dedup Shuffle #937

Open: wants to merge 4 commits into base: main

smcnamara2-stripe
Contributor

Summary

TODO

Why / Goal

TODO

Test Plan

  • Added Unit Tests
  • Running in prod at Stripe

Comment on lines +139 to +140
plannedJoin.queryExecution.logical match {
case ExtractEquiJoinKeys(_, _, rightKeys, _, _, _, _) =>

Is planning the join necessary? I couldn't tell whether you only need rightKeys. If that is all you need, is pulling it out of the logical plan via ExtractEquiJoinKeys the only way to get it?

Contributor Author

rightKeys is only knowable after planning the join; Spark inserts additional normalization functions to ensure correctness:
[Screenshot 2025-03-12 at 10:38:51 AM]

The only way I've found to correctly detect and reuse the knownfloatingpointnormalized(normalizenanandzero(...)) functions is to plan a join.
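
For readers unfamiliar with why floating-point join keys get wrapped at all, here is a minimal plain-Scala sketch (no Spark involved) of the underlying problem: 0.0 and -0.0 compare equal as numbers but have different bit patterns, so anything that hashes or compares the raw bytes needs the values normalized first. The object name and the `normalize` and `bits` helpers are hypothetical and written only for illustration; this is not Spark's implementation of normalizenanandzero.

```scala
import java.lang.{Double => JDouble}

// Illustrative only: all names here are hypothetical, not Spark APIs.
object FloatKeyNormalization {
  // Map every NaN to the canonical NaN and -0.0 to 0.0; leave other values alone.
  def normalize(d: Double): Double =
    if (d.isNaN) Double.NaN
    else if (d == 0.0d) 0.0d // true for both 0.0 and -0.0
    else d

  // Raw IEEE 754 bit pattern, i.e. what a byte-wise hash or comparison would see.
  def bits(d: Double): Long = JDouble.doubleToRawLongBits(d)

  def main(args: Array[String]): Unit = {
    val negZero = -0.0d
    val posZero = 0.0d

    // Numerically equal ...
    println(negZero == posZero) // prints "true"
    // ... but the raw bits differ, so a byte-wise comparison would not match.
    println(bits(negZero) == bits(posZero)) // prints "false"
    // After normalization both keys have the same bit pattern.
    println(bits(normalize(negZero)) == bits(normalize(posZero))) // prints "true"
  }
}
```

If both sides of a join are not normalized the same way, keys such as 0.0 and -0.0 could land in different hash partitions, which is presumably why the already-normalized key expressions from the planned join are the ones worth reusing.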
