-
Notifications
You must be signed in to change notification settings - Fork 48
perf: Join op discards child ordering in unordered mode #923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4a6b85a
to
24aaaff
Compare
24aaaff
to
ac805ce
Compare
join=node.join, | ||
) | ||
else: | ||
left_unordered = self.compile_unordered_ir(node.left_child) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious if wee need to do anything to handle sort=True
behavior from the pandas join? I'm guessing "no", as I suspect this would be implemented as a sort after the join.
Also, let's add a comment explaining this optimization:
left_unordered = self.compile_unordered_ir(node.left_child) | |
# In general, joins are an ordering destroying operation. | |
# With ordering_mode = "partial", make this explicit. In | |
# this case, we don't need to provide a deterministic ordering. | |
left_unordered = self.compile_unordered_ir(node.left_child) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes, sort=True
sorts as an additional operation and so will still apply: https://github.com/googleapis/python-bigquery-dataframes/blob/main/bigframes/core/blocks.py#L2073-L2080
bigframes/core/compile/compiler.py
Outdated
if self.strict: | ||
compiled_ordered = [ | ||
self.compile_ordered_ir(node) for node in node.children | ||
] | ||
return concat_impl.concat_ordered(compiled_ordered) | ||
else: | ||
compiled_unordered = [ | ||
self.compile_unordered_ir(node) for node in node.children | ||
] | ||
return concat_impl.concat_unordered(compiled_unordered).as_ordered_ir() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This optimization worries me a little more than the join optimization. Could we update some documentation for concat
to call out this fact? I don't find this intuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, could be confusing if concat doesn't conserve ordering. Reverting this aspect of the change - only join will drop child ordering for now.
c0a5346
to
0b1ce61
Compare
…er mode
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕