Skip to content

perf: Compilation no longer bounded by recursion #1464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 7, 2025

Conversation

TrevorBergeron
Copy link
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Mar 6, 2025
@TrevorBergeron TrevorBergeron marked this pull request as ready for review March 6, 2025 04:46
@TrevorBergeron TrevorBergeron requested review from a team as code owners March 6, 2025 04:46
Copy link
Contributor

@chelsea-lin chelsea-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the performance run on the presubmit? If so, do we have any data for performance gain from this PR?

for node in list(self.iter_nodes_topo()):
# child nodes have already been transformed
child_results = tuple(results[child] for child in node.child_nodes)
result = reduction(node, child_results)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bottom_up and reduce_up are similar but the transformed methods APIs are different. Do we have a way to merge them into one? If not, could you please add docstring to reduce_up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, bottom_up could be implemented in terms of reduce_up, but will leave separate for now, maybe they'll have somewhat different optimizations or extra arguments later on. Added a brief description


class SQLCompiler:
def __init__(self, strict: bool = True):
self._compiler = compiler.Compiler(strict=strict)
# Not used, should maybe remove, if strictness is no longer compile-time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not used, any reasons to keep it still?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed


@_compile_node.register
def compile_readtable(node: nodes.ReadTableNode, *args):
return compile_read_table_unordered(node.source, node.scan_list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: compile_read_table_unordered is only used within this compile_readtable function. Would it improve readability to inline its logic here, or is there a reason for keeping it separate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there was an ordered version at some point, inlined now

ibis_table[scan_item.source_id].name(scan_item.id.sql)
for scan_item in scan.items
),
scalar_op_compiler = compile_scalar.ScalarOpCompiler()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not used in this file. Could we remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

return compile_read_table_unordered(node.source, node.scan_list)


def read_table_as_unordered_ibis(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: _read_table_as_unordered_ibis since it's internal used only. Can we move all these helper methods in the bottom of this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function removed per other comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're different functions. Could you double check?

@TrevorBergeron
Copy link
Contributor Author

Does the performance run on the presubmit? If so, do we have any data for performance gain from this PR?

Main gains are when you would otherwise hit stack overflow. Though, still a few other cases where you can hit stack overflow, so might not be that helpful by itself?

Copy link
Contributor

@chelsea-lin chelsea-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one nit comment. Overall LGTM

return compile_read_table_unordered(node.source, node.scan_list)


def read_table_as_unordered_ibis(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're different functions. Could you double check?

@TrevorBergeron TrevorBergeron merged commit 27ab028 into main Mar 7, 2025
23 checks passed
@TrevorBergeron TrevorBergeron deleted the regrettable_recursion_recalled branch March 7, 2025 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants