feat!: set allow_large_results=False by default #1541

Merged: 5 commits, Mar 27, 2025

shobsi (Contributor) commented Mar 25, 2025

This changes the default of allow_large_results to False to optimize small-result operations, which are the more common case. If a result can exceed 10 MB, explicitly set bigframes.pandas.options.bigquery.allow_large_results = True.
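
As a rough illustration of opting back in, the sketch below sets the option only when a result is expected to cross the 10 MB mark mentioned above. The helper names and the byte interpretation of "10MB" are assumptions made for this example, not part of the PR; the only name taken from the description is bigframes.pandas.options.bigquery.allow_large_results.

```python
# Threshold taken from the PR description ("10MB"); treating it as
# 10 * 1024 * 1024 bytes is an assumption made for this sketch.
SMALL_RESULT_LIMIT_BYTES = 10 * 1024 * 1024


def needs_large_results(estimated_result_bytes: int) -> bool:
    """Hypothetical helper: True when a result may exceed the small-result limit."""
    return estimated_result_bytes > SMALL_RESULT_LIMIT_BYTES


def configure_bigframes(estimated_result_bytes: int) -> bool:
    """Set the option named in this PR and return the value that was chosen."""
    allow = needs_large_results(estimated_result_bytes)
    try:
        import bigframes.pandas as bpd  # requires the bigframes package
    except ImportError:
        return allow  # bigframes is not installed, so there is nothing to set
    # After this PR the default is False; set True explicitly for big results.
    bpd.options.bigquery.allow_large_results = allow
    return allow
```

Calling configure_bigframes with an estimated size before running queries keeps the new default for small results and opts in only when needed.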

Fixes internal issue 394658588 🦕

shobsi requested a review from tswast March 25, 2025 22:24
shobsi requested review from a team as code owners March 25, 2025 22:24
product-auto-label bot added the "size: xs" and "api: bigquery" labels Mar 25, 2025
shobsi changed the title from "feat!: set allow_large_results=False by default to optimize small r…" to "feat!: set allow_large_results=False by default" Mar 25, 2025
product-auto-label bot added the "size: s" label and removed the "size: xs" label Mar 26, 2025
tswast (Collaborator) commented Mar 26, 2025

Looks like we need to update the benchmark script. Some metrics aren't available yet with allow_large_results=False. googleapis/python-bigquery#1996 has been closed, but I don't think we've updated bigframes to take advantage of it yet.

nox > python scripts/run_and_publish_benchmark.py --notebook --publish-benchmarks=notebooks/
Traceback (most recent call last):
  File "/tmpfs/src/github/python-bigquery-dataframes/scripts/run_and_publish_benchmark.py", line 486, in <module>
    main()
  File "/tmpfs/src/github/python-bigquery-dataframes/scripts/run_and_publish_benchmark.py", line 447, in main
    benchmark_metrics, error_message = collect_benchmark_result(
  File "/tmpfs/src/github/python-bigquery-dataframes/scripts/run_and_publish_benchmark.py", line 102, in collect_benchmark_result
    raise ValueError(
ValueError: Mismatch in the number of report files for bytes, millis, seconds and query char count.
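
The error message suggests the script expects the same number of report files for each metric. A minimal sketch of such a consistency check follows; the file suffixes and function names are hypothetical and are not taken from scripts/run_and_publish_benchmark.py.

```python
import pathlib

# Hypothetical suffixes, one per metric named in the error message.
METRIC_SUFFIXES = (".bytes", ".millis", ".seconds", ".query_char_count")


def count_report_files(report_dir: pathlib.Path) -> dict:
    """Count the report files found for each metric suffix."""
    return {
        suffix: len(list(report_dir.glob(f"*{suffix}")))
        for suffix in METRIC_SUFFIXES
    }


def check_report_files(report_dir: pathlib.Path) -> dict:
    """Raise ValueError when the metrics do not have equal file counts."""
    counts = count_report_files(report_dir)
    if len(set(counts.values())) != 1:
        raise ValueError(
            "Mismatch in the number of report files for bytes, millis, "
            f"seconds and query char count: {counts}"
        )
    return counts
```

Under this reading, a metric that is not reported when allow_large_results=False would leave one suffix with fewer files than the others and trigger the error.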

shobsi (Contributor, Author) commented Mar 26, 2025

Waiting for #1545 to go in first to fix the benchmarking script failure.

shobsi requested a review from Genesis929 March 26, 2025 17:45
shobsi merged commit e9fb712 into main Mar 27, 2025 (24 checks passed)
shobsi deleted the shobs-allow_large_results-False branch March 27, 2025 08:20
Labels: api: bigquery, size: s