perf: use the first page of results when query(api_method="QUERY")
#1723
…_is_completely_cached
I'm not sure I understand. Could you explain why doing this will improve query performance, and only applies when |
Great question @Linchin. There are two APIs for issuing a query in BigQuery: jobs.insert and jobs.query. There are some key differences between these APIs.
This PR does a few things: (1) if the job has finished, don't make the unnecessary call to |
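To make the idea concrete, here is a minimal sketch of the caching behavior discussed above, not the library's actual implementation. It assumes the job has already finished when `jobs.query` responds. `FakeQueryResponse`, `FakeApi`, `first_page_is_complete`, and `iterate_rows` are hypothetical names invented for illustration; only the API names `jobs.query` and `jobs.getQueryResults` come from the discussion.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class FakeQueryResponse:
    """Hypothetical stand-in for a jobs.query response (not the real type)."""

    job_complete: bool
    rows: List[dict]
    page_token: Optional[str] = None


@dataclass
class FakeApi:
    """Hypothetical transport that records every follow-up request."""

    calls: List[str] = field(default_factory=list)
    # Maps a page token to (rows on that page, next page token).
    remaining_pages: Dict[str, Tuple[List[dict], Optional[str]]] = field(
        default_factory=dict
    )

    def get_query_results(self, page_token: str):
        self.calls.append("jobs.getQueryResults")
        return self.remaining_pages[page_token]


def first_page_is_complete(response: FakeQueryResponse) -> bool:
    # Every row is already in hand only if the job finished and the
    # response carries no token pointing at a further page.
    return response.job_complete and response.page_token is None


def iterate_rows(response: FakeQueryResponse, api: FakeApi) -> Iterator[dict]:
    # Serve the first page from the response itself: no extra request.
    yield from response.rows
    # Fetch more pages only when the response says there are more.
    token = response.page_token
    while token is not None:
        rows, token = api.get_query_results(token)
        yield from rows
```

With a response whose first page holds all rows, iterating records no follow-up calls on `FakeApi`; with a `page_token` present, exactly one extra request per remaining page is recorded.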
# This also requires updates to `to_dataframe` and the DB API connector
# so that they don't try to read from a destination table if all the
# results are present.
query_job._query_results = google.cloud.bigquery.query._QueryResults(
Does this change mean we also load the query results when the job is complete and query(api_method="INSERT") is used?
Good question. I just double-checked that this method is only called from query_jobs_query, so it won't affect the behavior when api_method="INSERT".
Thank you for helping me understand this PR :)
This is a restoration of part of #374
TODO:
jobs.getQueryResults or jobs.get calls happen when iterating over rows that are fully returned from jobs.query.
to_pandas and to_arrow work with this code path.
Closes #589 🦕