Massive downloads (1B+ rows) cause read errors #1252
I wonder if removing the default pool size would be a sufficient fix?
Alternatively (preferably?) we could set
The other option here is something like a bag of tasks, where we bound concurrent work via a semaphore but still allow for a large number of streams. It requires more concurrency coordination, which admittedly isn't Python's strong suit, but it would prevent overwhelming the client.
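For illustration, here is a minimal sketch of that idea; the `MAX_ACTIVE_STREAMS` value and the `download_stream()` helper are hypothetical placeholders, not part of the library:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_ACTIVE_STREAMS = 48  # hypothetical bound on concurrent downloads
semaphore = threading.BoundedSemaphore(MAX_ACTIVE_STREAMS)


def download_stream(stream_name):
    """Placeholder for reading one read stream to completion."""
    ...


def bounded_download(stream_name):
    # Only MAX_ACTIVE_STREAMS downloads run at once; the rest block here.
    with semaphore:
        return download_stream(stream_name)


def download_all(stream_names):
    # The session may hand back many streams (e.g. 1000); the semaphore keeps
    # the client from opening them all simultaneously.
    with ThreadPoolExecutor(max_workers=max(len(stream_names), 1)) as pool:
        return list(pool.map(bounded_download, stream_names))
```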
Maybe instead of trying to guess the max concurrent stream count that a system can support, it's simpler to just let the user override the
After troubleshooting for a long while to figure out why my pandas read_gbq() query (with the bqstorage_api enabled) would see its download throughput drop like a rock early in the download, I think I eventually found my answer.
Looking at the GCP API monitor, I saw that my requests would eventually error out with a 499 response (client error).
After all my debugging, I found that this function was returning 1000 read streams/threads to download:
https://github.com/googleapis/python-bigquery/blob/main/google/cloud/bigquery/_pandas_helpers.py#L838
I believe that for massive query results and a requested stream count of 0 (max_stream_count=requested_streams), the BQ server returns its maximum of 1000 streams to use. This most likely overwhelms the system and causes some of the threads to die from connection timeouts or something similar. I found that when I forced the stream count down to something much more reasonable, like 48, my download worked fine.
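For anyone hitting the same thing, a possible workaround (a sketch only, with placeholder project and table names) is to bypass the automatic stream sizing by creating the read session yourself with the BigQuery Storage API and capping max_stream_count explicitly:

```python
from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient()

requested_session = types.ReadSession(
    table="projects/my-project/datasets/my_dataset/tables/my_table",
    data_format=types.DataFormat.ARROW,
)

# Ask the server for at most 48 streams instead of letting it choose
# (which can be as high as 1000 for very large results).
session = client.create_read_session(
    parent="projects/my-project",
    read_session=requested_session,
    max_stream_count=48,
)

# Each stream can then be read (concurrently, if desired).
reader = client.read_rows(session.streams[0].name)
for page in reader.rows().pages:
    arrow_batch = page.to_arrow()
```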
Environment details
google-cloud-bigquery version: 2.31.0