Massive downloads (1B+ rows) cause read errors #1252
I wonder if removing the default pool size would be a sufficient fix?
Alternatively (preferably?) we could set
The other option here is something like a bag of tasks, where we bound concurrent work via a semaphore but still allow for a large number of streams. It requires more concurrency coordination, which admittedly isn't Python's strong suit, but it would prevent overwhelming the client.
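For illustration, here is a minimal sketch of that idea; the `MAX_ACTIVE_STREAMS` value and the `download_stream()` helper are hypothetical placeholders, not part of the library:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_ACTIVE_STREAMS = 48  # hypothetical bound on concurrent downloads
semaphore = threading.BoundedSemaphore(MAX_ACTIVE_STREAMS)


def download_stream(stream_name):
    """Placeholder for reading one read stream to completion."""
    ...


def bounded_download(stream_name):
    # Only MAX_ACTIVE_STREAMS downloads run at once; the rest block here.
    with semaphore:
        return download_stream(stream_name)


def download_all(stream_names):
    # The session may hand back many streams (e.g. 1000); the semaphore keeps
    # the client from opening them all simultaneously.
    with ThreadPoolExecutor(max_workers=max(len(stream_names), 1)) as pool:
        return list(pool.map(bounded_download, stream_names))
```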
Maybe instead of trying to guess the max concurrent stream count that a system can support, it's simpler to just let the user override the
After troubleshooting for a long while to figure out why my pandas read_gbq() query (with the bqstorage_api enabled) would see its download throughput drop like a rock early in the download, I think I eventually found my answer.
Looking at the GCP API monitor, I saw that my requests would eventually error out with a 499 response (client error).
After all my debugging, I found that this function was returning 1000 read streams/threads to download:
https://github.com/googleapis/python-bigquery/blob/main/google/cloud/bigquery/_pandas_helpers.py#L838
I believe that for massive query results and a requested stream count of 0 (max_stream_count=requested_streams), the BQ server returns its maximum of 1000 streams to use. This most likely overwhelms the system and causes some of the threads to die from connection timeouts or something similar. I found that when I forced the stream count down to something much more reasonable, like 48, my download worked fine.
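For anyone hitting the same thing, a possible workaround (a sketch only, with placeholder project and table names) is to bypass the automatic stream sizing by creating the read session yourself with the BigQuery Storage API and capping max_stream_count explicitly:

```python
from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient()

requested_session = types.ReadSession(
    table="projects/my-project/datasets/my_dataset/tables/my_table",
    data_format=types.DataFormat.ARROW,
)

# Ask the server for at most 48 streams instead of letting it choose
# (which can be as high as 1000 for very large results).
session = client.create_read_session(
    parent="projects/my-project",
    read_session=requested_session,
    max_stream_count=48,
)

# Each stream can then be read (concurrently, if desired).
reader = client.read_rows(session.streams[0].name)
for page in reader.rows().pages:
    arrow_batch = page.to_arrow()
```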
Environment details
google-cloud-bigquery version: 2.31.0