Skip to content

[source-google-sheets] increase row_batch_size or make it as an input parameter #35274

Open
@tautvydas-v

Description

@tautvydas-v

Connector Name

source-google-sheets

Connector Version

0.3.16

What step the error happened?

During the sync

Relevant information

Currently, source-google-sheets has a default value of 200 for "row_batch_size" variable. We've noticed that we can easily increase this value in order to process more data with one request. Google Sheets API has a limit of 300 requests per project or 60 requests per user per project, and the only limitation for a request is that it has to be processed in under 180 seconds. Otherwise, if a google sheet has a lot of rows, there is a possibility that at some point exponential backoff fails and the whole sync silently fails too.

We've tested out with having this value as 100, 10000 and 150000 and it seems that this connector works the same way, but processes a lot more data. Also it's understandable that maybe someone would like to have this value lower / higher, so suggestion is to have this value as a parameter, which could be set before setting up connector. I'm happy to contribute to this new feature for source-google-sheets.

Relevant log output

2024-01-16 09:48:59 source > Fetching range Source/Medium!30079:30349
2024-01-16 09:48:59 source > Fetching range Source/Medium!30350:30620
2024-01-16 09:48:59 source > Fetching range Source/Medium!30621:30891
2024-01-16 09:49:00 source > Fetching range Source/Medium!30892:31162
2024-01-16 09:49:00 source > Backing off get_values(...) for 0.5s (googleapiclient.errors.HttpError: <httperror>)
2024-01-16 09:49:00 source > Increasing number of records fetching due to rate limits. Current value: 280
2024-01-16 09:49:00 source > Fetching range Source/Medium!30892:31172
2024-01-16 09:49:00 source > Backing off get_values(...) for 1.7s (googleapiclient.errors.HttpError: <httperror>)
2024-01-16 09:49:00 source > Increasing number of records fetching due to rate limits. Current value: 290
2024-01-16 09:49:02 source > Fetching range Source/Medium!30892:31182
2024-01-16 09:49:02 source > Backing off get_values(...) for 3.7s (googleapiclient.errors.HttpError: <httperror>)
2024-01-16 09:49:02 source > Increasing number of records fetching due to rate limits. Current value: 300
2024-01-16 09:49:06 source > Fetching range Source/Medium!30892:31192
2024-01-16 09:49:06 source > Backing off get_values(...) for 0.9s (googleapiclient.errors.HttpError: <httperror>)
2024-01-16 09:49:06 source > Increasing number of records fetching due to rate limits. Current value: 310
2024-01-16 09:49:07 source > Fetching range Source/Medium!30892:31202
2024-01-16 09:49:07 source > Backing off get_values(...) for 10.8s (googleapiclient.errors.HttpError: <httperror>)
2024-01-16 09:49:07 source > Increasing number of records fetching due to rate limits. Current value: 320
2024-01-16 09:49:18 source > Fetching range Source/Medium!30892:31212
2024-01-16 09:49:18 source > Backing off get_values(...) for 8.7s (googleapiclient.errors.HttpError: <httperror>)
2024-01-16 09:49:18 source > Increasing number of records fetching due to rate limits. Current value: 330
2024-01-16 09:49:27 source > Fetching range Source/Medium!30892:31222
2024-01-16 09:49:27 source > Backing off get_values(...) for 45.5s (googleapiclient.errors.HttpError: <httperror>)
2024-01-16 09:49:27 source > Increasing number of records fetching due to rate limits. Current value: 340
2024-01-16 09:49:32 destination > 2024-01-16 09:49:32 INFO i.a.c.i.d.b.BufferManager(printQueueInfo):118 - [ASYNC QUEUE INFO] Global: max: 4.69 GB, allocated: 20 MB (20.0 MB), % used: 0.004163890739672343 | Queue `Sheet1`, num records: 30884, num bytes: 8.62 MB, allocated bytes: 10 MB | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.000000
2024-01-16 09:49:32 destination > 2024-01-16 09:49:32 INFO i.a.c.i.d.FlushWorkers(printWorkerInfo):146 - [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 0
2024-01-16 09:50:13 source > Fetching range Source/Medium!30892:31232
2024-01-16 09:50:13 source > Backing off get_values(...) for 47.2s (googleapiclient.errors.HttpError: <httperror>)
2024-01-16 09:50:13 source > Increasing number of records fetching due to rate limits. Current value: 350
2024-01-16 09:50:32 destination > 2024-01-16 09:50:32 INFO i.a.c.i.d.b.BufferManager(printQueueInfo):118 - [ASYNC QUEUE INFO] Global: max: 4.69 GB, allocated: 20 MB (20.0 MB), % used: 0.004163890739672343 | Queue `Sheet1`, num records: 30884, num bytes: 8.62 MB, allocated bytes: 10 MB | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.000000
2024-01-16 09:50:32 destination > 2024-01-16 09:50:32 INFO i.a.c.i.d.FlushWorkers(printWorkerInfo):146 - [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 0
2024-01-16 09:51:00 source > Fetching range Source/Medium!30892:31242
2024-01-16 09:51:00 source > Giving up get_values(...) after 9 tries (googleapiclient.errors.HttpError: <httperror>)
2024-01-16 09:51:00 source > Stopped syncing process due to rate limits. Rate limit has been reached. Please try later or request a higher quota for your account.

Contribute

  • Yes, I want to contribute

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions