BigtableToAvro - Option to only retrieve latest version of a column #188

Closed
simenl opened this issue Nov 10, 2020 · 2 comments
Labels
addition New feature or request stale

Comments

simenl commented Nov 10, 2020

Scenario
Dataflow jobs occasionally fail because some Bigtable rows exceed the 256 MiB read limit.
The rows grow this large because older cell versions have not yet been garbage collected.

Possible solution
Add an option to the BigtableToAvro template to retrieve only the latest version of each column in a row.
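
To make the request concrete, here is a minimal sketch of what "latest version only" means for a single row. It works on in-memory data and is not the template's code: `Cell` and `latest_cell_per_column` are hypothetical names invented for this illustration, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for one versioned Bigtable cell."""
    family: str
    qualifier: str
    timestamp_micros: int
    value: bytes


def latest_cell_per_column(cells: Iterable[Cell]) -> List[Cell]:
    """Keep only the newest cell for each (family, qualifier) pair."""
    newest: Dict[Tuple[str, str], Cell] = {}
    for cell in cells:
        key = (cell.family, cell.qualifier)
        current = newest.get(key)
        if current is None or cell.timestamp_micros > current.timestamp_micros:
            newest[key] = cell
    # Sort for a deterministic result, ordered by family then qualifier.
    return sorted(newest.values(), key=lambda c: (c.family, c.qualifier))


# One row with three versions of "status" and one version of "owner".
row = [
    Cell("cf", "status", 1_000, b"old"),
    Cell("cf", "status", 3_000, b"new"),
    Cell("cf", "status", 2_000, b"mid"),
    Cell("cf", "owner", 1_500, b"simenl"),
]
latest = latest_cell_per_column(row)
```

This only illustrates the semantics. Filtering on the client like this would not avoid the error below, because the read is rejected before the row arrives; the option would presumably need to apply a server-side request filter (Bigtable offers a cells-per-column limit filter for this purpose), as the error message itself suggests.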

Error message

Caused by: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Error while reading table '...' : Read returned 347MiB from row '...' which exceeds the limit of 256MiB. Make sure you are setting an appropriate request filter to retrieve only recent versions and only the columns you want. If columns are accumulating more versions than you need to read, you can also create a garbage collection policy: https://cloud.google.com/bigtable/docs/configuring-garbage-collection#versions
io.grpc.Status.asRuntimeException(Status.java:521)

github-actions bot commented Jun 7, 2024

This issue has been marked as stale due to 180 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the issue at any time. Thank you for your contributions.

github-actions bot added the stale label Jun 7, 2024

This issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

asthamohta pushed a commit that referenced this issue Apr 23, 2025
* Wide Row IT for Forward Migration (#170)

* Added Missing Datastream Private Connectivity

* Review comments fixes (#176)

* Move to generic to Base Class

* Rename method

* Moved Spanner Check to base class

* Added Ignore For 5000 Tables

* FM code Refactored (#188)

* Added FM Low Priority WIde Row Fixws

* Added Code Refectored

* Code Refecoting Fixes

* removed 5K table test

fix compilation error

* revert changes

* Added IT PR Stuck Fixes and Ignore 100MB test as it is holding our Pipeline

* removed 100MB flaky test

---------

Co-authored-by: taherkl <[email protected]>
Co-authored-by: Akash Thawait <[email protected]>
ron-gal pushed a commit to ron-gal/DataflowTemplates that referenced this issue Apr 25, 2025