BigtableToAvro - Option to only retrieve latest version of a column #188
Comments
This issue has been marked as stale due to 180 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that is incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the issue at any time. Thank you for your contributions.
This issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
Scenario
Dataflow jobs occasionally fail due to large Bigtable rows (above 256 MiB).
The cause of these large rows is older cell versions that have not been garbage collected yet.
Possible solution
Add an option to the BigtableToAvro template to retrieve only the latest version of each column in a row.
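A minimal sketch of what such an option could look like, assuming the template reads through Beam's BigtableIO (the project, instance, and table IDs below are placeholders, not the template's real option names): a RowFilter with a cells-per-column limit of 1 keeps only the most recent cell of every column, which also keeps version-bloated rows below the 256 MiB read limit.

```java
import com.google.bigtable.v2.RowFilter;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;

// Sketch only: keep just the latest cell version of every column.
RowFilter latestVersionOnly = RowFilter.newBuilder()
    .setCellsPerColumnLimitFilter(1)
    .build();

// Placeholder IDs; the template would take these from its pipeline options.
BigtableIO.Read read = BigtableIO.read()
    .withProjectId("my-project")
    .withInstanceId("my-instance")
    .withTableId("my-table")
    .withRowFilter(latestVersionOnly);
```

The filter could be gated behind a boolean pipeline option (for example a hypothetical `--bigtableOnlyLatestVersion` flag) so existing users of the template keep the current behavior by default.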
Error message
Caused by: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Error while reading table '...' : Read returned 347MiB from row '...' which exceeds the limit of 256MiB. Make sure you are setting an appropriate request filter to retrieve only recent versions and only the columns you want. If columns are accumulating more versions than you need to read, you can also create a garbage collection policy: https://cloud.google.com/bigtable/docs/configuring-garbage-collection#versions io.grpc.Status.asRuntimeException(Status.java:521)
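As the error message suggests, an orthogonal mitigation is a garbage collection policy that caps stored versions. A hedged sketch using the Cloud Bigtable admin client follows; the project, instance, table, and column family names are placeholders.

```java
import com.google.cloud.bigtable.admin.v2.BigtableTableAdminClient;
import com.google.cloud.bigtable.admin.v2.models.GCRules;
import com.google.cloud.bigtable.admin.v2.models.ModifyColumnFamiliesRequest;

public class SetMaxVersionsGcPolicy {
  public static void main(String[] args) throws Exception {
    // Placeholder identifiers; substitute the real project/instance/table/family.
    try (BigtableTableAdminClient admin =
        BigtableTableAdminClient.create("my-project", "my-instance")) {
      // Keep at most one version per column; older cells become eligible for GC.
      admin.modifyFamilies(
          ModifyColumnFamiliesRequest.of("my-table")
              .updateFamily("my-family", GCRules.GCRULES.maxVersions(1)));
    }
  }
}
```

Note that garbage collection runs asynchronously, so a row that is already oversized may keep failing reads until compaction catches up; the read-side filter sketched above addresses that case directly.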