Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-12260] Java - Backport Firestore connector's ramp-up throttling to Datastore connector #14713

Merged

Conversation

danthev
Copy link
Contributor

@danthev danthev commented May 3, 2021

A new connector for Cloud Firestore is being added to Beam and currently out for review (see #14261). Some of the features and adjustments are useful for Cloud Datastore as well, so they're being backported to this connector.

Ramp-up throttling

This follows the same implementation as the Firetore connector, though implemented as a subtransform in the Write/Delete PTransform for simplicity and so the change is more easily visible.
In essence, the ramp-up throttler is a generic rate limiter that emits elements only when there is sufficient "budget". This budget starts at 500 entities per second (split across workers) and increases by 50% every 5 minutes, as documented under Datastore Best Practices. Abiding by this rule is not always necessary, the throttling step can be turned off in that case.

Minor tweaks

There have been minor tweaks and adjustments to some of the constants along the way.

  • Raised target latency for automatic batch sizing from 5 to 6 seconds.
  • Decreased initial batch size from 200 to 50.
  • Decresed minimum batch size from 10 to 5.
  • The default user agent from pipeline options is set, so the Beam version used is visible by the Datastore team for debugging.
  • Add two metrics to help users debug, datastoreEntitiesMutated and datastoreLatencyMsPerMutation.

R: @chamikaramj


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

ValidatesRunner compliance status (on master branch)

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- Build Status Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python --- Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status Build Status Build Status --- Build Status ---

Examples testing status on various runners

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- --- --- --- --- --- ---
Java --- Build Status
Build Status
Build Status
--- --- --- --- ---
Python --- --- --- --- --- --- ---
XLang --- --- --- --- --- --- ---

Post-Commit SDK/Transform Integration Tests Status (on master branch)

Go Java Python
Build Status Build Status Build Status
Build Status
Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@danthev danthev changed the title [BEAM-12260] Backport Firestore connector's ramp-up throttling to Datastore connector [BEAM-12260] Java - Backport Firestore connector's ramp-up throttling to Datastore connector May 4, 2021
@aaltay aaltay requested a review from chamikaramj May 13, 2021 22:58
@aaltay
Copy link
Member

aaltay commented May 13, 2021

@chamikaramj - feel free to delegate to someone else.

@danthev - thank you!

@chamikaramj
Copy link
Contributor

Run Java PreCommit

@chamikaramj
Copy link
Contributor

Same questions as #14723 but we can do the conversation there.

@chamikaramj
Copy link
Contributor

R: @nehsyc

@chamikaramj chamikaramj requested a review from udim June 7, 2021 05:28
@chamikaramj
Copy link
Contributor

R: @udim

(also please note the corresponding Python changes and the current review from @nehsyc in #14723)

@danthev
Copy link
Contributor Author

danthev commented Jun 11, 2021

Run Java PreCommit

@danthev
Copy link
Contributor Author

danthev commented Jun 12, 2021

There seems to be a broken Flink test in master, otherwise this is ready for review.

@danthev
Copy link
Contributor Author

danthev commented Jun 15, 2021

Tests are green, so change is ready for review.

Copy link
Contributor

@nehsyc nehsyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Daniel! Generally looks good :)

Copy link
Contributor

@nehsyc nehsyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :)


/** Tests for {@link RampupThrottlingFn}. */
@RunWith(JUnit4.class)
public class RampupThrottlingFnTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way to test throttling counter is set properly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. (Had to make throttlingMsecs visible for testing)

@danthev
Copy link
Contributor Author

danthev commented Jun 17, 2021

Run Apache Beam Java SDK Quickstart - Direct

@danthev
Copy link
Contributor Author

danthev commented Jun 17, 2021

retest this please

@danthev
Copy link
Contributor Author

danthev commented Jun 17, 2021

Java Wordcount failure seems unrelated, but I can't get it to rerun the test.

@danthev
Copy link
Contributor Author

danthev commented Jun 18, 2021

Run Java PreCommit

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM other than a nit.

@@ -212,7 +220,7 @@
* <p>Testing has found that a batch of 200 entities will generally finish within the timeout even
* in adverse conditions.
*/
@VisibleForTesting static final int DATASTORE_BATCH_UPDATE_ENTITIES_START = 200;
@VisibleForTesting static final int DATASTORE_BATCH_UPDATE_ENTITIES_START = 50;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this updated ? If 50 is a better number could you also update the comment please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the comment. We found that a lower starting value helps with entities that have an extremely high number of indexes, these would sometimes start with load that had no realistic chance of succeeding. This is just a start value, so the size quickly corrects up to 500 if write throughput is enough after all.

@@ -225,7 +233,7 @@
* number of entities per request may be lower when we flush for the end of a bundle or if we hit
* {@link DatastoreV1.DATASTORE_BATCH_UPDATE_BYTES_LIMIT}.
*/
@VisibleForTesting static final int DATASTORE_BATCH_UPDATE_ENTITIES_MIN = 10;
@VisibleForTesting static final int DATASTORE_BATCH_UPDATE_ENTITIES_MIN = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto. Note that this could potentially double the number of RPCs to Datastore.

Copy link
Contributor Author

@danthev danthev Jun 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat similar issue, the limit of 10 was rather arbitrary and led to some deadlines on expensive writes. Write cost outweighs the cost of processing more RPCs, so we went down a little to accommodate that.
Both variables help with edge cases on complex entities, most traffic will be able to go to batch size 500.

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarifications.

@danthev
Copy link
Contributor Author

danthev commented Jun 18, 2021

Thanks! Rerunning precommit, there's a FlinkSavePoint test that I've noticed more than once now, probably flaky.

Run Java PreCommit

@danthev
Copy link
Contributor Author

danthev commented Jun 18, 2021

Run Java PreCommit

@chamikaramj chamikaramj merged commit fcd3fcc into apache:master Jun 21, 2021
@tvalentyn
Copy link
Contributor

It looks like the test added in this change is flaky: https://issues.apache.org/jira/browse/BEAM-12858. @danthev would you by chance be able to take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants