fix race in mock publisher #1560
FakePublisherServiceImpl::publish has a race. If the call to publish happens before a response has been placed, the server tries to respond with a null object. The fix is to either

- always set the response before calling, or
- make publish wait for the response.

For propriety, this commit does both.

This fix reveals another flake. Publisher uses exponential backoff with jitter. The jitter randomly picks a number between 0 and a maximum. If it picks low values too many times, Publisher retries too often and the server runs out of canned transient errors to respond with. The test still passed because it expected any Throwable. This commit fixes the test to expect FakeException, sets the jitter to a random value in the range (max/2, max), and increases the number of canned errors to compensate.

Retrying can still cause random test failures, independently of the above changes. If a request fails due to DEADLINE_EXCEEDED, the future is completed with a corresponding error. However, the last RPC might not have been successfully cancelled. When a new test starts, it gives canned responses to the server. The server might use some of these responses to respond to RPCs from previous tests. Consequently, a misbehaving test can fail every test that comes after it. This commit changes the test setup code so that it creates a new fake server for every test to avoid this problem.
cc @davidtorres, we still seem to have flakes in
Couldn't you also mock the jitter, or the random number provider?
@garrettjonesgoogle Good idea. PTAL
LGTM, after you update the PR description to match the latest logic.
Description fixed in PR and commit
FakePublisherServiceImpl::publish has a race.
If the call to publish happens before a response can be placed,
the server will try to respond with a null object.
The fix is to either
- always set the response before calling, or
- make publish wait for the response.
For propriety, this commit does both.
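
A minimal sketch of the "make publish wait" half of the fix, assuming the canned responses live in a blocking queue. `WaitingFakeServer`, `addResponse`, and the `String` response type are hypothetical stand-ins for illustration, not the actual FakePublisherServiceImpl code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the fake publisher service; not the real class.
class WaitingFakeServer {
  // Canned responses queued by the test. String stands in for the real response type.
  private final BlockingQueue<String> responses = new LinkedBlockingQueue<>();

  // Called by the test to queue a canned response.
  void addResponse(String response) {
    responses.add(response);
  }

  // Called when a publish request arrives. Waits for a response to be queued
  // instead of reading once and replying with null.
  String publish(long timeoutMillis) {
    try {
      String response = responses.poll(timeoutMillis, TimeUnit.MILLISECONDS);
      if (response == null) {
        throw new IllegalStateException("no canned response within " + timeoutMillis + " ms");
      }
      return response;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for a response", e);
    }
  }

  public static void main(String[] args) {
    WaitingFakeServer server = new WaitingFakeServer();
    // The racy ordering: the request arrives before the response is queued.
    Thread lateSetup = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      server.addResponse("message-id-1");
    });
    lateSetup.start();
    System.out.println(server.publish(5000)); // prints "message-id-1"
  }
}
```

The timeout keeps a test that never queues a response from hanging forever; it fails with a clear error instead.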
This fix reveals another flake.
Publisher uses exponential backoff with jitter.
The jitter randomly picks a number between 0 and a maximum.
If we pick low values too many times,
it will retry too often and the server will run out of canned
transient errors to respond back with.
The test still passed since it expected any Throwable.
This commit fixes the test to expect FakeException,
and use a fake "random" number generator to ensure
we don't run out of return values.
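
To illustrate why low jitter draws exhaust the canned errors, here is a rough sketch that counts how many retries fit before a deadline when the jitter source is injected. `BackoffSketch`, `retriesBeforeDeadline`, and all the numbers are assumptions made for the example; this is not Publisher's actual retry code.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical model of exponential backoff with an injectable jitter source.
class BackoffSketch {
  // jitter maps the current maximum delay to the delay actually used (at most the maximum).
  // Returns how many retries happen before the deadline is exceeded, capped at maxRetries.
  static int retriesBeforeDeadline(
      long initialMaxMillis, double multiplier, long deadlineMillis,
      int maxRetries, LongUnaryOperator jitter) {
    long maxDelay = initialMaxMillis;
    long elapsed = 0;
    int retries = 0;
    while (retries < maxRetries) {
      long delay = jitter.applyAsLong(maxDelay);
      if (elapsed + delay > deadlineMillis) {
        break; // the next retry would land after the deadline
      }
      elapsed += delay;
      retries++;
      maxDelay = (long) (maxDelay * multiplier); // exponential growth of the ceiling
    }
    return retries;
  }

  public static void main(String[] args) {
    // Fake "random" source that always returns the maximum: few retries.
    System.out.println(retriesBeforeDeadline(100, 2.0, 1000, 100, max -> max)); // prints 3
    // Source that always draws low values: twice as many retries.
    System.out.println(retriesBeforeDeadline(100, 2.0, 1000, 100, max -> max / 10)); // prints 6
  }
}
```

With a deterministic source, the test knows exactly how many transient errors the server needs; with a real random source, that number varies from run to run.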
Retrying can still cause random test failures,
independently of the above changes.
If a request fails due to DEADLINE_EXCEEDED,
the future is completed with a corresponding error.
However, the last RPC might not have been successfully cancelled.
When a new test starts, it gives canned responses to the server.
The server might use some of these responses to respond to
RPCs of previous tests.
Consequently, a misbehaving test can fail every test that comes
after it.
This commit changes the test setup code so that it
creates a new fake server for every test to avoid this problem.
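
The leak between tests can be modelled in a few lines. This is a simplified, single-threaded sketch; `FreshServerPerTest` and `FakeServer` are hypothetical names, and a real straggler RPC would arrive on another thread.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class FreshServerPerTest {
  // Hypothetical fake server: hands out canned responses in FIFO order.
  static class FakeServer {
    private final Queue<String> canned = new ArrayDeque<>();

    void addResponse(String response) {
      canned.add(response);
    }

    // Returns null when nothing is queued.
    String respond() {
      return canned.poll();
    }
  }

  public static void main(String[] args) {
    // Shared server: an uncancelled RPC from test A consumes test B's response.
    FakeServer shared = new FakeServer();
    shared.addResponse("for-test-B");
    String straggler = shared.respond(); // late RPC left over from test A
    String testB = shared.respond();     // test B's own RPC
    System.out.println(straggler + " / " + testB); // prints "for-test-B / null"

    // One server per test: the straggler only ever reaches test A's server.
    FakeServer serverA = new FakeServer();
    FakeServer serverB = new FakeServer();
    serverB.addResponse("for-test-B");
    serverA.respond(); // straggler hits A's server, which is now irrelevant
    System.out.println(serverB.respond()); // prints "for-test-B"
  }
}
```

In a JUnit test class, the same effect would typically come from constructing the fake server in the per-test setup method rather than once per class.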