test: fix concurrency issue in testThrottlingBlocking (#2001)
At the end of the test, it verifies that the batcher waited at least the throttled time by capturing the argument of `callContext.withOption()`. There were two problems:
# "Zero interactions with this mock" problem
```
Error: Failures:
Error: BatcherImplTest.testThrottlingBlocking:899
apiCallContext.withOption(
<Capturing argument>,
<Capturing argument>
);
Wanted 1 time:
-> at com.google.api.gax.batching.BatcherImplTest.testThrottlingBlocking(BatcherImplTest.java:899)
But was 0 times.
```
The `callContext.withOption` method is called by the `BatcherImpl.sendOutstanding()` method via `BatcherImpl$PushCurrentBatchRunnable` in **another thread**: https://togithub.com/googleapis/sdk-platform-java/blob/4c741077d614093d08665e9ddd83fb0e332b7881/gax-java/gax/src/main/java/com/google/api/gax/batching/BatcherImpl.java#L206
Technically, there's no guarantee that the thread calls the `withOption` method within a certain timeframe, especially when we run the tests concurrently on a machine with many CPU cores (the nightly test setup). That's why we occasionally saw the "Zero interactions with this mock" problem.
In my experiment (https://togithub.com/suztomo/gax-batcher-impl-test-reproducer/blob/main/README.md), increasing the 100 ms timeout to 1000 ms was enough to prevent false positives. The longer timeout does not weaken the test: the assertion is not about timing, but about BatcherImpl recording the throttled time correctly.
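The effect of the longer timeout can be illustrated without the real classes. The following is a minimal sketch using only the JDK, not the actual test code: `AsyncCallWaitDemo` and `callObservedWithin` are hypothetical names, the `CountDownLatch` stands in for the mocked `withOption` call, and the worker's sleep stands in for scheduling delay on a busy machine.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of gax-java.
class AsyncCallWaitDemo {

  /**
   * Starts a worker thread that makes its "call" after delayMillis, then waits
   * up to timeoutMillis for that call. Returns true if the call was seen in time.
   */
  static boolean callObservedWithin(long delayMillis, long timeoutMillis) {
    CountDownLatch called = new CountDownLatch(1);
    Thread worker = new Thread(() -> {
      try {
        Thread.sleep(delayMillis); // simulated scheduling delay
        called.countDown();        // stands in for the withOption(...) call
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    worker.setDaemon(true); // do not keep the JVM alive for a late worker
    worker.start();
    try {
      // Returns as soon as the call happens; the timeout is only an upper bound.
      return called.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    // A short timeout can miss a call that is merely late.
    System.out.println("100 ms timeout:  " + callObservedWithin(300, 100));
    // A generous timeout tolerates the delay and still returns early when it can.
    System.out.println("1000 ms timeout: " + callObservedWithin(300, 1000));
  }
}
```

Because the wait ends as soon as the call is observed, a larger upper bound should only cost extra time when the call is genuinely missing.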
After fixing this problem, I saw the second problem, described below.
# "expected to be at least: 50 but was: 48" problem
```
[ERROR] com.google.api.gax.batching.BatcherImplTest.testThrottlingBlocking_gax_test06 -- Time elapsed: 5.138 s <<< FAILURE!
expected to be at least: 50
but was : 48
at com.google.api.gax.batching.BatcherImplTest.testThrottlingBlocking_gax_test06(BatcherImplTest.java:926)
```
At the end of the test, it verifies that the batcher waited at least the throttled time by capturing the argument of `callContext.withOption()`. In the usual, ideal scenario, the events happen in this order:
1. In thread A, the batcher blocks at `batcher.add` (because of the flowController) and starts a stopwatch to measure the throttled time.
2. Thread B (created by BatcherImplTest) sleeps 50 ms (`throttledTime`).
3. In thread B, BatcherImplTest calls `flowController.release()`, which wakes up thread A.
4. The batcher records that it was blocked for more than 50 ms.
However, in rare cases, the events happen in this order:
1. Thread B starts sleeping 50 ms.
2. In thread A, the batcher blocks at `batcher.add` (because of the flowController) and starts a stopwatch to measure the throttled time.
3. In thread B, after the 50 ms sleep, BatcherImplTest calls `flowController.release()`, which wakes up thread A.
4. The batcher records that it was blocked for **less than** 50 ms. This makes the test assertion fail with the "_expected to be at least: 50 but was: 48_" message.
To prevent this rare case, BatcherImplTest needs to wait until thread A is blocked before thread B starts its 50 ms sleep. In this pull request, I add a do-while loop that checks that thread A (the thread calling `batcher.add`) is in the `Thread.State.WAITING` state before starting the 50 ms sleep.
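The ordering fix can be sketched in isolation. This is a simplified illustration under stated assumptions, not the code in this pull request: `WaitForBlockedThreadDemo` and `measureBlockedMillis` are hypothetical names, a `Semaphore` with zero permits stands in for the flowController, and `System.nanoTime()` stands in for the batcher's stopwatch.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class; not part of gax-java.
class WaitForBlockedThreadDemo {

  /** Returns how long thread A stayed blocked, in milliseconds. */
  static long measureBlockedMillis(long sleepMillis) {
    Semaphore permit = new Semaphore(0); // stand-in for the flowController
    AtomicLong blockedNanos = new AtomicLong(-1);

    Thread threadA = new Thread(() -> {
      long start = System.nanoTime();  // the "stopwatch" starts
      permit.acquireUninterruptibly(); // blocks, like batcher.add
      blockedNanos.set(System.nanoTime() - start);
    });
    threadA.setDaemon(true);
    threadA.start();

    try {
      // Poll until thread A is parked, so the sleep below cannot start
      // before thread A's stopwatch does.
      do {
        Thread.sleep(1);
      } while (threadA.getState() != Thread.State.WAITING);

      Thread.sleep(sleepMillis); // the throttled time
      permit.release();          // wakes up thread A
      threadA.join();            // also makes blockedNanos safe to read
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    return blockedNanos.get() / 1_000_000;
  }

  public static void main(String[] args) {
    System.out.println("blocked for " + measureBlockedMillis(50) + " ms");
  }
}
```

Without the polling loop, the sleep in the calling thread could begin before thread A reaches `acquireUninterruptibly()`, which is the same race that produced the 48 ms reading.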
- [x] Make sure to open an issue as a [bug/issue](https://togithub.com/googleapis/gapic-generator-java/issues/new/choose) before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
- [x] Ensure the tests and linter pass
- [x] Code coverage does not decrease (if any source code was changed)
- [x] Appropriate docs were updated (if necessary)
Fixes #1193 ☕️