Fix race between executor insert and advanceTime #1554
Merged
The previous implementation modifies a HashMap while iterating through its key set, causing unpredictable iteration behavior. This might be the reason our tests intermittently deadlock.

The new implementation uses a PriorityQueue. The time complexity is O(M * log N), where M is the number of expirations before the cutover time and N is the total number of expirations. I am not sure how this compares to the O(N) intended in the previous implementation. If required, O(N) is also possible using an ArrayList.

Unfortunately, a new failure has emerged. Instead of deadlocking, testModifyAckDeadline intermittently fails. Maybe

- I have fixed the old bug and created a new one,
- I have fixed the old bug that was masking another one,
- the deadlock wasn't caused by the iteration, and now the tests just fail before they could deadlock,

or some combination thereof. The incorrect iteration should be fixed regardless.
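For illustration only, here is a minimal sketch of the PriorityQueue approach described above. It is not the code in this PR: `ExpirationQueue`, `Expiration`, `drainExpired`, and the millisecond timestamps are hypothetical names, and treating an entry that expires exactly at the cutover as expired is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: expirations are kept in a min-heap ordered by expiry time,
// so draining never iterates over a collection while mutating it.
final class ExpirationQueue {
  // Hypothetical entry type: an ack ID and the time at which it expires.
  static final class Expiration implements Comparable<Expiration> {
    final long expiresAtMs;
    final String ackId;

    Expiration(long expiresAtMs, String ackId) {
      this.expiresAtMs = expiresAtMs;
      this.ackId = ackId;
    }

    @Override
    public int compareTo(Expiration other) {
      return Long.compare(expiresAtMs, other.expiresAtMs);
    }
  }

  private final PriorityQueue<Expiration> queue = new PriorityQueue<>();

  synchronized void add(long expiresAtMs, String ackId) {
    queue.add(new Expiration(expiresAtMs, ackId)); // O(log N)
  }

  // Removes and returns the ack IDs expiring at or before cutoverMs (assumed inclusive).
  // Each poll() costs O(log N), so draining M entries costs O(M * log N).
  synchronized List<String> drainExpired(long cutoverMs) {
    List<String> expired = new ArrayList<>();
    while (!queue.isEmpty() && queue.peek().expiresAtMs <= cutoverMs) {
      expired.add(queue.poll().ackId);
    }
    return expired;
  }

  synchronized int size() {
    return queue.size();
  }
}
```

With entries added at 30, 10, and 20, calling `drainExpired(20)` in this sketch returns the entries for 10 and 20 in expiry order and leaves the entry for 30 in the queue.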
This is a follow-up to #1552. The linked PR has a race condition: if the ExtensionJobs are added to the queue after the alarm is set up, the alarm may fire before the jobs are added. Fixing this is easy: just set up the alarm afterwards. However, doing this consistently deadlocks the tests. Why?

The tests use a fake executor. In tests, time does not flow naturally but is forced to increment, usually by the executor's `advanceTime` method. There is a race between the test thread advancing the time and the mock server thread inserting more tasks into the fake executor. If the tasks get inserted first, all is well. Otherwise, `advanceTime` doesn't see the tasks, and they do not get executed. The fix is to check the "current time" every time a task is inserted. If the task is inserted "in the past", we run the task immediately.

Doing this still deadlocks the tests. Why?

The fake executor needs to lock the task queue when registering a task. If the task is inserted in the past, it also needs to run the task. Running the task might involve sending a request to the mock server. A gRPC thread on the mock server might handle the request by adding more tasks to the executor. The executor's queue is still locked by the first thread, resulting in a deadlock. The fix is to lock the queue just long enough to retrieve a task, then execute the task without holding the lock.
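The two fixes above can be sketched roughly as follows. This is a simplified illustration, not the fake executor from this PR: `FakeClockExecutor`, `scheduleAt`, `runDueTasks`, and the absolute millisecond timestamps are hypothetical, and only `advanceTime` is a name taken from the description.

```java
import java.util.PriorityQueue;

// Hypothetical sketch of a fake executor driven by a manually advanced clock.
final class FakeClockExecutor {
  private static final class Task implements Comparable<Task> {
    final long runAtMs;
    final long seq; // insertion order, used to break ties between equal run times
    final Runnable body;

    Task(long runAtMs, long seq, Runnable body) {
      this.runAtMs = runAtMs;
      this.seq = seq;
      this.body = body;
    }

    @Override
    public int compareTo(Task other) {
      int byTime = Long.compare(runAtMs, other.runAtMs);
      return byTime != 0 ? byTime : Long.compare(seq, other.seq);
    }
  }

  private final Object lock = new Object();
  private final PriorityQueue<Task> tasks = new PriorityQueue<>();
  private long nowMs = 0;
  private long nextSeq = 0;

  // Registers a task at an absolute fake time. If that time is already in the
  // past, the task is picked up and run immediately by runDueTasks().
  void scheduleAt(long runAtMs, Runnable body) {
    synchronized (lock) {
      tasks.add(new Task(runAtMs, nextSeq++, body));
    }
    runDueTasks();
  }

  // Moves the fake clock forward and runs everything that has become due.
  void advanceTime(long deltaMs) {
    synchronized (lock) {
      nowMs += deltaMs;
    }
    runDueTasks();
  }

  private void runDueTasks() {
    while (true) {
      Task next;
      synchronized (lock) {
        // Hold the lock only long enough to retrieve one due task.
        Task head = tasks.peek();
        if (head == null || head.runAtMs > nowMs) {
          return;
        }
        next = tasks.poll();
      }
      // Run without the lock, so a task that causes another thread to
      // schedule more work cannot deadlock against this thread.
      next.body.run();
    }
  }
}
```

In this sketch it no longer matters whether `advanceTime` or `scheduleAt` wins the race: whichever call happens second sees both the new clock value and the queued task, and runs it.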
Changes Unknown when pulling 021b5ad on pongad:tick into ** on GoogleCloudPlatform:pubsub-hp**.
davidtorres
approved these changes
Jan 23, 2017
This is a follow up to #1552, which should be merged first.
cc @davidtorres