-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Prevent cycle in the QueueManager #5332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
private def handleCycle(msg: ActivationMessage)(implicit transid: TransactionId): Unit = { | ||
val action = msg.action | ||
QueuePool.keys.find(_.docInfo.id == action.toDocId) match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it tries to recover a queue iff there is no reference in QueuePool
.
Previously, it always sent a RecoverQueue
message whenever the sender of an activation is itself even if there was already a new queue added to the queue pool.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5332 +/- ##
==========================================
- Coverage 81.69% 76.69% -5.01%
==========================================
Files 240 240
Lines 14264 14310 +46
Branches 598 591 -7
==========================================
- Hits 11653 10975 -678
- Misses 2611 3335 +724 ☔ View full report in Codecov by Sentry. |
LGTM |
I found a regression in this change. |
What is the regression? |
1. Disable the scheduler
2. Shutting down the queue manager
3. Shutting down the queue
4. Queue endpoints are removed
5. A queue is in the removing stateIt's waiting for buffered activations to be handled.
5. New activation comes and it recovers the queueIn the meantime, new activation comes and it tries to recover a queue.
6. All incoming activations trigger the recovery procedure
7. The queue is recreatedIt is also added to the init revision map and other recovery attempts are ignored.
8. Shutdown the queue againThis newly created queue even received an activation but it receives
This is because the scheduler is in the private def recreateQueue(action: FullyQualifiedEntityName,
msg: ActivationMessage,
actionMetaData: WhiskActionMetaData): Unit = {
logging.warn(this, s"recreate queue for ${msg.action}")(msg.transid)
val queue = createAndStartQueue(msg.user.namespace.name.asString, action, msg.revision, actionMetaData)
queue ! msg
msg.transid.mark(this, LoggingMarkers.SCHEDULER_QUEUE_RECOVER)
if (isShuttingDown) {
queue ! GracefulShutdown. // here
}
} 9. While the queue is going to stop another queue is created againAnd it also shut down the queue again.
10. Shutdown the feedThis problem was repeatedly happening until the feed is also shut down and no more activation was coming.
11. Stop the scheduler when there are still activations in the queueSince no more activation was coming, all created queues (for the same action) were in the removing state and no queue reference was in the We can see non-zero
When the scheduler was stopped, there were around 10 different queues for the same actions and 33 activations were in them.
|
So to summarize the issue, the problem is that it's possible with this change for a scheduler to be stopped with activations still stuck in them even after being disabled? Are you sure that's introduced just with this change? |
Previously the queue was only created by controllers. When a scheduler was shut down or a queue was removed for some reason controllers were in charge of creating them again. |
I feel it's better to merge this PR and work based on it as the cycle anyway has to be addressed. |
Description
This is basically based on #5251
I just added a small change to improve the performance.
Without performance optimization.

With performance optimization.

Related issue and scope
My changes affect the following components
Types of changes
Checklist: