[Scheduler Enhancement] Consider binding action when creating or recovering queue. #5267

Merged
merged 1 commit into from
Jul 12, 2022

style95
Member

@style95 style95 commented Jul 4, 2022

Description

Currently, binding actions (actions invoked through a package binding) are not properly considered when queues are created or recovered.
As a result, throttling for package actions is applied in the original namespace, and actions invoked from different namespaces share the same throttling limit.
This change addresses the issue.
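
The effect described above can be illustrated with a minimal sketch. It is not the OpenWhisk implementation: `ActionRef`, `Invocation`, and `ThrottleKey` are hypothetical names made up for this example, and the sketch assumes that the throttling limit is looked up by a namespace string.

```scala
// Hypothetical, simplified model; these are not OpenWhisk's actual classes.
final case class ActionRef(namespace: String, pkg: String, name: String)

// `resolved` is the action that actually runs (in the package owner's namespace).
// `binding` is the name the caller used when invoking through a package binding, if any.
final case class Invocation(resolved: ActionRef, binding: Option[ActionRef])

object ThrottleKey {
  // The problem case: the key is always the namespace of the resolved action.
  def byResolved(inv: Invocation): String = inv.resolved.namespace

  // The intended case: prefer the namespace of the binding when one is present.
  def byBinding(inv: Invocation): String =
    inv.binding.getOrElse(inv.resolved).namespace
}

object BindingDemo {
  val shared: ActionRef = ActionRef("provider", "utils", "hello")
  val fromA: Invocation = Invocation(shared, Some(ActionRef("tenantA", "myutils", "hello")))
  val fromB: Invocation = Invocation(shared, Some(ActionRef("tenantB", "myutils", "hello")))

  def main(args: Array[String]): Unit = {
    // Keyed by the resolved namespace, both callers collapse onto one limit.
    println(Seq(fromA, fromB).map(ThrottleKey.byResolved).distinct)
    // Keyed by the binding namespace, each caller keeps its own limit.
    println(Seq(fromA, fromB).map(ThrottleKey.byBinding).distinct)
  }
}
```

In this sketch, an invocation without a binding falls back to the resolved action's namespace, so actions that are not bound keep their existing key.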

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Scheduler
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@codecov-commenter

codecov-commenter commented Jul 5, 2022

Codecov Report

Merging #5267 (da47d71) into master (8843579) will decrease coverage by 4.81%.
The diff coverage is 88.88%.

@@            Coverage Diff             @@
##           master    #5267      +/-   ##
==========================================
- Coverage   80.09%   75.27%   -4.82%     
==========================================
  Files         238      238              
  Lines       14080    14087       +7     
  Branches      576      569       -7     
==========================================
- Hits        11277    10604     -673     
- Misses       2803     3483     +680     
Impacted Files Coverage Δ
.../openwhisk/core/loadBalancer/FPCPoolBalancer.scala 33.20% <0.00%> (ø)
.../openwhisk/core/scheduler/queue/QueueManager.scala 83.06% <85.71%> (+0.05%) ⬆️
...apache/openwhisk/core/service/WatcherService.scala 91.83% <100.00%> (+0.17%) ⬆️
...ontainerpool/v2/FunctionPullingContainerPool.scala 82.87% <100.00%> (+0.21%) ⬆️
...ntainerpool/v2/FunctionPullingContainerProxy.scala 78.12% <100.00%> (ø)
...k/core/containerpool/v2/InvokerHealthManager.scala 75.00% <100.00%> (ø)
...core/database/cosmosdb/RxObservableImplicits.scala 0.00% <0.00%> (-100.00%) ⬇️
...ore/database/cosmosdb/cache/CacheInvalidator.scala 0.00% <0.00%> (-100.00%) ⬇️
...e/database/cosmosdb/cache/ChangeFeedConsumer.scala 0.00% <0.00%> (-100.00%) ⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala 0.00% <0.00%> (-95.85%) ⬇️
... and 19 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ningyougang
Contributor

LGTM

@@ -397,7 +400,7 @@ class QueueManager(
       logging.warn(
         this,
         s"[${msg.activationId}] the activation message has not been scheduled for ${queueManagerConfig.maxSchedulingTime.toSeconds} sec")
-      completeErrorActivation(msg, "The activation has not been processed")
+      completeErrorActivation(msg, "The activation has not been processed: too old activation is arrived.")
Contributor


Suggested change
-      completeErrorActivation(msg, "The activation has not been processed: too old activation is arrived.")
+      completeErrorActivation(msg, "The activation has not been processed due to timeout waiting for processing in the scheduler.")

Member Author


I'm not sure this describes the case accurately.
This is the case where activations arrive after the max scheduling wait time has already passed.
For example, this can happen when a failure in Kafka prevents activations from being delivered, so they just stay stored in it. When Kafka becomes available again, it starts delivering them.
But if restoring Kafka took a long time, such as 1 hour, it will deliver activations that are too old (1 hour old).
Also, if many activations were stored in Kafka before the failure, sending them all at the same time would cause a thundering herd. So we complete them with an error.
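
A minimal sketch of the check described in this comment follows. It rests on assumptions and is not the QueueManager code: `Activation`, `Decision`, and `StaleCheck` are hypothetical names, and the sketch assumes the message carries a creation timestamp that can be compared against the configured maximum scheduling time.

```scala
import java.time.{Duration, Instant}

// Hypothetical stand-in for an activation message; not OpenWhisk's ActivationMessage.
final case class Activation(id: String, createdAt: Instant)

sealed trait Decision
case object Schedule extends Decision
final case class CompleteWithError(reason: String) extends Decision

object StaleCheck {
  // Completes the activation with an error when it has waited longer than
  // maxSchedulingTime; otherwise lets it proceed to scheduling.
  def decide(msg: Activation, now: Instant, maxSchedulingTime: Duration): Decision = {
    val age = Duration.between(msg.createdAt, now)
    if (age.compareTo(maxSchedulingTime) > 0)
      CompleteWithError("The activation has not been processed: too old activation is arrived.")
    else
      Schedule
  }
}
```

Rejecting stale messages one by one, as in this sketch, also means that a backlog released by a recovered broker is drained with errors rather than scheduled all at once.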

@style95 style95 force-pushed the consider-binding branch from 23e7d57 to da47d71 Compare July 8, 2022 01:41
@style95 style95 merged commit c66486e into apache:master Jul 12, 2022
JesseStutler pushed a commit to JesseStutler/openwhisk that referenced this pull request Jul 13, 2022