Change the value of pause-grace for new scheduler #5221


Merged: 1 commit merged into apache:master on Jul 26, 2022

Conversation

@JesseStutler (Contributor)

The default value of pause-grace, 50 milliseconds, is too short for the FPCScheduler. If the interval between requests is longer than 50 ms, the container is paused and can no longer pull requests from the MemoryQueue, so it cannot take advantage of the ActivationClientProxy-ActivationService path, which is the core of the FPCScheduler design. We therefore need to set pause-grace to a longer value: 10 seconds by default now.

Description

To demonstrate this, I ran a test against the FPCScheduler. I sent 5 concurrent requests every 5 seconds; the second batch of 5 requests was successfully picked up by GetActivation in the MemoryQueue. But for subsequent batches, the SchedulingDecisionMaker kept matching an exceptional case:

  // this is an exceptional case, create a container immediately
  case (Running, _) if totalContainers == 0 && availableMsg > 0 =>
    logging.info(
      this,
      s"add one container if totalContainers($totalContainers) == 0 && availableMsg($availableMsg) > 0 [$invocationNamespace:$action]")
    Future.successful(DecisionResults(AddContainer, 1))

This happened because the MemoryQueue reported that there were no warm containers, when in fact some containers had already been spawned for the previous requests; they were just in the paused state and temporarily unable to pull requests. The SchedulingDecisionMaker then sent a container-creation message to the invoker, but the invoker did not create any container because warm containers already existed, which introduced extra latency.
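The mismatch can be illustrated with a minimal sketch (hypothetical names, not the actual OpenWhisk code): the scheduler's view counts only containers that are actively pulling, so paused warm containers look like zero containers.

```python
# Hypothetical sketch of the blind spot described above; not the real
# SchedulingDecisionMaker. Paused containers cannot pull from the
# MemoryQueue, so the scheduler's container count can read 0 even though
# warm (paused) containers still exist on the invoker.

def decide(pulling_containers: int, paused_containers: int, available_msg: int) -> str:
    # The scheduler only sees containers that are actively pulling;
    # paused_containers is deliberately ignored to model the blind spot.
    total_containers = pulling_containers
    if total_containers == 0 and available_msg > 0:
        # "this is an exceptional case, create a container immediately"
        return "AddContainer(1)"
    return "NoOp"

# Two warm containers exist but both are paused, so the scheduler still
# asks the invoker to create one; the invoker sees the warm containers
# and creates nothing, adding latency.
print(decide(pulling_containers=0, paused_containers=2, available_msg=5))  # AddContainer(1)
```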

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Scheduler
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • [N/A] I added tests to cover my changes.
  • [N/A] My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@style95 (Member) left a comment:

Thanks, @JesseStutler for the contribution!
Looks good to me with a minor nit.

  @@ -154,7 +154,7 @@ whisk {
     # The "unusedTimeout" in the ContainerProxy,
     # aka 'How long should a container sit idle until we kill it?'
     idle-container = 10 minutes
  -  pause-grace = 50 milliseconds
  +  pause-grace = 10 seconds
Member:

This will affect the old scheduler too.
It looks fine to me as I think each downstream would use its own timeout.
But worth listening to others' opinions.

Contributor:

I'm personally in favor of a larger pause-grace as the default anyway, but it may need to be communicated.

Contributor Author:

This will affect the old scheduler too. It looks fine to me as I think each downstream would use its own timeout. But worth listening to others' opinions.

Yes, I realized later that I had changed application.conf as well; I was supposed to only change ansible/README.md.

@bdoyle0182 (Contributor)

LGTM, but if the container can't pull new activations once it's paused, then from the scheduler's perspective there appear to be no available warm containers. What's the point of keeping the container once it is paused, with the default pause-grace now being 10 seconds and the default idle-container eviction being 10 minutes for the new scheduler? Or am I missing something, and there's another mechanism that unpauses the container from the scheduler?

@JesseStutler (Contributor Author)

Actually, I took @style95's suggestion: in their internal system they set 10 seconds as the default, and in a Kubernetes environment OpenWhisk does not pause containers at all. I can see that in the old scheduler a warm container can be paused to save resources, and for subsequent requests the invoker can resume it. But for the FPCScheduler, it seems there is no need to pause the container at all; otherwise we can't take advantage of the ActivationClientProxy-ActivationService path whenever the request interval is longer than the preset pause-grace. On the other hand, warm containers waste resources if they keep running without handling any requests.

Or do containers in the paused state also need to send get-activation requests? What are your suggestions?

@style95 (Member)

style95 commented Apr 22, 2022

@bdoyle0182

Let me share the details.
In FPC scheduling, each ContainerProxy keeps sending FetchRequest to the schedulers to pull activations.
When there is no activation within pause-grace, the container is paused and in turn can't send FetchRequest.
Warmed containers are still counted in the scheduler, but they cannot send the request, so no activation is pulled.
Then, once a new activation arrives in an action queue, the scheduler tries to add a new container because there is no running container.
At this point it tries to reuse warmed (paused) containers first: if there are warmed containers on the invokers, they are chosen first.
Since the data plane and control plane are separated in the FPCScheduler, frequent pausing can aggravate performance.
The same is true in the old scheduler, but in the FPCScheduler there is one more step for containers (sending FetchRequest), so latency could be hurt a little more.

I also feel it's a kind of tradeoff for the VM environment (in Kubernetes, no container is paused).
No matter which value we choose, if the incoming interval between activations is a little bigger than it, the action will suffer from this. (The paper "Serverless in the Wild" addressed this well; maybe we can consider taking a similar approach.)

Anyway, one thing we can see is that a bigger value will keep frequently invoked actions from suffering.
Also, say unpausing takes around 200~500 ms: if the interval is 1 s, that's 20~50% overhead relative to the interval.
But if the interval is 1 minute or 1 hour, a 200~500 ms overhead might be acceptable.
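The arithmetic above can be checked with a quick sketch (the 200~500 ms unpause cost is the comment's rough estimate, not a measured value):

```python
# Relative cost of unpausing: a fixed unpause latency shrinks as a share
# of the request interval when the interval grows.

def unpause_overhead_pct(unpause_ms: float, interval_ms: float) -> float:
    """Unpause latency as a percentage of the inter-request interval."""
    return 100.0 * unpause_ms / interval_ms

# 1 s interval: 20-50% overhead, as stated above.
print(unpause_overhead_pct(200, 1_000))   # 20.0
print(unpause_overhead_pct(500, 1_000))   # 50.0
# 1 min interval: well under 1%, likely acceptable.
print(unpause_overhead_pct(500, 60_000))  # ~0.83
```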

@bdoyle0182 (Contributor)

bdoyle0182 commented Apr 22, 2022

I agree pause-grace should be higher if unpausing is a high percentage of overhead. I asked about this recently in Slack, and it sounded like the only reason it was low was the 100 ms billing cycle, so that public users would not get free compute from async processes after their function completes. As a default, though, I don't see that as necessary.

And I totally understand now that it works the same as in the old scheduler: the scheduler makes a request to the invoker to create a container, the invoker proxy receives it and unpauses the existing container, with just the one additional step of the unpaused container then having to retrieve the activation from the scheduler.

I guess my follow-up is: is there any reason for pause-grace to be less than the container eviction time, or in other words, to pause at all? Is pausing a strain on resources? To me it seems like constantly pausing and unpausing a container is a bigger strain than not doing it at all, if you don't care about billing and what your users are doing once the function completes, as long as the standard resource limits for the container (max memory, max file limit, etc.) are still there to protect the VM.

I don't think pause-grace needs to be turned off by default, but maybe we could advertise that, if you don't care about user billing and there are no performance concerns for the VM and other functions, the most optimal thing would be to set that config greater than the container idle eviction timeout, unless I'm missing something?

@bdoyle0182 (Contributor)

An important thing to note about the pause-grace value is that the container eviction time is pauseGrace + idleTimeout, at least in the old scheduler, so increasing pause-grace dramatically will also increase the time that containers sit around, unless that behavior is decoupled.
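That coupling can be made concrete with a small sketch (assuming, as the comment states for the old scheduler, that eviction time is simply the sum of the two settings):

```python
# Per the comment above: in the old scheduler a container first sits idle
# for pause-grace, then stays paused for the idle timeout before eviction,
# so its effective lifetime after the last activation is the sum.

def eviction_time_s(pause_grace_s: float, idle_timeout_s: float) -> float:
    return pause_grace_s + idle_timeout_s

# Old default: 50 ms + 10 min -> barely above 600 s.
print(eviction_time_s(0.05, 600))  # 600.05
# New default: 10 s + 10 min -> 610 s, a negligible increase in absolute terms.
print(eviction_time_s(10, 600))    # 610
```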

@style95 (Member)

style95 commented Apr 23, 2022

An important thing to note about the pause-grace value is that the container eviction time is pauseGrace + idleTimeout, at least in the old scheduler, so increasing pause-grace dramatically will also increase the time that containers sit around, unless that behavior is decoupled.

Exactly. The value I see in this PR is improving the development experience by increasing the default pause-grace.
I think the default value is mostly used when people give OpenWhisk a shot locally.
They test each feature and try to run some actions by hand.
So even if they invoke actions frequently, containers are easily paused, and in turn the waitTime increases.

Since each downstream can freely change the default while deploying OpenWhisk in production (and I believe they are already doing it), I think we can increase the default.
Too small a pause-grace is not well aligned with one of the main ideas behind the FPCScheduler, and it can even harm latency in the old scheduler too.

10s looks good to me for the old scheduler too, but I just wanted to know if there is any history that I am not aware of.

@JesseStutler (Contributor Author)

I agree that there is no a priori correct default value for pause-grace. Some developers will be concerned about waitTime, so they need a bigger pause-grace, or even no pausing at all, so that the latency overhead introduced by pausing and unpausing is reduced. Other developers will be concerned that unpaused containers compete for resources, influence other containers' behavior, or waste VM resources, so they may want pausing. But for now, I think pause-grace is at odds with the FPCScheduler's design...

BTW, as @bdoyle0182 says, pause-grace is low mostly for billing reasons. Can't OpenWhisk charge only during activation-handling cycles? No requests, no charge?

@JesseStutler (Contributor Author)

JesseStutler commented Apr 23, 2022

Since each downstream can freely change the default while deploying OW in production (and I believe they are already doing it), I think we can increase the default.

I really need hot configuration swapping... Each time I change resources like application.conf, I need to rebuild an image and redeploy it. I don't know if OpenWhisk can achieve something like Kubernetes's ConfigMap.

@codecov-commenter

codecov-commenter commented May 6, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.18%. Comparing base (1fbfd6b) to head (44ffb1f).
Report is 118 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5221      +/-   ##
==========================================
- Coverage   79.97%   75.18%   -4.79%     
==========================================
  Files         238      238              
  Lines       14154    14154              
  Branches      607      607              
==========================================
- Hits        11319    10642     -677     
- Misses       2835     3512     +677     

☔ View full report in Codecov by Sentry.

@style95 (Member)

style95 commented May 9, 2022

@JesseStutler Could you rebase and resolve the conflicts?

I really need hot configuration swapping... Each time I change resources like application.conf, I need to rebuild an image and redeploy it. I don't know if OpenWhisk can achieve something like Kubernetes's ConfigMap.

This is possible, but I see there is no corresponding configuration available for pauseGrace.

@style95 (Member)

style95 commented May 10, 2022

I hope the default pause-grace is increased. Each downstream can still configure its own value for it.

IMHO, keeping warmed containers alive is just a performance improvement, and if no activation is running, users should not be charged.

Maybe one thing we can do is make it configurable without rebuilding the code.

@JesseStutler JesseStutler force-pushed the master branch 3 times, most recently from d95cb7a to a490cd3 Compare May 12, 2022 02:58
@JesseStutler (Contributor Author)

Hi @style95, I have resolved the conflicts.

@JesseStutler JesseStutler requested a review from style95 May 12, 2022 10:53
@style95 (Member)

style95 commented May 19, 2022

It's strange that CI tests are failing because of the swaggerGen task failure.
I retriggered the build.

@JesseStutler (Contributor Author)

Oh, it failed again? What was the reason? I didn't change any code, only the configuration files.

@style95 (Member)

style95 commented Jun 15, 2022

It requires another rebase to include d88a7d2

@style95 (Member)

style95 commented Jul 9, 2022

@JesseStutler Could you rebase this to pass all checks?

@JesseStutler (Contributor Author)

Sorry, I haven't checked this in a long time; I will try to resolve it.

@JesseStutler JesseStutler force-pushed the master branch 2 times, most recently from 992b0d0 to 68fd751 Compare July 13, 2022 13:45
@JesseStutler (Contributor Author)

JesseStutler commented Jul 14, 2022

@style95 Hi, the check has passed; please review.

#aka 'How long should a container sit idle until we kill it?'
idle-container = 10 minutes
pause-grace = 10 seconds
keeping-duration = 1 second
Member:

Could you update this to 10 minutes just like the default value?

Contributor Author:

OK

@JesseStutler (Contributor Author)

Done. It seems this can be merged quickly; I noticed that master has been modified a lot recently.

@style95 (Member) left a comment:

LGTM.
@bdoyle0182 Do you have any comments on this?

@bdoyle0182 (Contributor)

fine with me

@style95 style95 merged commit f915acc into apache:master Jul 26, 2022
@style95 (Member)

style95 commented Jul 26, 2022

@JesseStutler
Thanks for your effort.
