Change the value of pause-grace for new scheduler #5221


Merged: 1 commit merged into apache:master on Jul 26, 2022

Conversation

@JesseStutler (Contributor)

The default value of pause-grace, 50 milliseconds, is too short for the FPCScheduler. If the interval between requests is longer than 50 ms, the container is paused and can no longer pull requests from the MemoryQueue, so it cannot take advantage of the ActivationClientProxy-ActivationService path, which is the core of the FPCScheduler design. We therefore need to set pause-grace to a longer value: 10 seconds by default now.

Description

To demonstrate this, I ran a test against the FPCScheduler. I sent 5 concurrent requests every 5 seconds; the second batch of 5 requests was successfully picked up by GetActivation in the MemoryQueue. But for subsequent batches, the SchedulingDecisionMaker kept matching an exceptional case:

  // this is an exceptional case, create a container immediately
  case (Running, _) if totalContainers == 0 && availableMsg > 0 =>
    logging.info(
      this,
      s"add one container if totalContainers($totalContainers) == 0 && availableMsg($availableMsg) > 0 [$invocationNamespace:$action]")
    Future.successful(DecisionResults(AddContainer, 1))

This happened because the MemoryQueue reported that there were no warm containers, when in fact some containers had already been spawned for the previous requests; they were just in the paused state and temporarily unable to pull requests. The SchedulingDecisionMaker then sent a container-creation message to the invoker, but the invoker did not create any container because warm containers already existed, which introduced extra latency.
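The mismatch can be illustrated with a minimal sketch (hypothetical names, not the actual OpenWhisk code): the scheduler's view counts only containers that are actively pulling, so paused warm containers look like zero containers.

```python
# Hypothetical sketch of the blind spot described above; not the real
# SchedulingDecisionMaker. Paused containers cannot pull from the
# MemoryQueue, so the scheduler's container count can read 0 even though
# warm (paused) containers still exist on the invoker.

def decide(pulling_containers: int, paused_containers: int, available_msg: int) -> str:
    # The scheduler only sees containers that are actively pulling;
    # paused_containers is deliberately ignored to model the blind spot.
    total_containers = pulling_containers
    if total_containers == 0 and available_msg > 0:
        # "this is an exceptional case, create a container immediately"
        return "AddContainer(1)"
    return "NoOp"

# Two warm containers exist but both are paused, so the scheduler still
# asks the invoker to create one; the invoker sees the warm containers
# and creates nothing, adding latency.
print(decide(pulling_containers=0, paused_containers=2, available_msg=5))  # AddContainer(1)
```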

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Scheduler
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • [N/A] I added tests to cover my changes.
  • [N/A] My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@style95 (Member) left a comment:

Thanks, @JesseStutler for the contribution!
Looks good to me with a minor nit.

  @@ -154,7 +154,7 @@ whisk {
     # The "unusedTimeout" in the ContainerProxy,
     # aka 'How long should a container sit idle until we kill it?'
     idle-container = 10 minutes
  -  pause-grace = 50 milliseconds
  +  pause-grace = 10 seconds
Member:

This will affect the old scheduler too.
It looks fine to me as I think each downstream would use its own timeout.
But worth listening to others' opinions.

Contributor:

I'm personally in favor of a larger pause-grace as the default anyway, but it may need to be communicated.

Contributor Author:

This will affect the old scheduler too. It looks fine to me as I think each downstream would use its own timeout. But worth listening to others' opinions.

Yes, I realized later that I had changed application.conf as well; I was supposed to only change ansible/README.md.

@bdoyle0182 (Contributor)

LGTM, but if the container can't pull new activations once it's paused, then from the scheduler's perspective there appear to be no available warm containers. What's the point of keeping the container once it is paused, with the default pause-grace now being 10 seconds and the default idle-container eviction being 10 minutes for the new scheduler? Or am I missing something, and there's another mechanism that unpauses the container from the scheduler?

@JesseStutler (Contributor Author)

Actually, I took @style95's suggestion: in their internal system they set 10 seconds as the default, and in a Kubernetes environment OpenWhisk does not pause containers at all. I can see that in the old scheduler a warm container can be paused to save resources, and for subsequent requests the invoker can resume it. But for the FPCScheduler, it seems there is no need to pause the container at all; otherwise we can't take advantage of the ActivationClientProxy-ActivationService path whenever the request interval is longer than the preset pause-grace. On the other hand, warm containers waste resources if they keep running without handling any requests.

Or do containers in the paused state also need to send get-activation requests? What are your suggestions?

@style95 (Member)

style95 commented Apr 22, 2022

@bdoyle0182

Let me share the details.
In FPC scheduling, each ContainerProxy keeps sending FetchRequest to the schedulers to pull activations.
When there is no activation within pause-grace, the container is paused and in turn can't send FetchRequest.
Warmed containers are still counted in the scheduler, but they cannot send the request, so no activation is pulled.
Then, once a new activation arrives in an action queue, the scheduler tries to add a new container because there is no running container.
At this point it tries to reuse warmed (paused) containers first: if there are warmed containers on the invokers, they are chosen first.
Since the data plane and control plane are separated in the FPCScheduler, frequent pausing can aggravate performance.
The same is true in the old scheduler, but in the FPCScheduler there is one more step for containers (sending FetchRequest), so latency could be hurt a little more.

I also feel it's a kind of tradeoff for the VM environment (in Kubernetes, no container is paused).
No matter which value we choose, if the incoming interval between activations is a little bigger than it, the action will suffer from this. (The paper "Serverless in the Wild" addressed this well; maybe we can consider taking a similar approach.)

Anyway, one thing we can see is that a bigger value will keep frequently invoked actions from suffering.
Also, say unpausing takes around 200~500 ms: if the interval is 1 s, that's 20~50% overhead relative to the interval.
But if the interval is 1 minute or 1 hour, a 200~500 ms overhead might be acceptable.
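The arithmetic above can be checked with a quick sketch (the 200~500 ms unpause cost is the comment's rough estimate, not a measured value):

```python
# Relative cost of unpausing: a fixed unpause latency shrinks as a share
# of the request interval when the interval grows.

def unpause_overhead_pct(unpause_ms: float, interval_ms: float) -> float:
    """Unpause latency as a percentage of the inter-request interval."""
    return 100.0 * unpause_ms / interval_ms

# 1 s interval: 20-50% overhead, as stated above.
print(unpause_overhead_pct(200, 1_000))   # 20.0
print(unpause_overhead_pct(500, 1_000))   # 50.0
# 1 min interval: well under 1%, likely acceptable.
print(unpause_overhead_pct(500, 60_000))  # ~0.83
```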

@bdoyle0182 (Contributor)

bdoyle0182 commented Apr 22, 2022

I agree pause-grace should be higher if unpausing is a high percentage of overhead. I asked about this recently in Slack, and it sounded like the only reason it was low was the 100 ms billing cycle, so that public users would not get free compute from async processes after their function completes. As a default, though, I don't see that as necessary.

And I totally understand now that it works the same as in the old scheduler: the scheduler makes a request to the invoker to create a container, the invoker proxy receives it and unpauses the existing container, with just the one additional step of the unpaused container then having to retrieve the activation from the scheduler.

I guess my follow-up is: is there any reason for pause-grace to be less than the container eviction time, or in other words, to pause at all? Is pausing a strain on resources? To me it seems like constantly pausing and unpausing a container is a bigger strain than not doing it at all, if you don't care about billing and what your users are doing once the function completes, as long as the standard resource limits for the container (max memory, max file limit, etc.) are still there to protect the VM.

I don't think pause-grace needs to be turned off by default, but maybe we could advertise that, if you don't care about user billing and there are no performance concerns for the VM and other functions, the most optimal thing would be to set that config greater than the container idle eviction timeout, unless I'm missing something?

@bdoyle0182 (Contributor)

An important thing to note about the pause-grace value is that the container eviction time is pauseGrace + idleTimeout, at least in the old scheduler, so increasing pause-grace dramatically will also increase the time that containers sit around, unless that behavior is decoupled.
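That coupling can be made concrete with a small sketch (assuming, as the comment states for the old scheduler, that eviction time is simply the sum of the two settings):

```python
# Per the comment above: in the old scheduler a container first sits idle
# for pause-grace, then stays paused for the idle timeout before eviction,
# so its effective lifetime after the last activation is the sum.

def eviction_time_s(pause_grace_s: float, idle_timeout_s: float) -> float:
    return pause_grace_s + idle_timeout_s

# Old default: 50 ms + 10 min -> barely above 600 s.
print(eviction_time_s(0.05, 600))  # 600.05
# New default: 10 s + 10 min -> 610 s, a negligible increase in absolute terms.
print(eviction_time_s(10, 600))    # 610
```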

@style95 (Member)

style95 commented Apr 23, 2022

An important thing to note about the pause-grace value is that the container eviction time is pauseGrace + idleTimeout, at least in the old scheduler, so increasing pause-grace dramatically will also increase the time that containers sit around, unless that behavior is decoupled.

Exactly. The value I see in this PR is improving the development experience by increasing the default pause-grace.
I think the default value is mostly used when people give OpenWhisk a shot locally.
They test each feature and try to run some actions by hand.
So even if they invoke actions frequently, containers are easily paused, and in turn the waitTime increases.

Since each downstream can freely change the default while deploying OpenWhisk in production (and I believe they are already doing it), I think we can increase the default.
Too small a pause-grace is not well aligned with one of the main ideas behind the FPCScheduler, and it can even harm latency in the old scheduler too.

10s looks good to me for the old scheduler too, but I just wanted to know if there is any history that I am not aware of.

@JesseStutler (Contributor Author)

I agree that there is no a priori correct default value for pause-grace. Some developers will be concerned about waitTime, so they need a bigger pause-grace, or even no pausing at all, so that the latency overhead introduced by pausing and unpausing is reduced. Other developers will be concerned that unpaused containers compete for resources, influence other containers' behavior, or waste VM resources, so they may want pausing. But for now, I think pause-grace is at odds with the FPCScheduler's design...

BTW, as @bdoyle0182 says, pause-grace is low mostly for billing reasons. Can't OpenWhisk charge only during activation-handling cycles? No requests, no charge?

@JesseStutler (Contributor Author)

JesseStutler commented Apr 23, 2022

Since each downstream can freely change the default while deploying OW in production (and I believe they are already doing it), I think we can increase the default.

I really need hot configuration swapping... Each time I change resources like application.conf, I need to rebuild an image and redeploy it. I don't know if OpenWhisk can achieve something like Kubernetes's ConfigMap.

@codecov-commenter

codecov-commenter commented May 6, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.18%. Comparing base (1fbfd6b) to head (44ffb1f).
Report is 118 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5221      +/-   ##
==========================================
- Coverage   79.97%   75.18%   -4.79%     
==========================================
  Files         238      238              
  Lines       14154    14154              
  Branches      607      607              
==========================================
- Hits        11319    10642     -677     
- Misses       2835     3512     +677     

☔ View full report in Codecov by Sentry.

@style95 (Member)

style95 commented May 9, 2022

@JesseStutler Could you rebase and resolve the conflicts?

I really need hot configuration swapping... Each time I change resources like application.conf, I need to rebuild an image and redeploy it. I don't know if OpenWhisk can achieve something like Kubernetes's ConfigMap.

This is possible, but I see there is no corresponding configuration available for pauseGrace.

@style95 (Member)

style95 commented May 10, 2022

I hope the default pause-grace is increased. Each downstream can still configure its own value for it.

IMHO, keeping warmed containers alive is just a performance improvement, and if no activation is running, users should not be charged.

Maybe one thing we can do is make it configurable without rebuilding the code.

@JesseStutler JesseStutler force-pushed the master branch 3 times, most recently from d95cb7a to a490cd3 Compare May 12, 2022 02:58
@JesseStutler (Contributor Author)

Hi @style95, I have resolved the conflicts.

@JesseStutler JesseStutler requested a review from style95 May 12, 2022 10:53
@style95 (Member)

style95 commented May 19, 2022

It's strange that CI tests are failing because of the swaggerGen task failure.
I retriggered the build.

@JesseStutler (Contributor Author)

Oh, it failed again? What was the reason? I didn't change any code, only the configuration files.

@style95 (Member)

style95 commented Jun 15, 2022

It requires another rebase to include d88a7d2

@style95 (Member)

style95 commented Jul 9, 2022

@JesseStutler Could you rebase this to pass all checks?

@JesseStutler (Contributor Author)

Sorry, I haven't checked this in a long time; I will try to resolve it.

@JesseStutler JesseStutler force-pushed the master branch 2 times, most recently from 992b0d0 to 68fd751 Compare July 13, 2022 13:45
@JesseStutler (Contributor Author)

JesseStutler commented Jul 14, 2022

@style95 Hi, the check has passed; please review.

#aka 'How long should a container sit idle until we kill it?'
idle-container = 10 minutes
pause-grace = 10 seconds
keeping-duration = 1 second
Member:

Could you update this to 10 minutes just like the default value?

Contributor Author:

OK

@JesseStutler (Contributor Author)

Done. It seems this can be merged quickly; I noticed that master has been modified a lot recently.

@style95 (Member) left a comment:

LGTM.
@bdoyle0182 Do you have any comments on this?

@bdoyle0182 (Contributor)

fine with me

@style95 style95 merged commit f915acc into apache:master Jul 26, 2022
@style95 (Member)

style95 commented Jul 26, 2022

@JesseStutler
Thanks for your effort.
