Support for nprocs? #349
Comments
I think this would be good. There's a difference between processes and Pods in terms of worker locality for data comms.
I have no specific objection to having multiple processes within a pod. Ideally I would like things to be configurable with sensible defaults. One process per pod is a sensible default, but if you have a workload that would benefit from some additional tuning then sure go for it. The issue with @hhuuggoo do you have someone at Saturn who could contribute here and work on this?
Yes! It will probably be me - though I might be a bit slow on this. @jacobtomlinson Do you think I should move this issue to
I think there are two things here:
I think
@jacobtomlinson I'm not sure #2 needs to be addressed. The scheduler already understands hosts: and Adaptive supports configuring that parameter I think we would only need to modify
Actually I think there is one other thing that I can raise with distributed. I'm not sure how important this is yet, but probably if we want to always be using
Just a note that I just started digging around, and I'm not sure this is an issue (was looking at 2021.07 earlier last week). I believe the recommendations I'm getting back for the scheduler are for whole pods, but I can confirm on this issue later on when I can dig deeper. I do think there is an issue where while pods are starting, dask_kubernetes does not know that they are starting. I had a situation where the scheduler wanted to scale down to 1, and it resulted in all pods being shut down, except for one that was still in the process of starting up. When I confirm that, I will write it up as a separate issue, and possibly close this one.
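To make the scale-down scenario above concrete, here is a rough sketch of one way to choose which pods to retire when some are still starting. It is not how dask_kubernetes behaves today, and the cause described above is still unconfirmed. The function name `pods_to_retire` and the plain lists of pod names are hypothetical and exist only for illustration.

```python
from typing import List


def pods_to_retire(running: List[str], pending: List[str], target_pods: int) -> List[str]:
    """Pick pods to remove so that `target_pods` remain.

    Hypothetical helper: pending pods are retired first because they hold
    no data yet, so running workers survive the scale-down where possible.
    """
    if target_pods < 0:
        raise ValueError("target_pods must be non-negative")
    total = len(running) + len(pending)
    excess = max(total - target_pods, 0)
    # Pending pods come first in the candidate list, then running pods.
    candidates = list(pending) + list(running)
    return candidates[:excess]


# Three running pods, one still starting, scheduler wants to scale down to 1:
# the pending pod goes first and one running pod is kept.
print(pods_to_retire(["a", "b", "c"], ["d"], 1))
```

The point is only that the pod still starting has to be counted somewhere. If it is invisible to the scaling logic, the running pods can end up being the ones removed.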
The classic
Is there any interest in building in support for nprocs? I know in #84 the consensus was that having a 1-1 relationship between processes and pods makes the most sense.
We use nprocs because
I've considered thinking in pods rather than machines, but for the clusters we manage, machines are the fundamental unit people pay for, and it's easy to end up in a situation where machines are under-utilized at the k8s level. Yes, k8s can move pods around, but that can disrupt longer-running workloads.
For the most part using dask-kubernetes with nprocs>1 has worked pretty well. It can get a little goofy because if nprocs=4 and I call scale(4) I end up with 16 workers. I think the most value would be accomplished in making adaptive understand nprocs.
So the question is just if anyone else cares about this? If it's just me, I'll subclass Adaptive and call it a day. Otherwise I can add this functionality into dask-kubernetes.
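To spell out the scale(4) arithmetic, here is a minimal sketch of the conversion between worker processes and pods that an nprocs-aware scale or adaptive step would need. The helper names `workers_for_pods` and `pods_for_workers` are made up for illustration and are not part of dask-kubernetes or distributed.

```python
def workers_for_pods(n_pods: int, nprocs: int) -> int:
    """Worker processes started when n_pods pods each run nprocs processes."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    return n_pods * nprocs


def pods_for_workers(n_workers: int, nprocs: int) -> int:
    """Smallest pod count that provides at least n_workers worker processes."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    if n_workers < 0:
        raise ValueError("n_workers must be non-negative")
    # Integer ceiling division: round up so a partial pod still counts as one.
    return (n_workers + nprocs - 1) // nprocs


# Treating the scale() argument as a pod count multiplies it by nprocs.
print(workers_for_pods(4, 4))
# Treating it as a worker count means dividing by nprocs, rounding up.
print(pods_for_workers(4, 4), pods_for_workers(5, 4))
```

An Adaptive subclass could presumably apply the second conversion to the scheduler's worker target before asking for pods, but where exactly that hook belongs is an open question in this thread.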