Closed
Description
Currently, the Dask operator (2023.8.1) uses Kubernetes replicas to scale workers up and down, as seen in
dask-kubernetes/dask_kubernetes/operator/kubecluster/kubecluster.py, lines 745 to 748 at commit 7c09b57.
This results in cases such as #659. Kubernetes doesn't know the state of the workers or what data they hold, so it kills workers to reach the replica count requested by the operator. This can cause instability, or partial data loss if it interrupts a data-moving operation.
Scaling up doesn't cause much trouble, since it only adds new workers. The problems occur when scaling down.
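To illustrate the difference, here is a minimal sketch of what a state-aware scale-down choice could look like, as opposed to letting the replica count decide which pods go away. It is plain Python and not dask-kubernetes code: `pick_workers_to_retire`, the worker names, and the byte counts are all hypothetical, and a real fix would also need to move data off the chosen workers before closing them (distributed exposes `Client.retire_workers` for that).

```python
from typing import Dict, List


def pick_workers_to_retire(stored_bytes: Dict[str, int], target: int) -> List[str]:
    """Return the names of workers to remove so that `target` workers remain.

    Hypothetical helper: workers holding the least data are chosen first,
    so that as little data as possible has to be moved or recomputed.
    """
    if target < 0:
        raise ValueError("target must be non-negative")
    excess = len(stored_bytes) - target
    if excess <= 0:
        # Scaling up (or no change): nothing needs to be retired.
        return []
    # Sort by stored bytes, then by name to keep the choice deterministic.
    by_load = sorted(stored_bytes, key=lambda name: (stored_bytes[name], name))
    return by_load[:excess]


# Made-up example data: bytes currently held by each worker.
workers = {"worker-a": 500, "worker-b": 0, "worker-c": 120, "worker-d": 0}
to_retire = pick_workers_to_retire(workers, target=2)
print(to_retire)  # ['worker-b', 'worker-d']
```

With a plain replica change, Kubernetes could just as well remove `worker-a`, which holds the most data in this example.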