SSL error when cleaning up pods #170
Comments
cc @jacobtomlinson if you have any thoughts.
I forgot to mention that if I tail the log of the api-server I get this error whenever I run the script:
Hmm. My initial thought would be: are the CA cert bundles installed correctly on the machine you are running the script from?
Hi there, thanks for the quick reply! This is my Dockerfile:
If I look in /etc/ssl/certs I see all of the certificates. As you can see from the Dockerfile, I also tried to add the cluster's CA to the bundle, but I am still getting the same error.
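As a quick sanity check, independent of dask-kubernetes, something along these lines can confirm whether the bundle baked into the image actually validates the API server's certificate. The bundle path and the in-cluster hostname below are assumptions, not values from this issue:

```python
import socket
import ssl

ca_bundle = "/etc/ssl/certs/ca-certificates.crt"  # assumed Debian/Ubuntu-style bundle path
api_host = "kubernetes.default.svc"               # assumed in-cluster API server DNS name

# Attempt a TLS handshake against the API server using only the image's CA bundle;
# a failure here points at the bundle/cert chain rather than at dask-kubernetes.
context = ssl.create_default_context(cafile=ca_bundle)
with socket.create_connection((api_host, 443), timeout=5) as sock:
    with context.wrap_socket(sock, server_hostname=api_host) as tls:
        print("handshake ok:", tls.version(), tls.getpeercert()["subject"])
```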
I've done some more troubleshooting. If I extract the certificates from my kubectl config and use the KubeAuth object, then it works without a hitch (and the pods get cleaned up correctly);
if I use the kubectl config file instead, it throws the SSL error at the end and the pods do not get cleaned up.
Shouldn't it behave the same way?
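For reference, the two setups being compared here could look roughly like the sketch below. The host, file paths, and the exact keyword names accepted by KubeAuth are placeholders/assumptions rather than values taken from this issue; as I understand it, extra keyword arguments to KubeAuth are applied to the kubernetes client Configuration object:

```python
from dask_kubernetes import KubeCluster, KubeConfig, KubeAuth

# Option A: authenticate straight from the kubectl config file.
cluster = KubeCluster.from_yaml(
    "worker-spec.yml",
    auth=[KubeConfig(config_file="~/.kube/config")],
)

# Option B: pass the extracted credentials explicitly.
cluster = KubeCluster.from_yaml(
    "worker-spec.yml",
    auth=[
        KubeAuth(
            host="https://kubernetes.docker.internal:6443",  # placeholder API server URL
            ssl_ca_cert="/path/to/ca.crt",
            cert_file="/path/to/client.crt",
            key_file="/path/to/client.key",
        )
    ],
)
```

In the report above, both routes run the job fine; only the cleanup step fails with the config-file route, which is what points at the cleanup code path rather than at the credentials themselves.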
I think I might have found the issue: the _cleanup_pods function does not set up the authentication the same way the _init function does (dask-kubernetes/dask_kubernetes/core.py, lines 541 to 552 at 344fac7).
It looks like it's missing a call to ClusterAuth.load_first() (dask-kubernetes/dask_kubernetes/core.py, lines 194 to 196 at 344fac7).
In my environment I tried adding a ClusterAuth.load_first() call before the kubernetes.client.CoreV1Api() call, and now it works, but I am not sure it's the correct fix: to be consistent with the rest of the code it would also need to be passed the "auth" variable.
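For illustration, here is a minimal sketch of that workaround, assuming the synchronous ClusterAuth.load_first() API of the dask-kubernetes version discussed here. This is not the actual core.py source; format_labels is reimplemented inline to keep the snippet self-contained:

```python
import kubernetes
from dask_kubernetes import ClusterAuth


def format_labels(labels):
    # Same idea as the dask-kubernetes helper: {"k": "v"} -> "k=v" selector string.
    return ",".join("{}={}".format(k, v) for k, v in (labels or {}).items())


def _cleanup_pods(namespace, labels):
    """Remove all pods with these labels in this namespace."""
    # Workaround: load credentials (kube config, in-cluster service account, ...)
    # before building the client, mirroring what the cluster does at startup.
    ClusterAuth.load_first()
    api = kubernetes.client.CoreV1Api()
    pods = api.list_namespaced_pod(namespace, label_selector=format_labels(labels))
    for pod in pods.items:
        try:
            api.delete_namespaced_pod(pod.metadata.name, namespace)
        except kubernetes.client.rest.ApiException as e:
            if e.status != 404:  # ignore pods that were already removed
                raise
```

As noted above, a fix consistent with the rest of the code would presumably pass the cluster's auth list through, i.e. something like ClusterAuth.load_first(self.auth).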
Thanks for debugging this. Yeah, it looks like some of the auth flow is missing there. It might be more sensible to run the auth setup in there too. Do you feel up to putting in a PR for this?
Hi @jacobtomlinson, I just submitted the pull request. A suggestion, since you are going to rewrite this: I think it would be useful to make pod deletion optional, since for debugging purposes it helps to be able to look at a pod's output.
You can already do this with
@jacobtomlinson thanks, it worked like a charm. I guess I should have RTFM'd 😃
I'm hitting this error again when running with the following versions: dask-kubernetes==0.10.1. Applying a change similar to what @giordyb suggested in #172 makes the problem go away:
```python
def _cleanup_resources(namespace, labels, core_api):
    """ Remove all pods with these labels in this namespace """
    pods = yield core_api.list_namespaced_pod(
        namespace, label_selector=format_labels(labels)
    )
    ...
    services = yield core_api.list_namespaced_service(
        namespace, label_selector=format_labels(labels)
    )
```

I'll happily open a new issue or a PR if requested.
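For context, the idea behind that change is to create one authenticated client up front and hand it to the cleanup helper, rather than letting the helper build its own fresh CoreV1Api with no credentials loaded. A rough sketch of that pattern with illustrative names, assuming the synchronous ClusterAuth.load_first() discussed earlier in this thread (newer releases may expose it as a coroutine); the commented-out call mirrors the snippet above and is not the actual dask-kubernetes call site:

```python
import kubernetes
from dask_kubernetes import ClusterAuth

# Load credentials once (kube config, in-cluster service account, ...) and
# build a single client that is reused for startup, scaling and cleanup.
ClusterAuth.load_first()
core_api = kubernetes.client.CoreV1Api()

# Cleanup would then be invoked with the shared, already-authenticated client:
# yield _cleanup_resources(namespace, labels, core_api)
```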
A PR would be great, thanks!
Hi,
I'm having a small issue running dask-kubernetes on a local Kubernetes 1.14.3 cluster (the one provided by the latest Docker Desktop): the job runs fine and I get back the correct results, but it looks like there is an SSL issue when it tries to clean up the pods.
This is the error that I get when I run my script:
File "/usr/local/lib/python3.7/site-packages/dask_kubernetes/core.py", line 544, in _cleanup_pods
The code is running inside a container built from the same image as the one specified in worker-spec.yml.
It seems related to #113, but in my case the container is running inside the k8s cluster and I can reach all of the workers correctly.
Thanks,
Giordano