
Commit 1dd668e

Tabrizian authored and k8s-ci-robot committed
Add troubleshooting guide for notebooks (#1008)
* feat: add template of troubleshooting guide for notebooks
* feat: improve the notebooks troubleshooting guide
* fix: improve the note for GCP users
* fix: minor bug in the link
* fix: improve the troubleshooting guide
* fix: fix troubleshooting guides
1 parent 3bc674e commit 1dd668e


3 files changed (+99 -31 lines)


content/docs/notebooks/setup.md

+3 -1
```diff
@@ -265,4 +265,6 @@ exposed to the internet and is an unsecured endpoint by default.
 * Learn the advanced features available from a Kubeflow notebook, such as
 [submitting Kubernetes resources](/docs/notebooks/submit-kubernetes/) or
 [building Docker images](/docs/notebooks/submit-docker-image/).
-
+* Visit the [troubleshooting guide](/docs/notebooks/troubleshoot/) to fix common
+errors when creating Jupyter notebooks in Kubeflow.
+
```
+95
@@ -0,0 +1,95 @@
+++
title = "Troubleshooting Guide"
description = "Fixing common problems in Kubeflow notebooks"
weight = 50
+++

## Persistent Volumes and Persistent Volume Claims

First, make sure that the Persistent Volume Claims (PVCs) used by your Jupyter notebooks are bound. This is usually not a problem on managed Kubernetes. If you are running Kubeflow on-prem in a multi-node environment, check out the guide to [Kubeflow on-prem in a multi-node Kubernetes cluster](/docs/use-cases/kubeflow-on-multinode-cluster/). Otherwise, see the [Pods stuck in Pending state](/docs/other-guides/troubleshooting/#pods-stuck-in-pending-state) guide to troubleshoot this problem.

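A PVC that is not bound stays in the `Pending` phase, and a Pod that mounts it cannot be scheduled. The following is a minimal sketch, assuming `kubectl` is already configured for your cluster, that lists the PVCs in a namespace whose phase is anything other than `Bound`. The function name `unbound_pvcs` is hypothetical and not part of Kubeflow or `kubectl`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kubeflow): print PVCs whose phase is not Bound.
# Usage: unbound_pvcs <namespace>
unbound_pvcs() {
  ns="$1"
  # Emit one "name<TAB>phase" line per PVC using a kubectl JSONPath template.
  kubectl get pvc -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' |
    # Keep only non-empty lines whose phase is not Bound (for example Pending or Lost).
    awk 'BEGIN { FS = "\t" } NF && $2 != "Bound"'
}
```

For example, `unbound_pvcs ${NAMESPACE}` prints nothing when every claim in the namespace is bound.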
## Check the status of notebooks

In the commands below, `${NOTEBOOK}` is the name of your notebook and `${NAMESPACE}` is the namespace it runs in. Run:

```
kubectl get notebooks ${NOTEBOOK} -n ${NAMESPACE} -o yaml
kubectl describe notebooks ${NOTEBOOK} -n ${NAMESPACE}
```

Check the `Events` section of the `kubectl describe` output to make sure that there are no errors.

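Because `kubectl describe` output can be long, it may help to list only the `Warning` events recorded for the notebook. This minimal sketch relies on the standard field selectors supported by `kubectl get events`; the function name `warning_events` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list only Warning events for one object, such as a notebook.
# Usage: warning_events <namespace> <object-name>
warning_events() {
  ns="$1"
  name="$2"
  # Field selectors filter events server-side by the involved object's name
  # and by event type; results are sorted by when each event was last seen.
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=${name},type=Warning" \
    --sort-by=.lastTimestamp
}
```

For example, `warning_events ${NAMESPACE} ${NOTEBOOK}` shows warnings recorded for the notebook itself. Events are retained only for a limited time, so an empty result does not prove that no error ever occurred.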
## Check the status of statefulsets

Make sure that the current number of replicas of the notebook's StatefulSet equals the desired number. If it doesn't, check for errors in the output of `kubectl describe`:

```
kubectl get statefulsets ${NOTEBOOK} -n ${NAMESPACE} -o yaml
kubectl describe statefulsets ${NOTEBOOK} -n ${NAMESPACE}
```

Without `-o yaml`, `kubectl get statefulsets ${NOTEBOOK} -n ${NAMESPACE}` prints a summary that should look like the following:

```
NAME            DESIRED   CURRENT   AGE
your-notebook   1         1         9m4s
```

Newer versions of `kubectl` show a single `READY` column instead, for example `1/1`.

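The desired-versus-current comparison above can also be scripted. This minimal sketch reads `.spec.replicas` and `.status.readyReplicas` with a JSONPath template and succeeds only when they match; note that it checks *ready* replicas, which is slightly stricter than the `CURRENT` column. The function name `statefulset_ready` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare desired and ready replicas of a notebook's StatefulSet.
# Usage: statefulset_ready <namespace> <statefulset-name>
# Exit status is 0 when all desired replicas are ready, 1 otherwise.
statefulset_ready() {
  ns="$1"
  name="$2"
  # Prints "<desired> <ready>"; the literal space separates the two values.
  counts="$(kubectl get statefulset "$name" -n "$ns" \
    -o jsonpath='{.spec.replicas} {.status.readyReplicas}')" || return 1
  desired="${counts%% *}"
  ready="${counts#* }"
  # .status.readyReplicas is omitted while no replica is ready, so default it to 0.
  ready="${ready:-0}"
  echo "desired=${desired} ready=${ready}"
  [ "$desired" = "$ready" ]
}
```

For example, `statefulset_ready ${NAMESPACE} ${NOTEBOOK}` prints both counts and returns a non-zero exit status while the notebook Pod is not ready.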
## Check the status of Pods

If the StatefulSet's current replicas don't match the desired number, check whether the number of Pods matches the desired number reported by `kubectl get statefulsets`. If it doesn't, follow the steps below to investigate further.

```
kubectl get pod ${NOTEBOOK}-0 -n ${NAMESPACE} -o yaml
```

* The name of the Pod should start with `jupyter`
* If you are using username/password auth with Jupyter, the Pod will be named

```
jupyter-${USERNAME}
```

* If you are using IAP on GKE, the Pod will be named

```
jupyter-accounts-2egoogle-2ecom-3USER-40DOMAIN-2eEXT
```

* Where USER@DOMAIN.EXT is the Google account used with IAP.

Once you know the name of the Pod, run:

```
kubectl describe pod ${NOTEBOOK}-0 -n ${NAMESPACE}
```

* Look at the `Events` section to see if there are any errors trying to schedule the Pod.
* One common error is not being able to schedule the Pod because there aren’t enough resources in the cluster.

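To see why a Pod is not being scheduled without reading the full `describe` output, the following minimal sketch prints the Pod phase together with the status, reason, and message of its `PodScheduled` condition. It assumes a configured `kubectl`; the function name `pod_scheduling` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show a Pod's phase and why it is (or is not) scheduled.
# Usage: pod_scheduling <namespace> <pod-name>
# Output: phase<TAB>PodScheduled status<TAB>reason<TAB>message
pod_scheduling() {
  ns="$1"
  pod="$2"
  # When the scheduler cannot place the Pod, the PodScheduled condition is False
  # with reason "Unschedulable" and a message naming the missing resource.
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{.status.phase}{"\t"}{.status.conditions[?(@.type=="PodScheduled")].status}{"\t"}{.status.conditions[?(@.type=="PodScheduled")].reason}{"\t"}{.status.conditions[?(@.type=="PodScheduled")].message}{"\n"}'
}
```

For example, run `pod_scheduling ${NAMESPACE} ${NOTEBOOK}-0`. For a Pod that cannot be placed because the cluster lacks resources, expect the phase `Pending`, the condition status `False`, and the reason `Unschedulable`.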
If the error persists, check the container logs for errors:

```
kubectl logs ${NOTEBOOK}-0 -n ${NAMESPACE}
```

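If the Pod runs more than one container, or the notebook container keeps restarting, a single `kubectl logs` call may not show the error. This minimal sketch prints the current logs of all containers and then, if any exist, the logs of the previous container instances. The function name `pod_logs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print current logs for every container in a Pod and,
# when available, the logs of the previous (restarted) container instances.
# Usage: pod_logs <namespace> <pod-name>
pod_logs() {
  ns="$1"
  pod="$2"
  echo "=== current logs ==="
  # --all-containers also includes sidecars, for example an injected proxy.
  kubectl logs "$pod" -n "$ns" --all-containers=true
  echo "=== previous logs ==="
  # --previous fails when no container has restarted; report that instead.
  kubectl logs "$pod" -n "$ns" --all-containers=true --previous 2>/dev/null ||
    echo "(no previous container logs)"
}
```

For example, `pod_logs ${NAMESPACE} ${NOTEBOOK}-0`. Reading the previous logs is mainly useful when `kubectl describe pod` reports restarts or a `CrashLoopBackOff` state.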
## Note for GCP Users

You may encounter the following error:

```
Type     Reason        Age                     From                    Message
----     ------        ----                    ----                    -------
Warning  FailedCreate  2m19s (x26 over 7m39s)  statefulset-controller  create Pod test1-0 in StatefulSet test1 failed error: pods "test1-0" is forbidden: error looking up service account kubeflow/default-editor: serviceaccount "default-editor" not found
```

To fix this problem, create the missing `default-editor` service account in the `kubeflow` namespace named in the error, and bind it to the `cluster-admin` role:

```
kubectl create serviceaccount default-editor -n kubeflow
kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --serviceaccount=kubeflow:default-editor
```
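After applying the fix, you can confirm that it took effect. This minimal sketch checks that the service account exists and then uses `kubectl auth can-i` with impersonation (`--as=system:serviceaccount:...`) to confirm that the account may create Pods. Impersonation requires that your own user is allowed to impersonate service accounts, which cluster administrators normally are. The function name `check_service_account` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: confirm that a service account exists and that RBAC
# lets it create Pods in its namespace.
# Usage: check_service_account <namespace> <service-account>
check_service_account() {
  ns="$1"
  sa="$2"
  if ! kubectl get serviceaccount "$sa" -n "$ns" >/dev/null 2>&1; then
    echo "service account ${ns}/${sa} not found"
    return 1
  fi
  # Impersonate the service account; "kubectl auth can-i" exits 0 when allowed.
  if kubectl auth can-i create pods -n "$ns" \
      --as="system:serviceaccount:${ns}:${sa}" >/dev/null 2>&1; then
    echo "${ns}/${sa} can create pods"
  else
    echo "${ns}/${sa} cannot create pods"
    return 1
  fi
}
```

For example, run `check_service_account kubeflow default-editor`. Once it succeeds, the StatefulSet controller should be able to create the Pod on its next retry.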

content/docs/other-guides/troubleshooting.md

+1 -30
```diff
@@ -78,37 +78,8 @@ how RBAC interacts with IAM on GCP.
 
 ## Problems spawning Jupyter pods
 
-If you're having trouble spawning Jupyter notebooks, check that the pod is getting
-scheduled
+This section has been moved to [Jupyter Notebooks Troubleshooting Guide](/docs/notebooks/troubleshoot/).
 
-```
-kubectl -n ${NAMESPACE} get pods
-```
-
-* Look for pods whose name starts with juypter
-* If you are using username/password auth with Jupyter the pod will be named
-
-```
-jupyter-${USERNAME}
-```
-
-* If you are using IAP on GKE the pod will be named
-
-```
-jupyter-accounts-2egoogle-2ecom-3USER-40DOMAIN-2eEXT
-```
-
-* Where USER@DOMAIN.EXT is the Google account used with IAP
-
-Once you know the name of the pod do
-
-```
-kubectl -n ${NAMESPACE} describe pods ${PODNAME}
-```
-
-* Look at the events to see if there are any errors trying to schedule the pod
-* One common error is not being able to schedule the pod because there aren't
-enough resources in the cluster.
 
 ## Pods stuck in Pending state
 
```