
Flask App Error #73

Closed
hamelsmu opened this issue May 28, 2020 · 21 comments

@hamelsmu
Contributor

You can see the logs on this URL

ValueError: argument: pem_path must be a valid filename. /var/secrets/github/kf-label-bot-prod.private-key.pem was not found.

I am not really sure what is happening. I might need some help debugging this, or at least learning how to debug it in Kubernetes; it's abstracted enough that I'm not sure what the best route is.

Thanks, @jlewi


@jlewi
Collaborator

jlewi commented May 28, 2020

Looks like a bug in the secret.

kubectl  get secrets  -o yaml github-app

apiVersion: v1
data:
  fissue-label-bot-github-app.private-key.pem: <redacted>
kind: Secret
metadata:
  creationTimestamp: "2020-01-17T23:31:31Z"
  name: github-app
  namespace: label-bot-prod
  resourceVersion: "117250504"
  selfLink: /api/v1/namespaces/label-bot-prod/secrets/github-app
  uid: 7da96977-3981-11ea-b0b1-42010a8e0085
type: Opaque

Notice the extra f in the name.

@hamelsmu
Contributor Author

Huh, wow, OK. Should I get rid of this and re-apply the secret?

kubectl delete -f ...
and
kubectl apply -f ...

?

@jlewi
Collaborator

jlewi commented May 28, 2020

Just fixed it. I did kubectl get -o yaml ..., edited it, and then re-applied it.
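
In case it's useful later, the rough shape of that fix (namespace and secret name as above) would be something like:

kubectl -n label-bot-prod get secret github-app -o yaml > github-app.yaml
# edit github-app.yaml: rename the data key
#   fissue-label-bot-github-app.private-key.pem -> issue-label-bot-github-app.private-key.pem
kubectl apply -f github-app.yaml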

@jlewi
Collaborator

jlewi commented May 28, 2020

I forced a pod restart in order to pick up the updated secret

kubectl delete pods -l app=label-bot

Not ideal, but effective.
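
A gentler option, assuming a reasonably recent kubectl, is to restart just the affected Deployment (name as it appears later in this thread):

kubectl -n label-bot-prod rollout restart deploy/label-bot-ml-github-app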

@jlewi
Collaborator

jlewi commented May 28, 2020

Looks like the environment variable is also wrong.

It is using "kf-label-bot-prod.private-key.pem" but the secret key is "issue-label-bot-github-app.private-key.pem".

I think the kf-label-bot one is specific to KF.

@jlewi
Collaborator

jlewi commented May 28, 2020

I'm not sure what the correct value should be. I guess the questions to answer would be

  1. Which GitHub App should we be using?
  2. Where do we get the PEM key for it?

#57 is a little unclear

  • I think what I did was create a new GitHub App owned by Kubeflow, intended just for debugging and primarily for the issue-label infra specific to Kubeflow.
  • So I think the intent for prod and the shared (mlbot.net) front end was to use the GitHub App owned by machine-learning-apps.
  • So the environment variable should probably point at issue-label-bot-github-app.private-key.pem.

@jlewi
Collaborator

jlewi commented May 28, 2020

@hamelsmu I think you want to change this line

value: /var/secrets/github/kf-label-bot-dev.private-key.pem

To have the correct filename; i.e. the filename should match the key in the K8s secret.
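
Concretely, assuming the env var is GITHUB_APP_PEM_KEY (as in the kubectl describe output further down) and the mount path stays /var/secrets/github, the entry in the Deployment manifest would look roughly like:

env:
  - name: GITHUB_APP_PEM_KEY
    value: /var/secrets/github/issue-label-bot-github-app.private-key.pem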

@hamelsmu
Contributor Author

Is this something you feel comfortable changing? How do I get the right K8s secret again? Sorry for the noob questions.

@hamelsmu
Contributor Author

Yeah, I am completely lost; I could use some help.

@jlewi
Collaborator

jlewi commented May 28, 2020

@hamelsmu

kubectl  get secrets  -o yaml github-app

Data is a map of filenames to base64-encoded file contents, so Kubernetes will create a file with those contents at whatever location the Deployment's volume mount specifies.

Here

mountPath: /var/secrets/github

We mount the secret on /var/secrets/github.

The secret "github-app" has key "issue-label-bot-github-app.private-key.pem" so the path will be

/var/secrets/github-app/issue-label-bot-github-app.private-key.pem

So we need to update the environment variable in the manifest

value: /var/secrets/github/kf-label-bot-dev.private-key.pem

To use that path.
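
For reference, the volume wiring in the pod spec that makes this happen looks roughly like this (the volume name github-app here is just illustrative); every key in the secret's data becomes a file under mountPath:

volumeMounts:
  - name: github-app
    mountPath: /var/secrets/github
    readOnly: true
volumes:
  - name: github-app
    secret:
      secretName: github-app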

@hamelsmu
Contributor Author

hamelsmu commented May 28, 2020

@jlewi how do you know

The secret "github-app" has key "issue-label-bot-github-app.private-key.pem" so the path will be

How can I find this information? Just trying to learn.

I also updated Issue-Label-Bot/deployment/base/deployment.yaml with the new value as you suggested and did
kubectl apply -f deployment/base/deployment.yaml, but the error is still there. Any thoughts?

@jlewi
Collaborator

jlewi commented May 28, 2020

@hamelsmu

If you look at the Deployment, the secret is specified here.

So we know the secret is github-app. So now we can inspect the secret.

kubectl  get secrets  -o yaml github-app

apiVersion: v1
data:
  fissue-label-bot-github-app.private-key.pem: <redacted>
  issue-label-bot-github-app.private-key.pem:  <redacted>
kind: Secret
metadata:
  name: github-app
  namespace: label-bot-prod
  resourceVersion: "213262813"
  selfLink: /api/v1/namespaces/label-bot-prod/secrets/github-app
  uid: 7da96977-3981-11ea-b0b1-42010a8e0085
type: Opaque

So there are two files, issue-label-bot-github-app.private-key.pem and fissue-label-bot-github-app.private-key.pem. The latter shouldn't be there; it's there because when I manually fixed it earlier I did a merge rather than a replace.
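
If we want to clean up the stray key, a JSON patch should do it (untested):

kubectl -n label-bot-prod patch secret github-app --type=json \
  -p='[{"op": "remove", "path": "/data/fissue-label-bot-github-app.private-key.pem"}]'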

@hamelsmu
Contributor Author

Hmm, I wonder why the error is still showing up; it is almost like I need to refresh or re-deploy something, but I am not sure what. I tried to redeploy but keep seeing the same error.

It looks like the containers are stuck in ContainerCreating status:

kubectl -n label-bot-prod get pods
NAME                                      READY   STATUS              RESTARTS   AGE
issue-embedding-server-747f88788c-d95jd   1/1     Running             0          149d
issue-embedding-server-747f88788c-gdfrm   1/1     Running             0          112d
issue-embedding-server-747f88788c-prxqx   1/1     Running             0          149d
label-bot-worker-6cf79d54b9-b5qlx         1/1     Running             0          7d1h
label-bot-worker-6cf79d54b9-b9zw7         1/1     Running             0          7d1h
label-bot-worker-6cf79d54b9-bwpzb         1/1     Running             0          7d1h
label-bot-worker-6cf79d54b9-drr8s         1/1     Running             0          7d18h
label-bot-worker-6cf79d54b9-dvbgb         0/1     Evicted             0          24d
label-bot-worker-6cf79d54b9-f4tw9         0/1     Evicted             0          24d
label-bot-worker-6cf79d54b9-r478z         1/1     Running             0          7d16h
ml-github-app-6d95bc4448-6bgpm            0/1     ContainerCreating   0          80m
ml-github-app-6d95bc4448-fg55j            0/1     ContainerCreating   0          80m
ml-github-app-6d95bc4448-fvq27            0/1     ContainerCreating   0          80m
ml-github-app-6d95bc4448-j7flv            0/1     ContainerCreating   0          80m
ml-github-app-6d95bc4448-lp57m            0/1     ContainerCreating   0          80m
ml-github-app-6d95bc4448-rw9k9            0/1     ContainerCreating   0          80m
ml-github-app-6d95bc4448-tc5md            0/1     ContainerCreating   0          80m
ml-github-app-d9f449fc7-27z8g             0/1     ContainerCreating   0          30s
ml-github-app-d9f449fc7-fx5r9             0/1     ContainerCreating   0          30s
ml-github-app-d9f449fc7-hmhjm             0/1     ContainerCreating   0          30s
ml-github-app-d9f449fc7-kl62w             0/1     ContainerCreating   0          30s
ml-github-app-d9f449fc7-qxf6x             0/1     ContainerCreating   0          30s
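
For reference, a quick way to see why a pod is stuck in ContainerCreating is to describe it and read the Events at the bottom (secret-mount failures show up there), e.g.:

kubectl -n label-bot-prod describe pod ml-github-app-d9f449fc7-27z8g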

@jlewi
Collaborator

jlewi commented May 29, 2020

@hamelsmu What command did you run to deploy?

I think we are looking at two different Kubernetes clusters. I think the front end (just the flask app) and the backend (embedding server) should now be running on two different clusters.

Here's how we can verify. First let's figure out where https://label-bot-prod.mlbot.net is pointing:

nslookup label-bot-prod.mlbot.net
34.102.212.79

Let's check if we have an ingress for that IP address:

kubectl --context=label-bot-frontend-prod get ingress
NAME                 HOSTS   ADDRESS         PORTS   AGE
label-bot-frontend   *       34.102.212.79   80      132d

Now we can check which K8s service that ingress is pointing to:

kubectl --context=${CONTEXT} get ingress -o yaml

...
    rules:
    - http:
        paths:
        - backend:
            serviceName: label-bot-ml-github-app
            servicePort: 3000
          path: /

So it's pointing at the service label-bot-ml-github-app. Now we can check what the service's selector is, to identify which pods it matches:

kubectl --context=${CONTEXT} describe service label-bot-ml-github-app
...
Selector:                 app=label-bot,environment=prod,service=label-bot

We can compare the selector to the labels on a given deployment

kubectl --context=label-bot-frontend-prod describe deploy

Name:                   label-bot-ml-github-app
Namespace:              label-bot-prod
CreationTimestamp:      Fri, 17 Jan 2020 15:19:16 -0800
Labels:                 app=label-bot
                             environment=prod
                             service=label-bot


 Environment:
      DATABASE_URL:                    <set to the key 'DATABASE_URL' in secret 'ml-app-inference-secret'>    Optional: false
      WEBHOOK_SECRET:                  <set to the key 'WEBHOOK_SECRET' in secret 'ml-app-inference-secret'>  Optional: false
      APP_ID:                          27079
      GITHUB_APP_PEM_KEY:              /var/secrets/github/kf-label-bot-prod.private-key.pem
...

So the above confirms that I'm looking at the pods that should map to the IP address. It also looks like the environment variable GITHUB_APP_PEM_KEY is pointing to the incorrect value.

So it looks to me like we are looking at different clusters (because your namespace isn't showing those pods).

We can check which cluster we're talking to with cluster-info:

kubectl --context=label-bot-frontend-prod cluster-info
Kubernetes master is running at https://35.237.184.5

If the IP address for your master is different, you are talking to a different Kubernetes cluster.

You can get a mapping of kubectl contexts to clusters using

kubectl config get-contexts

So the full cluster name is

gke_github-probots_us-east1-d_kf-ci-ml

So I'm looking at

  • project = github-probots
  • zone = us-east1-d
  • cluster= kf-ci-ml

@hamelsmu
Contributor Author

I can confirm that I'm using the cluster gke_github-probots_us-east1-d_kf-ci-ml; however, when I run the command

kubectl --context=label-bot-frontend-prod describe deploy

I get Error in configuration: context was not found for specified context: label-bot-frontend-prod

Not sure how to fix this. I went to the gcloud console and used "connect" on the kf-ci-ml cluster.


@hamelsmu
Contributor Author

friendly bump

@jlewi
Collaborator

jlewi commented Jun 2, 2020

@hamelsmu sorry, I missed this. Contexts are a convenient way to name your K8s configurations; label-bot-frontend-prod is what I named my context.

What I do is

  1. Run gcloud container clusters get-credentials ...
  2. Use this script to give my context a friendlier name and set the namespace (see the sketch below).
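
Without the script, the by-hand equivalent looks roughly like this (cluster/zone/project from the earlier comment; the context name is just a label):

gcloud container clusters get-credentials kf-ci-ml --zone=us-east1-d --project=github-probots
kubectl config rename-context gke_github-probots_us-east1-d_kf-ci-ml label-bot-frontend-prod
kubectl config set-context label-bot-frontend-prod --namespace=label-bot-prod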

@hamelsmu
Contributor Author

hamelsmu commented Jun 2, 2020

Ahh, I ssh'd (exec'd) into one of the pods and saw that the path is /var/secrets/github/issue-label-bot-github-app.private-key.pem, NOT /var/secrets/github-app/issue-label-bot-github-app.private-key.pem, so I changed this value accordingly.
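
For anyone following along, that check is roughly (substitute a running pod name):

kubectl -n label-bot-prod exec -it <pod-name> -- ls /var/secrets/github/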

I could not figure out how to change the deployment.yaml file in this repo, so I did the hack of kubectl -n label-bot-prod edit deploy label-bot-ml-github-app and changed it in place.

I tried to edit deployment/overlays/prod/deployment.yaml and apply that, but I get this error:

kubectl -n label-bot-prod apply -f deployment/overlays/prod/deployment.yaml                                                                                                           

error: error validating "deployment/overlays/prod/deployment.yaml": error validating data: ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec; if you choose to ignore these errors, turn validation off with --validate=false

@jlewi can you direct me to the right YAML file for this, or change this in a PR? Thanks. I can confirm the issue is currently fixed.

@hamelsmu
Contributor Author

hamelsmu commented Jun 2, 2020

@jlewi Second question: are the YAML files in this repo up to date? The deployment.yaml files do not match what is currently running in our cluster; for example, I cannot find any YAML files with the deployment label-bot-ml-github-app, which is currently running in our cluster.

@jlewi
Collaborator

jlewi commented Jun 3, 2020

The answer to the first question is that you need to use kustomize to build the manifests:
https://kustomize.io/

so

kustomize build ./deployment/overlays/prod | kubectl apply -f  -

@jlewi Second question, are the YAML files in this repo up to date? the deployment.yaml files do not match what is currently running in our cluster

The answer again is kustomize: compare the output of kustomize build and the names should match. If you look at the kustomization file, you'll see we use kustomize to add a prefix to all resource names. That's where the "label-bot-" prefix comes from.
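
i.e., the relevant bit of the kustomization is roughly (paths and fields approximate):

# deployment/overlays/prod/kustomization.yaml
namePrefix: label-bot-
bases:
  - ../../base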

A systematic way to address this would be to

  1. Check the hydrated manifests (i.e. the output of kustomize build) into git (see the one-liner below).
  2. Use ACM to automatically sync the cluster's resources to the git repo.
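
For step 1 that's essentially a one-liner, committed alongside the source manifests (output path is just an example):

kustomize build ./deployment/overlays/prod > manifests/prod/hydrated.yaml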

@hamelsmu
Contributor Author

hamelsmu commented Jun 4, 2020

Thanks. I won't dig into kustomize, but at least this provides a starting point for me to try to do this. Closing the issue since this is fixed.
