Unable to enable kube-ovn: coredns still make reference to calico #5012

Open
gaetanquentin opened this issue Apr 5, 2025 · 3 comments · May be fixed by canonical/microk8s-core-addons#337


@gaetanquentin

gaetanquentin commented Apr 5, 2025

Summary

Enabling kube-ovn does not work: coredns ends up broken.

Versions and config

ubuntu 24.04.2
MicroK8s v1.32.3 revision 7964

filesystems:
/ : btrfs
/microk8s/xfs/ : xfs
/microk8s/btrfs/ : btrfs

/dev/mapper/ubuntu--vg-ubuntu--lv--btrfs on / type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=5,subvol=/)
/dev/mapper/ubuntu--vg-ubuntu--lv--xfs on /microk8s/xfs type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/mapper/ubuntu--vg-ubuntu--lv--btrfs on /microk8s/xfs/microk8s/io.containerd.snapshotter.v1.btrfs type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=557,subvol=/microk8s/btrfs/snapshotter)

/var/snap/microk8s/current/args/containerd:
--root /microk8s/xfs/microk8s

/var/snap/microk8s/current/args/containerd-template.toml:
[plugins."io.containerd.grpc.v1.cri".containerd]

# snapshotter is the snapshotter used by containerd.
snapshotter = "btrfs"

fstab:
/dev/disk/by-id/dm-uuid-LVM-vE23bD62zX28eJCFvac8QqVV9DuK5leZB288JscoI30So8sY6rDdDBhA1ApMBse6 /microk8s/xfs xfs defaults 0 1
/microk8s/btrfs/snapshotter /microk8s/xfs/microk8s/io.containerd.snapshotter.v1.btrfs none bind 0 0

sudo btrfs sub list /
ID 285 gen 1859 top level 5 path data-btrfs-compressed
ID 557 gen 34825 top level 5 path microk8s/btrfs/snapshotter
ID 558 gen 7279 top level 5 path microk8s/btrfs/registry
ID 559 gen 7279 top level 5 path microk8s/btrfs/data
ID 859 gen 34817 top level 557 path microk8s/btrfs/snapshotter/snapshots/1
ID 865 gen 32829 top level 557 path microk8s/btrfs/snapshotter/snapshots/6
ID 867 gen 32832 top level 557 path microk8s/btrfs/snapshotter/snapshots/5
ID 870 gen 32834 top level 557 path microk8s/btrfs/snapshotter/snapshots/7
ID 872 gen 32836 top level 557 path microk8s/btrfs/snapshotter/snapshots/9
ID 874 gen 32838 top level 557 path microk8s/btrfs/snapshotter/snapshots/8
ID 876 gen 32841 top level 557 path microk8s/btrfs/snapshotter/snapshots/12

What Should Happen Instead?

Enabling the kube-ovn addon should remove calico completely, and kube-ovn should assign IPs to pods.

Reproduction Steps

  1. sudo snap install microk8s --classic
  2. microk8s config to get the kubeconfig
  3. sudo microk8s enable community
  4. sudo microk8s enable kube-ovn --force
  5. microk8s kubectl get pods -n kube-system
    -> unable to delete calico pods
    kube-system 0s Warning FailedKillPod pod/calico-kube-controllers-5947598c79-srbvc error killing pod: failed to "KillPodSandbox" for "a1fbccc4-a2ac-4156-b441-94be51fcb865" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox "a769a15b0f787289f897c0927c9649853ba8f36e1912a9889eb9506b4d3386d7": plugin type="

    + lots of caliXXX iptables rules are still present
    + the vxlan network link is still there
  6. reboot
  7. sudo microk8s enable kube-ovn --force (again)
  8. kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-79b94494c7-sgv4x 0/1 ContainerCreating 2 3h30m
kube-system ovn-central-687b87db67-5kt89 1/1 Running 5 95m
kube-system ovs-ovn-vtk7w 1/1 Running 5 95m

  9. journalctl:
    Apr 05 20:38:39 node3 microk8s.daemon-kubelite[9104]: E0405 20:38:39.750701 9104 kuberuntime_manager.go:1546] "Failed to stop sandbox" podSandboxID={"Type":"containerd","ID":"33a6172891b53d11605ed0791f96f75a876cd54a88d005987866124d7345d124"}
    Apr 05 20:38:39 node3 microk8s.daemon-kubelite[9104]: E0405 20:38:39.750748 9104 kuberuntime_manager.go:1146] "killPodWithSyncResult failed" err="failed to "KillPodSandbox" for "c5fc8889-4854-43b7-83f2-e1e838f04297" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"33a6172891b53d11605ed0791f96f75a876cd54a88d005987866124d7345d124\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: connection is unauthorized: Unauthorized""
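The leftovers mentioned in step 5 (caliXXX iptables rules, the vxlan link) can be checked with a small helper. This is only a minimal sketch: `count_cali_rules` is a hypothetical name, and it simply counts the lines containing `cali` in whatever `iptables-save` output is piped into it.

```shell
# Hypothetical helper: count lines that mention "cali" in an iptables-save
# dump read from stdin. Prints 0 when nothing is left.
count_cali_rules() {
  # grep -c exits non-zero when the count is 0, so mask that exit status.
  grep -c 'cali' || true
}

# Usage on the affected node (assumes iptables-save and iproute2 are installed):
#   sudo iptables-save | count_cali_rules
#   ip -o link show type vxlan    # lists any vxlan links that are still present
```

A non-zero count after disabling calico would match the "lots of caliXXX iptables rules" observation above.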

Introspection Report

inspection-report-20250405_201029.tar.gz

Can you suggest a fix?

no

Are you interested in contributing with a fix?

no

@gaetanquentin
Author

kubectl delete pods -n kube-system coredns-79b94494c7-sgv4x
-> pod "coredns-79b94494c7-sgv4x" deleted
but the command does not return control to the shell

kube-system 9s Warning FailedKillPod pod/coredns-79b94494c7-sgv4x error killing pod: failed to "KillPodSandbox" for "c5fc8889-4854-43b7-83f2-e1e838f04297" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox "33a6172891b53d11605ed0791f96f75a876cd54a88d005987866124d7345d124": plugin type="calico" failed (delete): error getting ClusterInformation: connection is unauthorized: Unauthorized"
kube-system 70s Normal SuccessfulCreate replicaset/coredns-79b94494c7 Created pod: coredns-79b94494c7-c442s

kube-system 0s Warning FailedKillPod pod/coredns-79b94494c7-sgv4x error killing pod: failed to "KillPodSandbox" for "c5fc8889-4854-43b7-83f2-e1e838f04297" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox "33a6172891b53d11605ed0791f96f75a876cd54a88d005987866124d7345d124": plugin type="calico" failed (delete): error getting ClusterInformation: connection is unauthorized: Unauthorized"
kube-system 0s Normal SandboxChanged pod/coredns-79b94494c7-c442s Pod sandbox changed, it will be killed and re-created.
kube-system 0s Normal SandboxChanged pod/coredns-79b94494c7-c442s Pod sandbox changed, it will be killed and re-created.
kube-system 0s Warning FailedKillPod pod/coredns-79b94494c7-sgv4x error killing pod: failed to "KillPodSandbox" for "c5fc8889-4854-43b7-83f2-e1e838f04297" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox "33a6172891b53d11605ed0791f96f75a876cd54a88d005987866124d7345d124": plugin type="calico" failed (delete): error getting ClusterInformation: connection is unauthorized: Unauthorized"
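To confirm which CNI plugin the failing sandbox teardown is still calling, the plugin type can be pulled out of the event text. A minimal sketch, assuming the events look like the ones above; `failing_cni_plugins` is a hypothetical helper name.

```shell
# Hypothetical helper: print each distinct CNI plugin type named in
# 'plugin type="..."' fragments on stdin. Handles both plain and
# backslash-escaped quotes, as seen in kubectl events and journalctl output.
failing_cni_plugins() {
  grep -oE 'plugin type=\\?"[a-z0-9-]+' | sed 's/.*"//' | sort -u
}

# Usage:
#   microk8s kubectl get events -n kube-system | failing_cni_plugins
```

If this still prints `calico` after kube-ovn has been enabled, the old sandboxes are being torn down through the calico plugin rather than the new CNI.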

@gaetanquentin
Author

I tried again, but deleted all the calico files first:

  1. sudo rm -f /var/snap/microk8s/current/args/cni-network/
  2. sudo microk8s stop
  3. sudo microk8s start
  4. sudo microk8s enable kube-ovn --force
  5. kubectl get all -A -o wide
    NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
    default pod/nginx-5869d7778c-2fwlg 0/1 Pending 0 46m
    kube-system pod/coredns-79b94494c7-8zhxd 0/1 Pending 0 46m
    kube-system pod/kube-ovn-cni-lr98f 1/1 Running 0 46m 172.16.99.105 node3
    kube-system pod/kube-ovn-controller-68fd567f9b-fdtdc 1/1 Running 0 46m 172.16.99.105 node3
    kube-system pod/kube-ovn-monitor-8b766f98f-jknq8 1/1 Running 0 46m 172.16.99.105 node3
    kube-system pod/ovn-central-687b87db67-5kt89 1/1 Running 6 3h59m 172.16.99.105 node3
    kube-system pod/ovs-ovn-vtk7w 1/1 Running 6 3h59m 172.16.99.105 node3

NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.152.183.1 443/TCP 5h54m
kube-system service/kube-dns ClusterIP 10.152.183.10 53/UDP,53/TCP,9153/TCP 5h54m k8s-app=kube-dns
kube-system service/kube-ovn-cni ClusterIP 10.152.183.136 10665/TCP 46m app=kube-ovn-cni
kube-system service/kube-ovn-controller ClusterIP 10.152.183.200 10660/TCP 46m app=kube-ovn-controller
kube-system service/kube-ovn-monitor ClusterIP 10.152.183.168 10661/TCP 46m app=kube-ovn-monitor
kube-system service/kube-ovn-pinger ClusterIP 10.152.183.43 8080/TCP 46m app=kube-ovn-pinger
kube-system service/ovn-nb ClusterIP 10.152.183.218 6641/TCP 4h app=ovn-central,ovn-nb-leader=true
kube-system service/ovn-northd ClusterIP 10.152.183.151 6643/TCP 4h app=ovn-central,ovn-northd-leader=true
kube-system service/ovn-sb ClusterIP 10.152.183.231 6642/TCP 4h app=ovn-central,ovn-sb-leader=true

NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-system daemonset.apps/kube-ovn-cni 1 1 1 1 1 kubernetes.io/os=linux 46m cni-server docker.io/kubeovn/kube-ovn:v1.12.21 app=kube-ovn-cni
kube-system daemonset.apps/kube-ovn-pinger 0 0 0 0 0 kubernetes.io/os=linux 46m pinger docker.io/kubeovn/kube-ovn:v1.12.21 app=kube-ovn-pinger
kube-system daemonset.apps/ovs-ovn 1 1 1 1 1 kubernetes.io/os=linux 4h openvswitch docker.io/kubeovn/kube-ovn:v1.12.21 app=ovs

NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
default deployment.apps/nginx 0/1 1 0 63m nginx nginx app=nginx
kube-system deployment.apps/coredns 0/1 1 0 5h54m coredns coredns/coredns:1.10.1 k8s-app=kube-dns
kube-system deployment.apps/kube-ovn-controller 1/1 1 1 46m kube-ovn-controller docker.io/kubeovn/kube-ovn:v1.12.21 app=kube-ovn-controller
kube-system deployment.apps/kube-ovn-monitor 1/1 1 1 46m kube-ovn-monitor docker.io/kubeovn/kube-ovn:v1.12.21 app=kube-ovn-monitor
kube-system deployment.apps/ovn-central 1/1 1 1 4h ovn-central docker.io/kubeovn/kube-ovn:v1.12.21 app=ovn-central

NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
default replicaset.apps/nginx-5869d7778c 1 1 0 63m nginx nginx app=nginx,pod-template-hash=5869d7778c
kube-system replicaset.apps/coredns-79b94494c7 1 1 0 5h54m coredns coredns/coredns:1.10.1 k8s-app=kube-dns,pod-template-hash=79b94494c7
kube-system replicaset.apps/kube-ovn-controller-68fd567f9b 1 1 1 46m kube-ovn-controller docker.io/kubeovn/kube-ovn:v1.12.21 app=kube-ovn-controller,pod-template-hash=68fd567f9b
kube-system replicaset.apps/kube-ovn-monitor-8b766f98f 1 1 1 46m kube-ovn-monitor docker.io/kubeovn/kube-ovn:v1.12.21 app=kube-ovn-monitor,pod-template-hash=8b766f98f
kube-system replicaset.apps/ovn-central-687b87db67 1 1 1 3h59m ovn-central docker.io/kubeovn/kube-ovn:v1.12.21 app=ovn-central,pod-template-hash=687b87db67

More ovn/ovs pods are running now,

but the CNI is still not initialized:
journalctl:
Apr 05 22:58:44 node3 microk8s.daemon-kubelite[499000]: E0405 22:58:44.093162 499000 kubelet.go:3002] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Apr 05 22:58:49 node3 microk8s.daemon-kubelite[499000]: E0405 22:58:49.095130 499000 kubelet.go:3002] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
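"cni plugin not initialized" usually points at the runtime not finding a usable network config. A quick way to see what is there is to list the config files in the CNI directory. This is a sketch under assumptions: `list_cni_configs` is a hypothetical name, the directory is the one deleted in step 1 above, and the `.conf` / `.conflist` / `.json` extensions are the ones the CNI library is generally understood to load.

```shell
# Hypothetical helper: list candidate CNI config files in a directory,
# sorted by name (the first entry is normally the one the runtime uses).
list_cni_configs() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f \
    \( -name '*.conf' -o -name '*.conflist' -o -name '*.json' \) | sort
}

# Usage:
#   list_cni_configs /var/snap/microk8s/current/args/cni-network
```

An empty result would be consistent with the NetworkPluginNotReady messages above.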

On the host, the ovn/ovs links appeared:

10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 9a:13:cd:cc:7c:30 brd ff:ff:ff:ff:ff:ff
11: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000
link/ether 9e:bd:a7:ef:77:23 brd ff:ff:ff:ff:ff:ff
13: mirror0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether ee:f6:90:90:1f:99 brd ff:ff:ff:ff:ff:ff
14: ovn0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether da:96:c2:10:f2:f5 brd ff:ff:ff:ff:ff:ff
inet 100.64.0.2/16 brd 100.64.255.255 scope global ovn0
valid_lft forever preferred_lft forever

@claudiubelu
Contributor

Hello,

I've seen this issue as well while testing kube-ovn, and I investigated a bit more into what happens and why.

First of all, the issue does not occur because coredns still references calico. The install.sh script (which is mirrored from upstream kube-ovn) basically does a rolling upgrade of coredns, meaning a new Pod should spawn with the new CNI.
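For reference, a rolling restart of coredns can also be triggered by hand. This is a sketch of the general technique only, not a copy of what install.sh does; `restart_coredns` and the `KUBECTL` variable are hypothetical names.

```shell
# Assumption: kubectl is reached through the microk8s wrapper; override
# KUBECTL if a plain kubectl with the right kubeconfig is preferred.
KUBECTL="${KUBECTL:-microk8s kubectl}"

# Hypothetical helper: roll the coredns deployment and wait for the new pod.
restart_coredns() {
  $KUBECTL rollout restart deployment/coredns -n kube-system &&
    $KUBECTL rollout status deployment/coredns -n kube-system --timeout=300s
}

# Usage:
#   restart_coredns
#   microk8s kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
```

The `-o wide` output shows whether the replacement pod actually received an IP from the new CNI.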

Secondly, indeed, enabling the addon gets stuck and fails because of the error you mentioned, that it cannot terminate calico-kube-controllers:

error killing pod: failed to "KillPodSandbox" for "d0d7c456-1f77-4d3f-abf2-4148a9886ba9" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"ddab4cab8c404658300482288f44c488b02d55365fb054484150189a866b717c\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: connection is unauthorized: Unauthorized"

That looks like a permissions error, which may be because its serviceaccount / cluster role / cluster role binding got deleted before the Pod itself. Removing the Pod before them will allow it to be properly removed.
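The ordering described above can be sketched as a helper that removes the workload first and the RBAC objects last. This is an assumption-laden sketch, not the fix from the linked PR: the object names are the ones used by the stock Calico manifest and may differ on a given MicroK8s revision, and `remove_calico_controllers_in_order` and `KUBECTL` are hypothetical names.

```shell
# Assumption: kubectl is reached through the microk8s wrapper.
KUBECTL="${KUBECTL:-microk8s kubectl}"

# Hypothetical helper: delete calico-kube-controllers before its RBAC objects.
remove_calico_controllers_in_order() {
  # 1. Delete the workload first. With foreground cascading the call should
  #    block until the ReplicaSet and its pods are gone (or the timeout hits),
  #    so the pod sandbox is torn down while credentials are still valid.
  $KUBECTL delete deployment calico-kube-controllers -n kube-system \
    --cascade=foreground --timeout=120s --ignore-not-found || return 1

  # 2. Only then remove the RBAC objects (names assumed, see above).
  $KUBECTL delete clusterrolebinding calico-kube-controllers --ignore-not-found
  $KUBECTL delete clusterrole calico-kube-controllers --ignore-not-found
  $KUBECTL delete serviceaccount calico-kube-controllers -n kube-system --ignore-not-found
}
```

The early `return 1` keeps the RBAC objects in place if the pod teardown did not finish, so a retry still has working credentials.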

Thirdly, it seems some configuration is missing, which is why it still thinks it has the Calico CNI, and why the nodes are in a Not Ready state due to the CNI. I may have found what that config is; I'm currently testing it a bit more, and will send some PRs addressing these issues.

@claudiubelu claudiubelu linked a pull request Apr 7, 2025 that will close this issue
2 tasks