nat-gateway.sh init not exec after k8s cluster reboot #3241
Comments
k8s 1.24.8 + kube-ovn 1.12.0 + Ubuntu 18.0.4 + 3-node cluster |
As the log shows: failed to ExecuteCommandInContainer, stdOutput: ext-subnet-route-add 172.56.0.0/24,172.56.0.1
nat gateway not inited
The NAT gateway has not been initialized, so there are no rules in the NAT gateway pod. |
I do not know why the route add failed in your environment: bash /kube-ovn/nat-gateway.sh ext-subnet-route-add 172.56.0.0/24,172.56.0.1
I think you should check the cmd in the pod manually |
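One way to do the manual check suggested above is to rebuild the exact command the controller logs and run it through kubectl exec. The following is a minimal Python sketch, not part of kube-ovn. The pod name vpc-nat-gw-gw1-0 and the kube-system namespace are assumptions (use whatever kubectl get pod shows in your cluster), and the helper name nat_gw_exec_argv is hypothetical.

```python
import shlex


def nat_gw_exec_argv(pod, subcommand, args, namespace="kube-system"):
    """Build the kubectl argv that runs nat-gateway.sh inside the gateway pod.

    The pod name and namespace are assumptions; adjust them to your cluster.
    """
    return [
        "kubectl", "exec", "-n", namespace, pod, "--",
        "bash", "/kube-ovn/nat-gateway.sh", subcommand, *args,
    ]


# Same subcommand and argument as in the controller log of this issue.
argv = nat_gw_exec_argv(
    "vpc-nat-gw-gw1-0",  # assumed pod name for gateway gw1
    "ext-subnet-route-add",
    ["172.56.0.0/24,172.56.0.1"],
)
print(shlex.join(argv))
```

To run it for real, pass argv to subprocess.run(argv, capture_output=True, text=True) and inspect returncode and stdout. If the output contains "nat gateway not inited" with a non-zero exit code, the pod is in the same state the controller log reports.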
The command can be executed in the vpc-nat-gateway pod at the point where kube-ovn-controller is notified of the vpc-nat-gateway pod create event. |
The NAT gateway init should be executed as part of the pod startup itself; this could fix the issue: remove the |
@wenwenxiong Hi, what do you think about this? Do you have free time to try it? |
Maybe kube-ovn-controller, when it starts, should check whether the NAT gateway pod has been recreated and, if so, trigger re-creation of all NAT rules. A NAT rule's creation time should be later than the pod's creationTime; if it is not, the rule should be recreated. |
How can this be resolved? |
After kube-ovn-controller restarts, it should make sure that all NAT rules were created after the NAT gateway pod was created; if not, it should trigger re-creation of all NAT rules belonging to that NAT gateway pod. |
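The check proposed above boils down to a timestamp comparison. The following is a minimal Python sketch of that idea only, not the kube-ovn implementation (the controller itself is written in Go). It assumes each NAT rule records when it was last applied; the function names, the rule names, and the timestamps are all hypothetical.

```python
from datetime import datetime, timezone


def parse_k8s_time(ts):
    """Parse an RFC 3339 UTC timestamp as Kubernetes prints it, e.g. 2023-09-20T17:30:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stale_nats(pod_created, nat_applied):
    """Return the names of NAT rules applied before the gateway pod was created.

    pod_created: creationTimestamp of the NAT gateway pod.
    nat_applied: mapping of rule name to the time the rule was last applied
                 (an assumed bookkeeping field, not a confirmed kube-ovn one).
    """
    pod_time = parse_k8s_time(pod_created)
    return sorted(
        name for name, ts in nat_applied.items()
        if parse_k8s_time(ts) < pod_time
    )


# Hypothetical data: the pod was recreated after a reboot, one rule predates it.
rules = {
    "eip-1": "2023-09-19T08:00:00Z",   # applied to the old pod, now lost
    "snat-1": "2023-09-20T17:31:10Z",  # applied after the new pod came up
}
print(stale_nats("2023-09-20T17:30:00Z", rules))
```

Every rule returned by stale_nats would be re-queued so that it is applied again inside the new pod; rules applied after the pod's creation are left alone.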
It seems hard for me to do this. |
Can you fix it? |
I will try later. |
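Moving the route setup into init is an improvement for now, but it needs to be paired with a follow-up change: kube-ovn-controller has to check whether the iptables NAT rules are older than the NAT gateway pod's creation time, and any such NAT rule must be rebuilt. |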
@wenwenxiong Hi, I have a PR that may cover this issue (NAT rules not recovered), but I do not really know how you test this. Can you help confirm it? The PR is: https://github.com/kubeovn/kube-ovn/pull/3261/files |
I found that this issue does not occur on Ubuntu 22.04. It looks like a difference in iptables versions between the host OS and the vpc-nat-gateway container OS causes it.
On Ubuntu 22.04 with iptables 1.8.7 (the vpc-nat-gateway pod has iptables 1.8.9) it works normally. There are some related Docker and Kubernetes issues. |
It seems to work on Ubuntu 22.04.
|
Thanks for your help |
Expected Behavior
After all nodes of the k8s cluster are rebooted, the vpc_nat_gateway pod runs normally and its NAT rules are applied again.
Actual Behavior
After all nodes of the k8s cluster are rebooted, all iptables rules in the vpc_nat_gateway pod disappear.
Steps to Reproduce the Problem
Additional Info
Kubernetes version:
Output of kubectl version:
kube-ovn version:
operation-system/kernel version:
Output of awk -F '=' '/PRETTY_NAME/ { print $2 }' /etc/os-release:
Output of uname -r:
kube-ovn-controller logs:
I0920 17:37:38.452561 1 vpc_dns.go:520] the vpc-dns configuration is not set
I0920 17:37:38.457852 1 node.go:752] start to check gateway status
I0920 17:37:41.604881 1 vpc_nat_gateway.go:603] handle update subnet route for nat gateway gw1
I0920 17:37:41.605216 1 vpc_nat_gateway.go:719] bash /kube-ovn/nat-gateway.sh ext-subnet-route-add 172.56.0.0/24,172.56.0.1
I0920 17:37:41.683340 1 vpc_nat_gateway.go:727] failed to ExecuteCommandInContainer, stdOutput: ext-subnet-route-add 172.56.0.0/24,172.56.0.1
nat gateway not inited
E0920 17:37:41.683393 1 vpc_nat_gateway.go:729] command terminated with exit code 1
E0920 17:37:41.683414 1 vpc_nat_gateway.go:630] failed to exec nat gateway rule, err: command terminated with exit code 1
E0920 17:37:41.683654 1 vpc_nat_gateway.go:197] process: updateVpcSubnet. err: error syncing 'gw1': failed to exec nat gateway rule, err: command terminated with exit code 1, requeuing
I0920 17:37:42.361659 1 network_attachment.go:66] parsePodNetworkAnnotation: [{"interface":"podefb02104595","name":"net1-3","namespace":"default"}], default
I0920 17:37:42.368780 1 network_attachment.go:66] parsePodNetworkAnnotation: [{"interface":"pod774883e771f","name":"net1","namespace":"default"}], default
I0920 17:37:42.379176 1 network_attachment.go:66] parsePodNetworkAnnotation: kube-system/ovn-vpc-external-network, kube-system
I0920 17:37:42.379197 1 network_attachment.go:21] parsePodNetworkObjectName: kube-system/ovn-vpc-external-network
I0920 17:37:43.453296 1 vpc_dns.go:520] the vpc-dns configuration is not set
I0920 17:37:43.458556 1 node.go:752] start to check gateway status
I0920 17:37:48.453692 1 vpc_dns.go:520] the vpc-dns configuration is not set
I0920 17:37:48.459161 1 node.go:752] start to check gateway status
I0920 17:37:53.454383 1 vpc_dns.go:520] the vpc-dns configuration is not set
I0920 17:37:53.459781 1 node.go:752] start to check gateway status
I0920 17:37:58.142309 1 provider-network.go:16] start to sync ProviderNetwork status
I0920 17:37:58.454521 1 vpc_dns.go:520] the vpc-dns configuration is not set
I0920 17:37:58.460799 1 node.go:752] start to check gateway status
I0920 17:37:59.095805 1 node.go:942] start to check node port-group status
I0920 17:37:59.096066 1 network_attachment.go:66] parsePodNetworkAnnotation: [{"interface":"pod774883e771f","name":"net1","namespace":"default"}], default
I0920 17:37:59.106064 1 network_attachment.go:66] parsePodNetworkAnnotation: [{"interface":"podefb02104595","name":"net1-3","namespace":"default"}], default
I0920 17:38:01.308490 1 node.go:68] enqueue update node master2
I0920 17:38:01.308673 1 node.go:594] handle update node master2
I0920 17:38:01.684658 1 vpc_nat_gateway.go:603] handle update subnet route for nat gateway gw1
I0920 17:38:01.684983 1 vpc_nat_gateway.go:719] bash /kube-ovn/nat-gateway.sh ext-subnet-route-add 172.56.0.0/24,172.56.0.1
I0920 17:38:01.773061 1 vpc_nat_gateway.go:727] failed to ExecuteCommandInContainer, stdOutput: ext-subnet-route-add 172.56.0.0/24,172.56.0.1
nat gateway not inited
E0920 17:38:01.773185 1 vpc_nat_gateway.go:729] command terminated with exit code 1
E0920 17:38:01.773226 1 vpc_nat_gateway.go:630] failed to exec nat gateway rule, err: command terminated with exit code 1
E0920 17:38:01.773370 1 vpc_nat_gateway.go:197] process: updateVpcSubnet. err: error syncing 'gw1': failed to exec nat gateway rule, err: command terminated with exit code 1, requeuing
I0920 17:38:02.389130 1 network_attachment.go:66] parsePodNetworkAnnotation: [{"interface":"podefb02104595","name":"net1-3","namespace":"default"}], default
I0920 17:38:02.397445 1 network_attachment.go:66] parsePodNetworkAnnotation: [{"interface":"pod774883e771f","name":"net1","namespace":"default"}], default
I0920 17:38:02.407975 1 network_attachment.go:66] parsePodNetworkAnnotation: kube-system/ovn-vpc-external-network, kube-system
I0920 17:38:02.407994 1 network_attachment.go:21] parsePodNetworkObjectName: kube-system/ovn-vpc-external-network
I0920 17:38:03.455243 1 vpc_dns.go:520] the vpc-dns configuration is not set
I0920 17:38:03.461586 1 node.go:752] start to check gateway status
I0920 17:38:05.880274 1 node.go:68] enqueue update node master1
I0920 17:38:05.880326 1 node.go:594] handle update node master1
I0920 17:38:06.776480 1 node.go:68] enqueue update node master3
I0920 17:38:06.776616 1 node.go:594] handle update node master3
I0920 17:38:08.456118 1 vpc_dns.go:520] the vpc-dns configuration is not set
I0920 17:38:08.462390 1 node.go:752] start to check gateway status
I0920 17:38:13.456578 1 vpc_dns.go:520] the vpc-dns configuration is not set
I0920 17:38:13.462839 1 node.go:752] start to check gateway status
I0920 17:38:18.457453 1 vpc_dns.go:520] the vpc-dns configuration is not set
I0920 17:38:18.463780 1 node.go:752] start to check gateway status
I0920 17:38:21.774577 1 vpc_nat_gateway.go:603] handle update subnet route for nat gateway gw1
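The retry cadence in the log above can be read off mechanically. The following is a small Python sketch that parses the klog-style line prefix and measures the time between successive sync attempts for gw1. The sample lines are copied from the log; the helper name retry_gaps is hypothetical, and the sketch assumes the log does not cross midnight.

```python
import re
from datetime import datetime

# klog prefix: severity + MMDD, time, thread id, file:line] message
KLOG = re.compile(r"^[IWEF]\d{4} (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+\s+\S+:\d+\] (.*)$")


def retry_gaps(lines, marker):
    """Seconds between consecutive log lines whose message contains marker."""
    times = []
    for line in lines:
        m = KLOG.match(line)
        if m and marker in m.group(2):
            times.append(datetime.strptime(m.group(1), "%H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


sample = [
    "I0920 17:37:41.604881 1 vpc_nat_gateway.go:603] handle update subnet route for nat gateway gw1",
    "E0920 17:37:41.683393 1 vpc_nat_gateway.go:729] command terminated with exit code 1",
    "I0920 17:38:01.684658 1 vpc_nat_gateway.go:603] handle update subnet route for nat gateway gw1",
    "I0920 17:38:21.774577 1 vpc_nat_gateway.go:603] handle update subnet route for nat gateway gw1",
]
gaps = retry_gaps(sample, "handle update subnet route")
print([round(g, 1) for g in gaps])
```

For these lines the attempts are roughly 20 seconds apart, which shows the controller keeps requeuing gw1 and failing with the same "nat gateway not inited" output rather than giving up.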