[BUG] Containers on different hosts cannot reach each other; no reply packets captured on the geneve device #4881

Closed
zghyy opened this issue Dec 30, 2024 · 4 comments
Labels
bug Something isn't working

Comments


zghyy commented Dec 30, 2024

Kube-OVN Version

v1.12.11

Kubernetes Version

[root@infra-arm-master0001 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.14", GitCommit:"0f77da5bd4809927e15d1658fb4aa8f13ad890a5", GitTreeState:"clean", BuildDate:"2022-06-15T14:17:29Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/arm64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.14", GitCommit:"0f77da5bd4809927e15d1658fb4aa8f13ad890a5", GitTreeState:"clean", BuildDate:"2022-06-15T14:11:36Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/arm64"}

Operation-system/Kernel Version

[root@infra-arm-master0001 ~]# awk -F '=' '/PRETTY_NAME/ { print $2 }' /etc/os-release
"CentOS Linux 7 (AltArch)"
[root@infra-arm-master0001 ~]# uname -r
4.18.0-193.28.1.el7.aarch64

Description

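Pods on different hosts cannot reach each other. Troubleshooting so far suggests a problem with the geneve tunnel between the hosts, because the join-cluster network on the nodes' ovn0 interfaces is also unreachable across nodes, although the gateway can be reached.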

Steps To Reproduce

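The environment already has provider-networks, with a Subnet created for kubevirt to allocate underlay IPs. I now want to use the ovn-default pool to create overlay containers (not VMs).

  1. Because the kubevirt allocation specifies a multus resource through a net-attach-def, I created a new net-attach-def: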
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: ovn-overlay
  namespace: default
spec:
  config: '{ "cniVersion": "0.3.0", "type": "kube-ovn", "server_socket": "/run/openvswitch/kube-ovn-daemon.sock",
    "provider": "ovn-overlay.default.ovn" }'
  2. I created two Deployments and configured them to use this net-attach-def for their network:
(preceding part omitted...)
    template:
      metadata:
        annotations:
          ovn-overlay.default.ovn.kubernetes.io/logical_switch: ovn-default
          v1.multus-cni.io/default-network: default/ovn-overlay
        creationTimestamp: null
        labels:
          app: nginx-deployment
      spec:
        containers:
        - image: library/nginx:1.21
          imagePullPolicy: IfNotPresent
          name: nginx
(remaining part omitted...)
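
The relationship between the net-attach-def and the pod annotations above can be sketched in a few lines of Python. This is only an illustration: the `<name>.<namespace>.ovn` pattern is inferred from the values shown in this issue, and the helper names (`provider_name`, `pod_annotations`, `check_nad_config`) are hypothetical, not part of Kube-OVN.

```python
import json


def provider_name(nad_name: str, namespace: str) -> str:
    # Assumption: provider follows "<name>.<namespace>.ovn", as in the issue.
    return f"{nad_name}.{namespace}.ovn"


def pod_annotations(nad_name: str, namespace: str, subnet: str) -> dict:
    # Build the two pod-template annotations shown in the Deployment above.
    provider = provider_name(nad_name, namespace)
    return {
        f"{provider}.kubernetes.io/logical_switch": subnet,
        "v1.multus-cni.io/default-network": f"{namespace}/{nad_name}",
    }


def check_nad_config(config_json: str, nad_name: str, namespace: str) -> bool:
    # True when the CNI config names kube-ovn and the expected provider.
    cfg = json.loads(config_json)
    return (
        cfg.get("type") == "kube-ovn"
        and cfg.get("provider") == provider_name(nad_name, namespace)
    )
```

A mismatch between the provider string and the annotation prefix is an easy mistake to rule out with a check like this before looking at the tunnel itself.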

Current Behavior

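[image] When I tried to use ovn-default for containers, I found that the containers could not ping each other. Because the default network uses the geneve tunnel scheme, I also tried reaching the join-cluster network between the two hosts, and that fails as well. (I captured on the corresponding geneve interface and saw no reply packets.)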

Expected Behavior

Nodes can reach each other over the join-cluster network, and containers allocated from ovn-default on the nodes can reach each other normally.

zghyy added the bug label Dec 30, 2024

dosubot bot commented Dec 30, 2024

Hey @zghyy! I'm here to help you out with any bugs, questions, or contributions you have in mind. Let's work together to tackle this issue.

To resolve the issue where cross-host containers cannot communicate due to potential problems with the Geneve tunnel in a Kubernetes environment using Kube-OVN v1.12.11 on CentOS Linux 7, ensure that the Geneve tunnel traffic is not being blocked by netfilter. The kube_ovn_fastpath.c code snippet shows that Geneve tunnel traffic (UDP traffic on port 6081) is specifically handled to bypass netfilter processing:

if (IPPROTO_UDP == ip_header->protocol) {
    udp_header = (struct udphdr *)skb_transport_header(skb);
    if (ntohs(udp_header->dest) == 6081 || ntohs(udp_header->source) == 6081) {
        state->okfn(state->net, state->sk, skb);
        return NF_STOLEN;
    }
}

This code ensures that packets destined for or originating from port 6081 (the default port for Geneve) are processed directly without traversing the netfilter hooks, which could otherwise drop or alter the packets [1].

If cross-host communication is still failing, verify the following:

  1. Ensure that the kube_ovn_fastpath module is correctly loaded and active on all nodes.
  2. Check that there are no firewall rules or network policies blocking UDP traffic on port 6081.
  3. Confirm that the network interfaces and routes are correctly configured to allow Geneve traffic between the hosts.

If these conditions are met and the issue persists, further investigation into network configurations and logs may be necessary.
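
As a rough aid for the first check in the list above, the following Python sketch parses `/proc/modules` and reports whether a given module name appears. It assumes a Linux host, and the helper names are hypothetical. Whether `kube_ovn_fastpath` is expected to be present on a given cluster is not confirmed anywhere in this thread, and modules built into the kernel do not show up in `/proc/modules` at all.

```python
from pathlib import Path
from typing import Set


def loaded_modules(proc_modules_text: str) -> Set[str]:
    # The first whitespace-separated field of each line is the module name.
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_module_loaded(name: str, proc_modules_text: str) -> bool:
    # The kernel lists module names with underscores, so normalize dashes.
    return name.replace("-", "_") in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    modules_file = Path("/proc/modules")
    if modules_file.exists():
        text = modules_file.read_text()
        for mod in ("kube_ovn_fastpath", "geneve", "openvswitch"):
            state = "loaded" if is_module_loaded(mod, text) else "not listed"
            print(f"{mod}: {state}")
```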


oilbeater (Collaborator) commented

Could UDP port 6081 be blocked? Capture packets on this port on the hosts at both ends and take a look.


zghyy commented Dec 30, 2024

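UDP port 6081 is open with no restrictions. I just captured the UDP packets on the main NIC.
Below is the capture on the sender. It looks fine: the inner source and destination addresses are wrapped in an outer UDP packet.
[image]
Below is the capture on the receiver. It looks a bit odd: the inner addresses do not seem to have been decoded?
[image]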
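
To inspect a captured UDP payload on port 6081 by hand, a minimal Python sketch of a Geneve base-header decoder (field layout per RFC 8926) might look like the following. The function and class names are hypothetical, and the sketch only handles an inner Ethernet frame carrying untagged IPv4; it is meant to show where the inner addresses sit, not to replace a capture tool.

```python
import socket
import struct
from dataclasses import dataclass
from typing import Optional, Tuple

ETH_P_TEB = 0x6558   # Geneve protocol type for an inner Ethernet frame
ETH_P_IPV4 = 0x0800  # EtherType for IPv4


@dataclass
class GeneveHeader:
    version: int
    opt_len_bytes: int
    oam: bool
    critical: bool
    protocol: int
    vni: int


def parse_geneve(payload: bytes) -> GeneveHeader:
    # The Geneve base header is 8 bytes; options follow in 4-byte units.
    if len(payload) < 8:
        raise ValueError("payload shorter than the 8-byte Geneve header")
    b0, b1, proto = struct.unpack("!BBH", payload[:4])
    opt_len = (b0 & 0x3F) * 4
    if len(payload) < 8 + opt_len:
        raise ValueError("payload shorter than the declared option length")
    return GeneveHeader(
        version=b0 >> 6,
        opt_len_bytes=opt_len,
        oam=bool(b1 & 0x80),
        critical=bool(b1 & 0x40),
        protocol=proto,
        vni=int.from_bytes(payload[4:7], "big"),
    )


def inner_ipv4_addrs(payload: bytes) -> Optional[Tuple[str, str]]:
    # Return (src, dst) of the inner IPv4 packet, or None if not decodable.
    hdr = parse_geneve(payload)
    if hdr.protocol != ETH_P_TEB:
        return None
    off = 8 + hdr.opt_len_bytes
    frame = payload[off:]
    if len(frame) < 14 + 20:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_IPV4:
        return None
    ip = frame[14:34]
    return socket.inet_ntoa(ip[12:16]), socket.inet_ntoa(ip[16:20])
```

If the sender's payload decodes cleanly and the receiver's does not, the packet was likely altered or corrupted in transit rather than dropped by a port filter.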


zghyy commented Dec 30, 2024

I fixed the problem by following this section of the documentation. The issue may not be limited to Kylin on ARM; my CentOS on ARM shows a similar problem. @oilbeater
https://kubeovn.github.io/docs/v1.12.x/ops/faq/#arm
