
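[BUG] Pods in the same subnet of the default network on the same node cannot communicate; reply packets are hijacked by the br device of the physical NIC #5058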

Open
kldancer opened this issue Mar 6, 2025 · 7 comments
Labels
bug Something isn't working


@kldancer
Contributor

kldancer commented Mar 6, 2025

Kube-OVN Version

v1.12.16

Kubernetes Version

v1.27.6

Operation-system/Kernel Version

4.19.90-52.22.v2207.ky10.x86_64
Kylin V10

Description

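In this environment, DNS resolution fails for pods on the default network; troubleshooting showed that the pods cannot reach CoreDNS. Capturing packets while a pod on the same node accessed CoreDNS showed that CoreDNS received the requests and sent replies successfully, but the replies were not delivered to the source pod's veth; they went to the br device created for the physical network instead. The behavior is odd and intermittent: of the pods that get created, some usually work normally while others have this problem.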

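Running bridge link show on the device used to create the physical network reports hwmode VEB. I don't know whether a NIC in this hardware bridge mode is unsupported as an underlay NIC, or whether the problem is related to the operating system.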

Steps To Reproduce

[4 screenshots, not preserved]

Current Behavior

Pods on the default network cannot communicate with each other normally.

Expected Behavior

Pods on the default network can communicate with each other normally.

@kldancer kldancer added the bug Something isn't working label Mar 6, 2025

dosubot bot commented Mar 6, 2025

I found a similar closed issue that might be relevant to your problem: "In Kube-OVN VLAN mode, multiple business VLANs are isolated from each other, and business pods cannot access the DNS service." That issue was related to VLAN mode and DNS service access problems, and it was referenced by another issue about security group and subnet ACL priority problems [1].

For your specific issue, the cause might be the use of a NIC in VEB (hardware bridge) mode as the underlay NIC, or it could be related to the operating system. You might want to check whether VEB mode is supported in your setup and whether there are any known limitations with running Kube-OVN on Kylin V10, although I couldn't find specific information on this [1].

Additionally, you could try troubleshooting by verifying network configurations and ensuring that the CoreDNS setup is correctly integrated into the network architecture. If the issue is intermittent, it might also be worth checking for any network card offload issues, as these can affect connectivity [2].


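@kldancer kldancer changed the title [BUG] [BUG] Pods in the same subnet of the default network on the same node cannot communicate; reply packets are hijacked by the br device of the physical NIC Mar 6, 2025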
@zhangzujian
Member

Check whether the ovs-vswitchd log contains any errors. If possible, you could also try a different kernel version.

@kldancer
Contributor Author

kldancer commented Mar 6, 2025

Check whether the ovs-vswitchd log contains any errors. If possible, you could also try a different kernel version.

I see no errors in OVS; all entries are at INFO level.

[Screenshot of OVS log output]

@kldancer
Contributor Author

kldancer commented Mar 6, 2025

Check whether the ovs-vswitchd log contains any errors. If possible, you could also try a different kernel version.

ovs.log
I exported the flow tables of this node. If possible, could you please help check whether anything in them looks abnormal? 🙏

@oilbeater
Collaborator

Is there an address conflict between the container network and the physical network in this environment?
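One quick way to answer this question is to compare the subnet CIDRs used by Kube-OVN with the CIDRs configured on the node's physical interfaces. The Python sketch below uses only the standard ipaddress module; the function name find_overlaps and the example CIDRs are hypothetical (10.16.0.0/16 is Kube-OVN's usual default for ovn-default, but the real values must come from the cluster's subnets and the node's interface addresses).

```python
import ipaddress


def find_overlaps(container_cidrs, physical_cidrs):
    """Return (container, physical) CIDR pairs whose address ranges overlap.

    Hypothetical helper: inputs are plain CIDR strings; host bits are
    allowed (e.g. "192.168.10.7/24") and are normalized to the network.
    """
    conflicts = []
    for c in container_cidrs:
        c_net = ipaddress.ip_network(c, strict=False)
        for p in physical_cidrs:
            p_net = ipaddress.ip_network(p, strict=False)
            # Only compare networks of the same IP version (v4 with v4, v6 with v6).
            if c_net.version == p_net.version and c_net.overlaps(p_net):
                conflicts.append((str(c_net), str(p_net)))
    return conflicts


if __name__ == "__main__":
    # Example values only; substitute the real subnet and interface CIDRs.
    container = ["10.16.0.0/16"]
    physical = ["192.168.10.0/24", "10.16.5.0/24"]
    print(find_overlaps(container, physical))
```

An empty list means no overlap between the two sets of ranges; any returned pair points at a range that both the pods and the physical network could claim.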

@kldancer
Contributor Author

Is there an address conflict between the container network and the physical network in this environment?

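I did not find any address conflict between the physical network and the container network. Below is the interface information from the environment at the time. The devices br-p3p1, p3p1, p3p2, and bond1 all have the same MAC address.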

ipinfo.log
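To double-check which interfaces share a MAC address, the JSON output of iproute2's `ip -j link show` can be grouped by its `address` field. This is a minimal Python sketch; shared_macs and load_links are hypothetical helper names, and the script only reports the groups without judging whether the sharing is expected (a bond and its member ports, for example, commonly share one MAC).

```python
import json
import subprocess
from collections import defaultdict


def shared_macs(links):
    """Map each MAC address used by more than one interface to those interface names.

    `links` is the parsed JSON list produced by `ip -j link show`.
    """
    by_mac = defaultdict(list)
    for link in links:
        # Only Ethernet-type links carry a meaningful MAC; this skips loopback etc.
        if link.get("link_type") == "ether" and "address" in link:
            by_mac[link["address"]].append(link["ifname"])
    return {mac: names for mac, names in by_mac.items() if len(names) > 1}


def load_links():
    """Run `ip -j link show` on this host and return the parsed JSON list."""
    result = subprocess.run(
        ["ip", "-j", "link", "show"], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

On the node, `shared_macs(load_links())` returns a dict whose keys are MAC addresses and whose values list the interfaces using each one.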

@zhangzujian
Member

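Check whether the ovs-vswitchd log contains any errors. If possible, you could also try a different kernel version.

I see no errors in OVS; all entries are at INFO level.

[Screenshot of OVS log output]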

That is the log of the ovs pod. You need to check /var/log/openvswitch/ovs-vswitchd.log on the node.
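Following this suggestion, a small filter can pull the WARN, ERR, and EMER entries out of the node's /var/log/openvswitch/ovs-vswitchd.log. The sketch assumes the default OVS log-file layout, in which each line reads `timestamp|sequence|module|LEVEL|message`; the helper names are hypothetical, and a plain grep for `|WARN|` or `|ERR|` would do the same job.

```python
from collections import Counter

# Levels worth a closer look; INFO and DBG entries are ignored.
ALERT_LEVELS = {"WARN", "ERR", "EMER"}


def scan_vswitchd_log(lines):
    """Return (per-level counts, matching lines) for WARN/ERR/EMER entries.

    Assumes the default layout `timestamp|sequence|module|LEVEL|message`;
    the message itself may contain further '|' characters.
    """
    counts = Counter()
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("|", 4)  # at most 5 fields; message kept intact
        if len(fields) == 5 and fields[3] in ALERT_LEVELS:
            counts[fields[3]] += 1
            hits.append(line)
    return counts, hits


def scan_file(path="/var/log/openvswitch/ovs-vswitchd.log"):
    """Scan the log file at `path` (reading it usually requires root)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return scan_vswitchd_log(f)
```

Running `counts, hits = scan_file()` on the node gives a count per level plus the full text of each matching line.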
