container to container communication through service #2056

elkh510 · 2021-01-13T16:08:49Z

What happened:
If a pod has two containers (such as a client and a server) and the client tries to connect to the server through the service, the client gets an error and cannot connect. If the client connects to the server via localhost, everything works as expected.
The bug is reproducible on AKS v1.19.3.
On AKS v1.18.10 the error is not reproducible.

What you expected to happen:
If a pod has two containers (such as a client and a server) and the client tries to connect to the server through the service, the client connects to the server successfully.

How to reproduce it (as minimally and precisely as possible):
Deploy the test deployment (it can be found at this link).
From the client container, try to reach the server via the service with curl (or telnet).
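A minimal sketch of the comparison described above, to be run from inside the client container. The service name `server-svc`, the namespace `default`, and port `80` are hypothetical placeholders (the test deployment itself is only linked, not shown here), and `cluster.local` is assumed to be the cluster DNS domain.

```shell
#!/bin/sh
# Build the in-cluster URL for a service.
# Assumes the default cluster domain "cluster.local".
svc_url() {
    # $1 = service name, $2 = namespace, $3 = port
    printf 'http://%s.%s.svc.cluster.local:%s/\n' "$1" "$2" "$3"
}

# Report whether an HTTP request to the given URL completes within 5 seconds.
probe() {
    if curl -s -o /dev/null --max-time 5 "$1"; then
        echo "OK   $1"
    else
        echo "FAIL $1"
    fi
}
```

For example, `probe http://localhost:80/` followed by `probe "$(svc_url server-svc default 80)"` would show the localhost path succeeding and the service path failing on an affected node.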

Anything else we need to know?:
maybe related to
kubernetes/kubernetes#94754
kubernetes/kubernetes#95409
https://bugs.launchpad.net/ubuntu/+source/linux-meta-hwe-5.4/+bug/1899690

Environment:

Kubernetes version: v1.19.3
Size of cluster: 10

ghost · 2021-01-13T16:08:53Z

Hi elkh510, AKS bot here 👋
Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
Please abide by the AKS repo Guidelines and Code of Conduct.
If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

ghost · 2021-01-15T18:02:02Z

Triage required from @Azure/aks-pm

ghost · 2021-01-20T19:01:12Z

Action required from @Azure/aks-pm

joaguas · 2021-02-04T17:42:09Z

Hi @elkh510 ,
Is this issue still happening? I'm unable to reproduce it with either Azure CNI or kubenet.

If this is still happening, can you confirm that name resolution is working for the service (assuming you're not using the svc IP) and that the iptables rules are correctly translating service to pod?

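# Placeholder: replace with the service's IP
IP=zzz.zzz.zzz.zzz
# Suffix of the SVC chain that matches this IP
ID=$(iptables-save | grep "$IP" | grep SVC | awk -F '-' '{print $NF}')
# For each endpoint (SEP) chain of that service, print its rules that reference a /32 address
for SEP in $(iptables-save | grep "$ID" | grep SEP | awk -F '-' '{print $NF}'); do iptables-save | grep "/32" | grep "$SEP"; done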

elkh510 · 2021-02-09T10:20:41Z

hi @joaguas

Is this issue still happening?

yes

can you confirm that name resolution is working for the service

yes (screenshot below)

iptables rules are correctly translating service to pod?

yes, as far as I understand

eriksywu · 2021-02-12T18:18:35Z

Hi @elkh510

Based on your pod ip I assume this is a kubenet cluster? Do you know how old the cluster is? There was a typo we fixed and rolled out in early Jan that was preventing proper network hairpinning.
https://github.com/Azure/AgentBaker/pull/503/files

Check /etc/cni/net.d/10-containerd.conflist. If the typo is there (promisMode, should be promiscMode) then you can do a node image upgrade to pick up the latest image with the fix.
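A minimal sketch of that check, to be run on the node. The function name `check_cni_typo` and its exit codes are hypothetical; only the file path and the two spellings come from the comment above.

```shell
#!/bin/sh
# Report whether a CNI config file contains the misspelled key "promisMode".
# Exit codes (illustrative): 0 = typo absent, 1 = typo present, 2 = file unreadable.
check_cni_typo() {
    conf="${1:-/etc/cni/net.d/10-containerd.conflist}"
    if [ ! -r "$conf" ]; then
        echo "unreadable: $conf"
        return 2
    fi
    # "promisMode" is not a substring of the correct "promiscMode",
    # so a plain fixed-string match is enough to tell them apart.
    if grep -qF 'promisMode' "$conf"; then
        echo "typo present: $conf"
        return 1
    fi
    echo "ok: $conf"
    return 0
}
```

Calling `check_cni_typo` with no argument inspects the default path; a "typo present" result suggests the node image predates the fix.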

elkh510 · 2021-02-15T08:34:21Z

hi @eriksywu
Yes, updating to 1.19.7 fixed the problem.

xuto2 · 2021-02-17T00:20:11Z

closing since the fix is rolled out

xuto2 · 2021-02-17T02:23:42Z

to clarify - this is not related to kubernetes/kubernetes#94754

ghost added the triage label Jan 13, 2021

ghost added the action-required label Jan 15, 2021

ghost added the Needs Attention 👋 Issues needs attention/assignee/owner label Jan 20, 2021

ghost removed action-required Needs Attention 👋 Issues needs attention/assignee/owner labels Feb 12, 2021

eriksywu added the networking/kubenet label Feb 12, 2021

ghost removed the triage label Feb 12, 2021

xuto2 closed this as completed Feb 17, 2021

ghost locked as resolved and limited conversation to collaborators Mar 19, 2021
