
container to container communication through service #2056

Closed
elkh510 opened this issue Jan 13, 2021 · 9 comments

@elkh510

elkh510 commented Jan 13, 2021

What happened:
If a pod has two containers (for example, a client and a server) and the client tries to connect to the server through the service, the client gets an error and cannot connect. If the client connects to the server via localhost, everything works as expected.
The bug reproduces on AKS v1.19.3.
On AKS v1.18.10 the error does not reproduce.

What you expected to happen:
If a pod has two containers (for example, a client and a server) and the client tries to connect to the server through the service, the client connects to the server successfully.

How to reproduce it (as minimally and precisely as possible):
Deploy the test deployment, which can be found at this link.
From the client container, try to curl (or telnet) the server via the service.
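The link to the test deployment is not preserved, so the pod, container, and service names below are hypothetical placeholders. This is a minimal sketch of the reproduction. It assumes `kubectl` access to the cluster and `curl` inside the client image, and it probes the server from the client container both through the service and through localhost:

```shell
# Hypothetical names: the original test deployment is not available here.
POD=${POD:-client-server}                    # pod running both containers
CLIENT=${CLIENT:-client}                     # name of the client container
SVC_URL=${SVC_URL:-http://server-svc:80}     # service in front of the server
LOCAL_URL=${LOCAL_URL:-http://localhost:80}  # same server, reached directly

# Run curl inside the client container and report OK or FAIL for one URL.
probe() {
  if kubectl exec "$POD" -c "$CLIENT" -- curl -sS -m 5 -o /dev/null "$1"; then
    echo "OK   $1"
  else
    echo "FAIL $1"
  fi
}
```

On a healthy cluster, both `probe "$LOCAL_URL"` and `probe "$SVC_URL"` should print `OK`. Going by the report above, the second call would print `FAIL` on the affected v1.19.3 cluster.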

Anything else we need to know?:
Possibly related to:
kubernetes/kubernetes#94754
kubernetes/kubernetes#95409
https://bugs.launchpad.net/ubuntu/+source/linux-meta-hwe-5.4/+bug/1899690

Environment:

  • Kubernetes version: v1.19.3
  • Size of cluster: 10
@ghost ghost added the triage label Jan 13, 2021
@ghost

ghost commented Jan 13, 2021

Hi elkh510, AKS bot here 👋
Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

  1. If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
  2. Please abide by the AKS repo Guidelines and Code of Conduct.
  3. If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
  4. Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
  5. Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
  6. If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

@ghost ghost added the action-required label Jan 15, 2021
@ghost

ghost commented Jan 15, 2021

Triage required from @Azure/aks-pm

@ghost

ghost commented Jan 20, 2021

Action required from @Azure/aks-pm

@ghost ghost added the Needs Attention 👋 Issues needs attention/assignee/owner label Jan 20, 2021
@joaguas

joaguas commented Feb 4, 2021

Hi @elkh510 ,
Is this issue still happening? I'm unable to reproduce it with either Azure CNI or kubenet.

[screenshot]

If this is still happening, can you confirm that name resolution is working for the service (assuming you're not using the service IP directly) and that the iptables rules are correctly translating the service to the pod?

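IP=zzz.zzz.zzz.zzz  # ClusterIP of the service; run as root on the node
# Hash suffix of the KUBE-SVC-* chain that kube-proxy created for this ClusterIP
ID=$(iptables-save | grep "$IP" | grep SVC | awk -F '-' '{print $NF}')
# For each KUBE-SEP-* endpoint chain of that service, print its /32 rule (the backing pod IP)
for SEP in $(iptables-save | grep "$ID" | grep SEP | awk -F '-' '{print $NF}'); do iptables-save | grep "/32" | grep "$SEP"; done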
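The iptables script answers the translation half of the question. For the name-resolution half, one option is to compare two addresses: the one the client container resolves for the service name, and the ClusterIP recorded on the Service object. This is a sketch only. The pod, container, and service names are hypothetical, and it assumes the client image ships `getent`:

```shell
# Hypothetical names; set these to match your deployment.
POD=${POD:-client-server}
CLIENT=${CLIENT:-client}
SVC=${SVC:-server-svc}

# Compare the IP the client container resolves for the service name
# with the ClusterIP stored on the Service object.
check_dns() {
  resolved=$(kubectl exec "$POD" -c "$CLIENT" -- getent hosts "$SVC" | awk '{print $1; exit}')
  cluster_ip=$(kubectl get svc "$SVC" -o jsonpath='{.spec.clusterIP}')
  if [ -n "$resolved" ] && [ "$resolved" = "$cluster_ip" ]; then
    echo "DNS OK: $SVC -> $resolved"
  else
    echo "DNS MISMATCH: resolved='$resolved' clusterIP='$cluster_ip'"
    return 1
  fi
}
```

If `check_dns` prints `DNS OK`, resolution is fine and the iptables rules become the next suspect. An empty or mismatched address points at DNS instead.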

@elkh510
Author

elkh510 commented Feb 9, 2021

Hi @joaguas

Is this issue still happening?

Yes.

can you confirm that name resolution is working for the service

Yes (screenshot below).
[screenshot]

iptables rules are correctly translating service to pod?

Yes, as far as I understand.
[screenshot]

@eriksywu
Contributor

Hi @elkh510

Based on your pod IP, I assume this is a kubenet cluster? Do you know how old the cluster is? In early January we rolled out a fix for a typo that was preventing proper network hairpinning:
https://github.com/Azure/AgentBaker/pull/503/files
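Both questions (which network plugin the cluster uses, and how old its node images are) can be checked from the Azure CLI. This sketch assumes `az` is logged in to the right subscription. `RG` and `CLUSTER` are hypothetical placeholders:

```shell
RG=${RG:-my-resource-group}   # hypothetical resource group name
CLUSTER=${CLUSTER:-my-aks}    # hypothetical cluster name

# Print the configured network plugin, e.g. "kubenet" or "azure" (Azure CNI).
network_plugin() {
  az aks show --resource-group "$RG" --name "$CLUSTER" \
    --query networkProfile.networkPlugin --output tsv
}

# List each node pool with its node image version; this value changes
# when a node image upgrade picks up a newer image.
node_images() {
  az aks nodepool list --resource-group "$RG" --cluster-name "$CLUSTER" \
    --query "[].{name:name, image:nodeImageVersion}" --output table
}
```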

Check /etc/cni/net.d/10-containerd.conflist. If the typo is there (promisMode where it should be promiscMode), you can do a node image upgrade to pick up the latest image with the fix.
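The check can be scripted as a small sketch, run on a node (for example from a node debug shell). It only looks for the misspelled key and does not modify the file. The function name `check_conflist` is hypothetical:

```shell
# Path taken from the comment above; override with the first argument.
CONFLIST=${CONFLIST:-/etc/cni/net.d/10-containerd.conflist}

# Report whether the conflist contains the misspelled key "promisMode"
# (the bridge plugin option is spelled "promiscMode").
check_conflist() {
  f=${1:-$CONFLIST}
  if [ ! -r "$f" ]; then
    echo "cannot read $f"
    return 2
  fi
  if grep -q '"promisMode"' "$f"; then
    echo "typo present: node image upgrade needed"
    return 1
  fi
  echo "no typo found"
}
```

If the typo is reported, the node image upgrade can be started with `az aks nodepool upgrade --node-image-only`, together with the resource group, cluster, and node pool names.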

@ghost ghost removed action-required Needs Attention 👋 Issues needs attention/assignee/owner labels Feb 12, 2021
@ghost ghost removed the triage label Feb 12, 2021
@elkh510
Author

elkh510 commented Feb 15, 2021

Hi @eriksywu
Yes, upgrading to 1.19.7 fixed the problem.

@xuto2 xuto2 closed this as completed Feb 17, 2021
@xuto2
Contributor

xuto2 commented Feb 17, 2021

Closing, since the fix has been rolled out.

@xuto2
Contributor

xuto2 commented Feb 17, 2021

To clarify: this is not related to kubernetes/kubernetes#94754.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2021