Ability to set pod MTU separate from ENI MTU (or eth0) #2606

Closed
archoversight opened this issue Oct 6, 2023 · 16 comments
Comments

@archoversight

archoversight commented Oct 6, 2023

What would you like to be added:

I'd like to be able to set the MTU for my pods' virtual interfaces lower than the MTU for my eth0. I am running an IPv6-only EKS cluster and attempting to deploy Cilium on top with WireGuard encryption so that pod-to-pod traffic is transparently encrypted.

Unfortunately, the overhead from the WireGuard tunnel and the lack of working path MTU discovery mean that traffic is currently silently dropped when a pod sends 9001-byte packets while the WireGuard tunnel MTU is 8921.
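The 8921 figure is consistent with the usual WireGuard overhead over IPv6. The sketch below works through that arithmetic; the header sizes are assumptions not stated in this thread (40-byte IPv6 or 20-byte IPv4 outer header, 8-byte UDP header, 32 bytes of WireGuard data header and authentication tag), and the function names are hypothetical.

```python
# Assumed per-packet WireGuard overhead (not stated in the thread):
# outer IP header + UDP header + WireGuard data header and auth tag.
IPV6_HEADER = 40
IPV4_HEADER = 20
UDP_HEADER = 8
WIREGUARD_FRAMING = 32  # 16-byte data header + 16-byte authentication tag


def tunnel_mtu(link_mtu: int, ipv6: bool = True) -> int:
    """Largest inner packet that fits in one outer packet of link_mtu bytes."""
    outer_ip = IPV6_HEADER if ipv6 else IPV4_HEADER
    return link_mtu - outer_ip - UDP_HEADER - WIREGUARD_FRAMING


def fits(packet_size: int, link_mtu: int, ipv6: bool = True) -> bool:
    """True if an inner packet of packet_size bytes fits through the tunnel."""
    return packet_size <= tunnel_mtu(link_mtu, ipv6)
```

Under these assumptions, `tunnel_mtu(9001)` gives 8921, and `fits(9001, 9001)` is false, which matches the drops described above: a pod interface left at 9001 can emit packets the tunnel cannot carry.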

I tried setting AWS_VPC_ENI_MTU to 8000 as an example, but it also seems to change the MTU for eth0 at startup, which is not what I want.

Why is this needed:

When chaining CNIs that add transparent encryption or encapsulation on IPv6 hosts where path MTU discovery does not function, it would be nice to have an escape hatch.

References:

cilium/cilium#28413 (comment)
cilium/cilium#28387
https://aws.amazon.com/blogs/containers/transparent-encryption-of-node-to-node-traffic-on-amazon-eks-using-wireguard-and-cilium/

@jdn5126
Contributor

jdn5126 commented Oct 6, 2023

@archoversight setting AWS_VPC_ENI_MTU should set the MTU for the pod's virtual interfaces. eth0 is the name of the pod veth endpoint in the pod networking namespace. Are you not seeing the MTU get set on that interface?

@archoversight
Author

archoversight commented Oct 6, 2023

> @archoversight setting AWS_VPC_ENI_MTU should set the MTU for the pod's virtual interfaces. eth0 is the name of the pod veth endpoint in the pod networking namespace. Are you not seeing the MTU get set on that interface?

I am using EKS in IPv6 mode, eth0 (host) has a prefix delegated to it. I am looking at the host (pod launched with kubectl debug) and it is showing that eth0's MTU is changing, not just the veths that are created.

I am seeing the MTU set on the veth interfaces correctly, but I also see the MTU change on eth0.
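For anyone wanting to reproduce this observation, here is a minimal sketch that lists the MTU of every interface visible in the current network namespace. It assumes a Linux host where sysfs exposes `/sys/class/net/<iface>/mtu`; the function name `interface_mtus` is made up for illustration.

```python
from pathlib import Path


def interface_mtus(sysfs_net: str = "/sys/class/net") -> dict[str, int]:
    """Return {interface name: MTU} for every interface under sysfs_net."""
    mtus = {}
    for iface in sorted(Path(sysfs_net).iterdir()):
        mtu_file = iface / "mtu"
        # Skip entries that are not interfaces (for example plain files).
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus
```

Calling `interface_mtus()` from a shell on the node (for example inside a `kubectl debug` session that shares the host network namespace) should show the host eth0 next to the veth interfaces, so the two values can be compared before and after changing AWS_VPC_ENI_MTU.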

@mmerickel

mmerickel commented Oct 6, 2023

@jdn5126 we're referring to the host eth0, not in a pod. It's true that in the pods the primary interface shows up as eth0.

@jdn5126
Contributor

jdn5126 commented Oct 9, 2023

The host eth0 is the primary ENI, which the VPC CNI does not manage. If you want to change the MTU on the primary ENI, you would need to do so in the AMI or in the node group template.

@mmerickel

> The host eth0 is the primary ENI, which the VPC CNI does not manage. If you want to change the MTU on the primary ENI, you would need to do so in the AMI, or in the node group template

Have you tried this? This issue is saying the opposite: the VPC CNI is controlling the primary ENI. I've observed it myself, separately from @archoversight, on my own cluster. Modify the config setting in the addon, then start up new nodes, and the primary eth0 ENI on the host now has the new MTU value, along with the pods that start on that node.

@jdn5126
Contributor

jdn5126 commented Oct 10, 2023

@mmerickel I am under the impression that the MTU for the primary ENI should not change. I have not tried this recently, so I will try to test this out next week

@jdn5126
Contributor

jdn5126 commented Oct 24, 2023

Sorry for the delay, I got caught up with other issues. Planning to work on this tomorrow

@jdn5126 jdn5126 self-assigned this Oct 24, 2023
@jdn5126
Contributor

jdn5126 commented Oct 25, 2023

@mmerickel I also see the behavior that you described, and I see the MTU for the primary ENI being set here in the code: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/networkutils/network.go#L300. This logic is called during aws-node pod initialization.

I am not sure why this is set for the primary ENI, though I am wondering if it was done to try to prevent IP fragmentation. If the MTU value is smaller on the pod veth than the primary ENI, that should not be a problem, though. The original PR has little documentation: #676.

Perhaps it was done for consistency? Adding @jayanthvn in case he has any opinion here

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@github-actions github-actions bot added the stale Issue or PR is stale label Dec 25, 2023
@mmerickel

/not-stale

@github-actions github-actions bot removed the stale Issue or PR is stale label Dec 27, 2023
@jdn5126
Contributor

jdn5126 commented Jan 30, 2024

We discussed internally, and to support this enhancement, we would need a new environment variable. Currently, AWS_VPC_ENI_MTU is set on all ENIs and pod virtual interfaces. We cannot break existing behavior, so to support the pod virtual interfaces having a lower MTU than the ENIs, we would need a new environment variable like POD_MTU, which overrides AWS_VPC_ENI_MTU for pods only if set.
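The override rule described here can be sketched as follows. This is an illustration of the stated precedence only, not the code that eventually merged: the helper name `resolve_mtus` is hypothetical, `POD_MTU` is the name floated in this comment, and the 9001 default is an assumption.

```python
import os

DEFAULT_MTU = 9001  # assumed default when AWS_VPC_ENI_MTU is not set


def resolve_mtus(env=None):
    """Return (eni_mtu, pod_mtu) from an environment-style mapping.

    POD_MTU, when set and non-empty, overrides AWS_VPC_ENI_MTU for pod
    virtual interfaces only; the ENI MTU is never affected by POD_MTU.
    """
    env = os.environ if env is None else env
    eni_mtu = int(env.get("AWS_VPC_ENI_MTU", DEFAULT_MTU))
    pod_raw = env.get("POD_MTU")
    pod_mtu = int(pod_raw) if pod_raw else eni_mtu
    return eni_mtu, pod_mtu
```

With only `AWS_VPC_ENI_MTU=8000` set, both values come out as 8000, which preserves the existing behavior. With `AWS_VPC_ENI_MTU=9001` and `POD_MTU=8921`, the ENIs stay at 9001 while the pod interfaces drop to 8921.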

@jdn5126 jdn5126 assigned jchen6585 and unassigned jdn5126 Jan 30, 2024
@mmerickel

This makes sense to me.

@jchen6585 jchen6585 mentioned this issue Feb 8, 2024
@jdn5126
Contributor

jdn5126 commented Feb 15, 2024

Closing as #2791 has merged. This PR will ship in VPC CNI v1.16.4, which is targeting late Feb/early March

@jdn5126 jdn5126 closed this as completed Feb 15, 2024
This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.

@mmerickel

Thanks @jdn5126!

@archoversight
Author

> Closing as #2791 has merged. This PR will ship in VPC CNI v1.16.4, which is targeting late Feb/early March

Awesome, thank you!
