failed to assign an IP address to container after enabling custom networking #3238

Open
@barsilver

Description

What happened:

Every once in a while since we configured the vpc-cni add-on to use custom networking, we have been hitting the following error, even though there are enough IP addresses in the secondary subnets:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c3b3c54fed84a015bf5a71d5f78ba19ad1ca7f2dcf73eb25fd3ff0dc7ff7c5ed": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container

When running the following command on this node:
kubectl describe node ip-172-32-1-101.ec2.internal | grep 'pods\|PrivateIPv4Address'

We see that the node's capacity shows:

pods:                    58

However, when running:
./max-pods-calculator.sh --instance-type c6i.2xlarge --cni-version "1.19.2-eksbuild.1" --cni-custom-networking-enabled
The result shows that the maximum number of pods this instance type should be able to run with custom networking enabled is 44.
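
The gap between the two numbers is expected under custom networking: pod IPs are no longer served from the primary ENI, so a kubelet registered with --max-pods=58 overshoots what ipamd can actually hand out. A quick sanity check of both limits, assuming c6i.2xlarge supports 4 ENIs with 15 IPv4 addresses each:

ENIS=4; IPS_PER_ENI=15   # assumed per-instance limits for c6i.2xlarge
echo "default max-pods:            $(( ENIS * (IPS_PER_ENI - 1) + 2 ))"        # 58 -- the value kubelet was given
echo "custom networking max-pods:  $(( (ENIS - 1) * (IPS_PER_ENI - 1) + 2 ))"  # 44 -- what the CNI can actually serve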

In /var/log/user_data.log of the affected node, the following configurations are visible:

INFO: --dns-cluster-ip='10.100.0.10'
2025-03-17T10:51:50+0000 [eks-bootstrap] INFO: --use-max-pods='false'
2025-03-17T10:51:50+0000 [eks-bootstrap] INFO: --kubelet-extra-args='--node-labels="component=entitle,karpenter.k8s.aws/ec2nodeclass=default,karpenter.sh/capacity-type=spot,karpenter.sh/nodepool=spot" --register-with-taints="karpenter.sh/unregistered:NoExecute" --max-pods=58'
2025-03-17T10:51:50+0000 [eks-bootstrap] INFO: Using kubelet version 1.31.5
2025-03-17T10:51:50+0000 [eks-bootstrap] INFO: Using containerd as the container runtime
Created symlink from /etc/systemd/system/multi-user.target.wants/sys-fs-bpf.mount to /etc/systemd/system/sys-fs-bpf.mount.
2025-03-17T10:51:51+0000 [eks-bootstrap] INFO: Using IP family: ipv4
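
The bootstrap log above shows where the 58 comes from: Karpenter injects --max-pods=58 through --kubelet-extra-args, which is the non-custom-networking limit. One possible mitigation, assuming your Karpenter version exposes the reservedENIs global setting, is to have Karpenter subtract the ENI reserved by custom networking when it computes max pods (release name and chart location below are assumptions; adjust to your installation):

helm upgrade karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace kube-system --reuse-values \
  --set settings.reservedENIs=1

Newly launched nodes should then register with --max-pods=44 for this instance type, matching the calculator output.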

The ipamd.log shows errors like these:

{"level":"debug","ts":"2025-03-18T07:15:21.077Z","caller":"datastore/data_store.go:607","msg":"AssignPodIPv4Address: IP address pool stats: total 42, assigned 42"}
{"level":"debug","ts":"2025-03-18T07:15:21.077Z","caller":"datastore/data_store.go:607","msg":"AssignPodIPv4Address: ENI eni-04ddad5f92297dbb6 does not have available addresses"}
{"level":"debug","ts":"2025-03-18T07:15:21.077Z","caller":"datastore/data_store.go:687","msg":"Get free IP from prefix failed no free IP available in the prefix - 100.64.220.11/ffffffff"}
{"level":"debug","ts":"2025-03-18T07:15:21.077Z","caller":"datastore/data_store.go:607","msg":"Unable to get IP address from CIDR: no free IP available in the prefix - 100.64.220.11/ffffffff"}
{"level":"debug","ts":"2025-03-18T07:15:21.077Z","caller":"datastore/data_store.go:687","msg":"Get free IP from prefix failed no free IP available in the prefix - 100.64.130.44/ffffffff"}
{"level":"debug","ts":"2025-03-18T07:15:21.077Z","caller":"datastore/data_store.go:607","msg":"Unable to get IP address from CIDR: no free IP available in the prefix - 100.64.130.44/ffffffff"}
{"level":"debug","ts":"2025-03-18T07:15:21.077Z","caller":"datastore/data_store.go:687","msg":"Get free IP from prefix failed no free IP available in the prefix - 100.64.139.212/ffffffff"}
{"level":"debug","ts":"2025-03-18T07:15:21.077Z","caller":"datastore/data_store.go:607","msg":"Unable to get IP address from CIDR: no free IP available in the prefix - 100.64.139.212/ffffffff"}
{"level":"debug","ts":"2025-03-18T07:15:21.077Z","caller":"datastore/data_store.go:687","msg":"Get free IP from prefix failed no free IP available in the prefix - 100.64.216.129/ffffffff"}

These errors started after attempting to resolve an issue where IP addresses were exhausted in the subnets the cluster was using. The subnets were too small, and both nodes and pods were using the same CIDR range.
Secondary subnets were added using Terraform with the following configuration:

module "custom-netwotking" {
  source         = "../../modules/vpc-cni-custom-networking"
  cluster_name   = module.eks_cluster.eks_cluster_id
  secondary_cidr = "100.64.0.0/16"
  secondary_subnets = {
    us-east-1a = "100.64.0.0/17"
    us-east-1b = "100.64.128.0/17"
  }
}
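
Since the original symptom was "no free IPs" despite roomy /17 subnets, it is also worth checking what EC2 itself reports as still available there (the subnets are selected by CIDR here rather than by hard-coded IDs):

# Free IPv4 addresses remaining in the secondary subnets
aws ec2 describe-subnets \
  --filters "Name=cidr-block,Values=100.64.0.0/17,100.64.128.0/17" \
  --query 'Subnets[].{SubnetId:SubnetId,AZ:AvailabilityZone,FreeIPs:AvailableIpAddressCount}' \
  --output table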

The ENIConfig for the newly created subnets looks like this:

apiVersion: v1
items:
- apiVersion: crd.k8s.amazonaws.com/v1alpha1
  kind: ENIConfig
  metadata:
    annotations:
    generation: 1
    name: us-east-1a
  spec:
    securityGroups:
    - sg-07029
    subnet: subnet-075de
- apiVersion: crd.k8s.amazonaws.com/v1alpha1
  kind: ENIConfig
  metadata:
    annotations:
    generation: 1
    name: us-east-1b
  spec:
    securityGroups:
    - sg-07029
    subnet: subnet-095f8
kind: List
metadata:
  resourceVersion: ""

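For completeness, it is worth confirming that the aws-node daemonset is actually wired to these ENIConfigs; if the custom networking flag or the label selector is missing, ipamd quietly keeps allocating from the node's primary subnet. With ENIConfigs named after availability zones, the usual setup is AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true together with ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone:

# Check the custom networking environment variables and that the ENIConfigs exist
kubectl -n kube-system describe daemonset aws-node \
  | grep -E 'AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG|ENI_CONFIG_LABEL_DEF'
kubectl get eniconfigs.crd.k8s.amazonaws.com
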
When I tried setting ENABLE_POD_ENI=true and DISABLE_TCP_EARLY_DEMUX=true and then restarted the nodes, the nodes failed to be removed and reported the following events:

  Normal  ControllerVersionNotice  7s (x14 over 49m)    vpc-resource-controller  The node is managed by VPC resource controller version v1.6.3
  Normal  NodeTrunkFailedInit      6s (x14 over 49m)    vpc-resource-controller  The node failed initializing trunk interface
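
NodeTrunkFailedInit is a separate problem from the IP exhaustion above: with ENABLE_POD_ENI=true the VPC resource controller has to attach an extra trunk ENI, which itself typically consumes one of the instance's ENI slots, so a node whose slots are already fully used by custom networking ENIs can fail this step. Assuming the controller sets the documented label, you can see which nodes actually received a trunk interface:

kubectl get nodes -L vpc.amazonaws.com/has-trunk-attached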

Environment:

  • Node Kubernetes version: v1.31.4-eks-aeac579
  • CNI Version: 1.19.2-eksbuild.1
  • Node AMI: amazon-eks-arm64-node-1.31-v20250123
