Commit 88c4669

sushrkyash97
authored and committed
Update docs to add amazon-vpc-cni configmap options for SGPP and related troubleshooting (#359)
1 parent e4ac94b commit 88c4669

4 files changed

+51
-4
lines changed

README.md

Lines changed: 4 additions & 0 deletions
@@ -40,6 +40,10 @@ The controller supports the following modes for IPv4 address management on Windo
4040

4141
Please follow this [guide](https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html) for enabling Windows Support on your EKS cluster.
4242

43+
## Configuring the controller via amazon-vpc-cni configmap
44+
45+
The controller supports configuration options for security groups for pods and for Windows nodes, which can be set via the EKS-managed configmap `amazon-vpc-cni`. For more details, refer to the [security groups for pods configuration options](docs/sgp/sgp_config_options.md) and the [Windows IPAM and prefix delegation configuration options](docs/windows/prefix_delegation_config_options.md).
46+
4347
## Troubleshooting
4448
For troubleshooting issues related to Security group for pods or Windows IPv4 address management, please visit our troubleshooting guide [here](docs/troubleshooting.md).
4549

docs/sgp/sgp_config_options.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
# Configuration options for Security groups for pods
2+
3+
You can configure the controller's security groups for pods functionality by updating the `data` fields in the EKS-managed configmap `amazon-vpc-cni`.
4+
5+
* **branch-eni-cooldown**: Cooldown period for branch ENIs, that is, how long the controller waits before deleting a branch ENI so that iptables rules for the deleted pod can propagate. The default cooldown period is 60s and the minimum is 30s. If the configmap sets a value lower than 30s, it is overridden and set to 30s.
6+
7+
Add the `branch-eni-cooldown` field to the configmap to set the cooldown period. For example:
8+
```
apiVersion: v1
data:
  branch-eni-cooldown: "60"
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
```
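
As a rough sketch of applying this setting from a shell, the helpers below build a JSON merge patch for the `data` field. The function names `effective_cooldown` and `cooldown_patch` are hypothetical and not part of the controller. The clamp only mirrors the documented 30s minimum on the client side; the controller enforces it regardless.

```shell
#!/bin/sh
# Hypothetical helper: print the cooldown the controller is documented to use.
# Values below 30 are overridden to 30. Expects an integer number of seconds.
effective_cooldown() {
  requested="$1"
  if [ "$requested" -lt 30 ]; then
    echo 30
  else
    echo "$requested"
  fi
}

# Hypothetical helper: build a JSON merge patch for the configmap data field.
cooldown_patch() {
  printf '{"data":{"branch-eni-cooldown":"%s"}}\n' "$(effective_cooldown "$1")"
}

# Example (requires cluster access):
# kubectl patch configmap amazon-vpc-cni -n kube-system --type merge -p "$(cooldown_patch 45)"
```

Printing the patch before running `kubectl patch` makes it easy to review the value that will actually be written.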

docs/sgp/workflow.md

Lines changed: 4 additions & 4 deletions
@@ -7,10 +7,10 @@ Security Group for Pods is supported only on Nitro Based Instances.
77

88
![New Nitro Based Node Create Event Diagram](../images/sgp-node-create.png)
99

10-
1. User adds a new supported Node or enables ENI Trunking with existing nodes present in the cluster.
11-
2. VPC CNI Plugin adds label `vpc.amazonaws.com/has-trunk-attached: false` if the Node has capacity to create 1 additional ENI.
12-
3. Controller watches for Node events and acts on node with the above label by creating a Trunk ENI.
13-
4. Controller updates the resource capacity on this node to `vpc.amazonaws.com/pod-eni: # Supported Branch ENI`.
10+
1. User adds a new supported node or enables ENI Trunking with existing nodes present in the cluster.
11+
2. VPC CNI Plugin updates the EKS-managed CRD `CNINode <NODE-NAME>` to add the feature `SecurityGroupsForPods` if the node has capacity to create 1 additional ENI.
12+
3. Controller watches for node events and, if the feature has been added to the `CNINode` CRD, acts on the node by creating a Trunk ENI.
13+
4. Controller updates the resource capacity on this node to `vpc.amazonaws.com/pod-eni: # Supported Branch ENI`. Controller also publishes an event on the node upon successful trunk ENI creation.
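
One way to check the outcome of the node creation steps above from a shell is sketched below. `has_sgp_feature` is a hypothetical helper that does a plain text match on whatever is piped in, so it assumes nothing about where the feature appears inside the `CNINode` object. `NODE_NAME` is a placeholder, and the commented `kubectl` lines assume the `cninode` resource name resolves on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the text on stdin mentions the
# SecurityGroupsForPods feature. Plain text match only.
has_sgp_feature() {
  grep -q 'SecurityGroupsForPods'
}

# Example (requires cluster access; NODE_NAME is a placeholder):
# kubectl get cninode "$NODE_NAME" -o yaml | has_sgp_feature && echo "feature present"
# kubectl describe node "$NODE_NAME" | grep 'vpc.amazonaws.com/pod-eni'
```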
1414

1515
## Creating a Pod using Security Groups
1616

docs/troubleshooting.md

Lines changed: 27 additions & 0 deletions
@@ -14,6 +14,8 @@
1414
- [Verify Pod has the resource limit](#verify-pod-has-the-resource-limit)
1515
- [Verify Pod has the pod-eni annotation](#verify-pod-has-the-pod-eni-annotation)
1616
- [Check Issues with VPC CNI](#check-issues-with-vpc-cni)
17+
- [Connection timeouts](#connection-timeouts)
18+
- [IP starvation issue](#ip-starvation-issue)
1719
- [Troubleshooting Prefix Delegation for Windows](#troubleshooting-prefix-delegation-for-windows)
1820
- [Verify Windows prefix delegation is enabled in the ConfigMap](#verify-windows-prefix-delegation-is-enabled-in-the-configmap)
1921
- [Check both pod events and node events for any specific error](#check-both-pod-events-and-node-events-for-any-specific-error)
@@ -272,6 +274,31 @@ If the Pod is still stuck in `ContainerCreating` you can,
272274
- Check the CNI Logs from the collected logs.
273275
- Open an [Issue](https://github.com/aws/amazon-vpc-resource-controller-k8s/issues/new/choose) in this repository if the problem still persists.
274276

277+
### Connection Timeouts
278+
279+
If you observe connection failures such as intermittent DNS timeouts on pods using security groups, you might need to update the branch ENI cooldown period or the kernel ARP cache timeout so that the **values are equal**. Otherwise, a new pod can reuse the IP address of a recently terminated pod before the kernel's ARP cache is updated, which causes DNS failures or general packet drops.
280+
281+
The branch ENI cooldown period is the period of time to wait before deleting the branch ENI for propagation of iptables rules for the deleted pod. This can be set on the `amazon-vpc-cni` configmap. See more details [here](../docs/sgp/sgp_config_options.md).
282+
283+
To update the kernel ARP cache timeout, set the following parameters for each existing interface on the node. If the branch ENI cooldown period is 30s, set:
284+
```
sudo sysctl -w net.ipv4.neigh.eth0.gc_stale_time=30
sudo sysctl -w net.ipv4.neigh.eth0.base_reachable_time_ms=15000
```
288+
289+
Also set the default so all new interfaces created are configured with these values:
290+
```
sudo sysctl -w net.ipv4.neigh.default.gc_stale_time=30
sudo sysctl -w net.ipv4.neigh.default.base_reachable_time_ms=15000
```
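
To avoid typing these commands once per interface, a small helper can generate them. This is a sketch only: `print_arp_sysctls` is a hypothetical name, it prints the commands rather than running them, and the interface names in the example (`eth0`, `eth1`) are placeholders for the interfaces on your node. Both timeout values are passed in explicitly, because this guide only gives the pair for a 30s cooldown.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the sysctl commands that apply the
# ARP cache settings to each named interface, then to "default" so that
# interfaces created later pick up the same values.
print_arp_sysctls() {
  stale_s="$1"        # gc_stale_time in seconds, e.g. 30
  reachable_ms="$2"   # base_reachable_time_ms in milliseconds, e.g. 15000
  shift 2
  for iface in "$@" default; do
    echo "sudo sysctl -w net.ipv4.neigh.${iface}.gc_stale_time=${stale_s}"
    echo "sudo sysctl -w net.ipv4.neigh.${iface}.base_reachable_time_ms=${reachable_ms}"
  done
}

# Example: values for a 30s branch ENI cooldown on two interfaces.
# print_arp_sysctls 30 15000 eth0 eth1
```

Review the printed commands, then pipe them to `sh` to apply them.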
294+
295+
### IP starvation issue
296+
297+
If pods are not `Running` because no IP addresses are available, even though only a few pods are running and you expect addresses to be free, tune the branch ENI cooldown period accordingly.
298+
The branch ENI cooldown period is the period of time to wait before deleting the branch ENI for propagation of iptables rules for the deleted pod. The default value is 60s, so IP addresses are not released for at least 60s. This can be configured via the `amazon-vpc-cni` configmap as described [here](../docs/sgp/sgp_config_options.md). Note that the minimum cooldown period is 30s.
299+
300+
Be sure to also update the kernel ARP cache timeouts if you notice DNS issues, as outlined in the [Connection Timeouts section](#connection-timeouts).
301+
275302
## Troubleshooting Prefix Delegation for Windows
276303
Please follow the troubleshooting steps here for issues with Windows Node and Pods when using `prefix delegation` mode.
277304
