Update docs to add amazon-vpc-cni configmap options for SGPP #359

Merged
merged 1 commit into from
Jan 11, 2024
4 changes: 4 additions & 0 deletions README.md
@@ -40,6 +40,10 @@ The controller supports the following modes for IPv4 address management on Windo

Please follow this [guide](https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html) for enabling Windows Support on your EKS cluster.

## Configuring the controller via amazon-vpc-cni configmap

The controller supports configuration options for security groups for pods and for Windows nodes, which can be set via the EKS-managed configmap `amazon-vpc-cni`. For details, see the [security groups for pods configuration options](docs/sgp/sgp_config_options.md) and the [Windows IPAM and prefix delegation configuration options](docs/windows/prefix_delegation_config_options.md).

## Troubleshooting
For troubleshooting issues related to Security group for pods or Windows IPv4 address management, please visit our troubleshooting guide [here](docs/troubleshooting.md).

16 changes: 16 additions & 0 deletions docs/sgp/sgp_config_options.md
@@ -0,0 +1,16 @@
# Configuration options for Security groups for pods

You can configure the controller functionality related to security groups for pods by updating the `data` fields in the EKS-managed configmap `amazon-vpc-cni`.

* **branch-eni-cooldown**: Cooldown period for branch ENIs, that is, the time to wait before deleting a branch ENI so that iptables rules for the deleted pod can propagate. The default cooldown period is 60s, and the minimum is 30s. If the configmap is updated with a value lower than 30s, the value is overridden and set to 30s.

Add the `branch-eni-cooldown` field to the configmap to set the cooldown period, for example:
```
apiVersion: v1
data:
  branch-eni-cooldown: "60"
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
```
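
The default and minimum described above can be sketched as a small shell helper. This is only an illustration of the documented rule, not the controller's implementation; the function names `effective_cooldown` and `cooldown_patch` are hypothetical, and the value is assumed to be a whole number of seconds as in the example configmap.

```shell
#!/bin/sh
# Illustrative sketch only: these helpers are hypothetical and are not part
# of the controller. They mirror the rule documented for branch-eni-cooldown.

DEFAULT_COOLDOWN=60   # default stated in this document (seconds)
MIN_COOLDOWN=30       # minimum stated in this document (seconds)

# Print the cooldown that would take effect for a requested value.
# An empty or missing argument falls back to the default.
effective_cooldown() {
  requested="${1:-$DEFAULT_COOLDOWN}"
  if [ "$requested" -lt "$MIN_COOLDOWN" ]; then
    echo "$MIN_COOLDOWN"
  else
    echo "$requested"
  fi
}

# Print a JSON merge patch that sets branch-eni-cooldown in the configmap data.
cooldown_patch() {
  printf '{"data":{"branch-eni-cooldown":"%s"}}\n' "$(effective_cooldown "$1")"
}
```

One possible use, assuming `kubectl` access to the cluster, is to pass the generated patch to `kubectl patch configmap amazon-vpc-cni -n kube-system --type merge -p "$(cooldown_patch 45)"`.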
8 changes: 4 additions & 4 deletions docs/sgp/workflow.md
@@ -7,10 +7,10 @@ Security Group for Pods is supported only on Nitro Based Instances.

![New Nitro Based Node Create Event Diagram](../images/sgp-node-create.png)

1. User adds a new supported Node or enables ENI Trunking with existing nodes present in the cluster.
2. VPC CNI Plugin adds label `vpc.amazonaws.com/has-trunk-attached: false` if the Node has capacity to create 1 additional ENI.
3. Controller watches for Node events and acts on node with the above label by creating a Trunk ENI.
4. Controller updates the resource capacity on this node to `vpc.amazonaws.com/pod-eni: # Supported Branch ENI`.
1. User adds a new supported node or enables ENI Trunking with existing nodes present in the cluster.
2. VPC CNI Plugin updates the EKS-managed CRD `CNINode <NODE-NAME>` to add the feature `SecurityGroupsForPods` if the node has capacity to create 1 additional ENI.
3. Controller watches for node events and, if the feature has been added in the `CNINode` CRD, acts on the node by creating a Trunk ENI.
4. Controller updates the resource capacity on this node to `vpc.amazonaws.com/pod-eni: # Supported Branch ENI`. Controller also publishes an event on the node upon successful trunk ENI creation.

## Creating a Pod using Security Groups

27 changes: 27 additions & 0 deletions docs/troubleshooting.md
@@ -14,6 +14,8 @@
- [Verify Pod has the resource limit](#verify-pod-has-the-resource-limit)
- [Verify Pod has the pod-eni annotation](#verify-pod-has-the-pod-eni-annotation)
- [Check Issues with VPC CNI](#check-issues-with-vpc-cni)
- [Connection timeouts](#connection-timeouts)
- [IP starvation issue](#ip-starvation-issue)
- [Troubleshooting Prefix Delegation for Windows](#troubleshooting-prefix-delegation-for-windows)
- [Verify Windows prefix delegation is enabled in the ConfigMap](#verify-windows-prefix-delegation-is-enabled-in-the-configmap)
- [Check both pod events and node events for any specific error](#check-both-pod-events-and-node-events-for-any-specific-error)
@@ -272,6 +274,31 @@ If the Pod is still stuck in `ContainerCreating` you can,
- Check the CNI Logs from the collected logs.
- Open an [Issue](https://github.com/aws/amazon-vpc-resource-controller-k8s/issues/new/choose) in this repository if the problem still persists.

### Connection Timeouts

If you observe connection failures such as intermittent DNS timeouts on pods using security groups, you might need to update the branch ENI cooldown period or the kernel ARP cache timeout so that the **values are equal**. Otherwise, a new pod could reuse the IP address of a recently terminated pod before the kernel's ARP cache is updated, which causes DNS failures or general packet drops.

The branch ENI cooldown period is the period of time to wait before deleting the branch ENI for propagation of iptables rules for the deleted pod. This can be set on the `amazon-vpc-cni` configmap. See more details [here](../docs/sgp/sgp_config_options.md).

To update the kernel ARP cache timeout, set the following parameters for each existing interface on the node. If the branch ENI cooldown period is 30s, set:
```
sudo sysctl -w net.ipv4.neigh.eth0.gc_stale_time=30
sudo sysctl -w net.ipv4.neigh.eth0.base_reachable_time_ms=15000
```

Also set the default so all new interfaces created are configured with these values:
```
sudo sysctl -w net.ipv4.neigh.default.gc_stale_time=30
sudo sysctl -w net.ipv4.neigh.default.base_reachable_time_ms=15000
```
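
When a node has several interfaces, the per-interface and default settings above can be generated rather than typed by hand. The following is a minimal sketch, assuming the interface names are passed in explicitly; the helper name `arp_sysctl_cmds` is hypothetical, and it only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Illustrative sketch only: arp_sysctl_cmds is a hypothetical helper.
# It prints (does not run) the sysctl commands for the given timeouts.
# Usage: arp_sysctl_cmds <gc_stale_time_seconds> <base_reachable_time_ms> <iface>...
arp_sysctl_cmds() {
  stale="$1"
  reachable_ms="$2"
  shift 2
  # "default" is appended so that interfaces created later get the same values.
  for iface in "$@" default; do
    echo "sysctl -w net.ipv4.neigh.${iface}.gc_stale_time=${stale}"
    echo "sysctl -w net.ipv4.neigh.${iface}.base_reachable_time_ms=${reachable_ms}"
  done
}
```

For example, `arp_sysctl_cmds 30 15000 eth0 eth1` prints the two settings for `eth0`, `eth1`, and `default`, matching the values shown above for a 30s cooldown. Review the output, then run the commands with `sudo` on the node.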

### IP starvation issue

If pods are not `Running` because IP addresses are unavailable, even though only a few pods are running and you expect IP addresses to be available, tune the branch ENI cooldown period accordingly.
The branch ENI cooldown period is the period of time to wait before deleting the branch ENI for propagation of iptables rules for the deleted pod. The default value is 60s, so IP addresses are not released for at least 60s. This can be configured via the `amazon-vpc-cni` configmap as described [here](../docs/sgp/sgp_config_options.md). Note that the minimum cooldown period is 30s.

Be sure to also update the kernel ARP cache timeouts if you notice DNS issues, as outlined in the [above section](#connection-timeouts).

## Troubleshooting Prefix Delegation for Windows
Please follow the troubleshooting steps here for issues with Windows Node and Pods when using `prefix delegation` mode.
