[BUG] Subnet removal and IPAM allocation issues using helm charts #5064

Open
tz1112 opened this issue Mar 9, 2025 · 2 comments
Labels
bug Something isn't working chart Helm Chart ipam subnet

Comments

@tz1112
tz1112 commented Mar 9, 2025

Kube-OVN Version

v1.13.2

Kubernetes Version

v1.32.2

Operation-system/Kernel Version

6.11.0-17-generic

Description

Installing kube-ovn environments using Helm charts causes issues with subnet/pod deletion and pod IP allocation.

Installing and then uninstalling a kube-ovn environment using Helm charts results in orphaned subnet resources (might be related to #4898). The subnets can only be deleted manually by removing the corresponding finalizers, as in the sketch below.
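
For reference, a minimal sketch of the manual cleanup (it simply empties the finalizer list; the file name and the patch-file approach are just one way to do it):

# clear-finalizers.yaml
# Apply with: kubectl patch subnet r1pool --type=merge --patch-file clear-finalizers.yaml
metadata:
  finalizers: []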

Additionally, dynamic IP allocation for pods deployed into a predefined namespace causes my PostStartHook, which modifies the routing table, to fail: the pod acquires an IP address from the ovn-default subnet instead of the subnet bound to the namespace it was deployed in, so the route table modification fails. Interestingly, if I assign the pod a static IP address within its namespace's IP pool, it works correctly. Furthermore, if I first deploy the NetworkAttachmentDefinitions and subnets manually using kubectl (instead of a combined Helm chart), everything works correctly (static as well as dynamic IP allocation). If I then manually remove all pods first and the subnets afterwards, no orphaned subnets remain.
As far as I know, Helm collects all kinds defined in a chart and deploys them in a predefined, static order, with unrecognized kinds such as custom resources installed after built-in kinds like Deployment. This leads me to suspect that both issues described here are symptoms of the same underlying problem: the Subnet and NetworkAttachmentDefinition resources might not be created in the expected order when using Helm.

Steps To Reproduce

subnet.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: backend

---

apiVersion: v1
kind: Namespace
metadata:
  name: r1pool
 
---

apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: r1pool
spec:
  protocol: IPv4
  provider: r1pool.backend.ovn
  cidrBlock: 10.1.0.0/16
  # gateway: 10.1.0.1
  excludeIps:
  - 10.1.0.0..10.1.0.10
  namespaces:
  - r1pool

---

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: r1pool
  namespace: backend
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "kube-ovn",
      "server_socket": "/run/openvswitch/kube-ovn-daemon.sock",
      "provider": "r1pool.backend.ovn"
    }'   

---

host.yaml

# r1pool

apiVersion: apps/v1
kind: Deployment
metadata:
  name: r1pool
  namespace: r1pool
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ippool    
  template:
    metadata:
      labels:
        app: ippool
      # If explicit IPs are assigned, everything works as expected using helm
      # annotations:
      #   ovn.kubernetes.io/ip_pool: 10.1.1.100
    spec:
      containers:
        - name: r1pool
          image: docker.io/library/nginx:alpine
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true  # Required for networking
            capabilities:
              add: ["NET_RAW", "NET_ADMIN"]
          lifecycle:
            postStart:
              exec:
                command: ["/bin/sh", "-c", "ip route del default && ip route add default via 10.1.1.254 dev eth0"]

--- 

Current Behavior

Given the two manifests above, everything works fine if they are deployed manually using kubectl. If host.yaml is removed first and subnet.yaml afterwards, everything works as expected.
If both are combined into a Helm chart and the chart is installed, the pod receives an IP from the ovn-default subnet, not from the subnet bound to its namespace r1pool. This causes the PostStartHook to fail. If an explicit IP is specified, the correct IP is acquired and the PostStartHook succeeds.
Uninstalling the Helm chart causes the subnet deletion to fail; the subnets have to be removed manually.

Expected Behavior

Pods acquire the correct IP from the specified subnet without the need to specify an explicit IP address.

@tz1112 tz1112 added the bug Something isn't working label Mar 9, 2025
@dosubot dosubot bot added chart Helm Chart ipam subnet labels Mar 9, 2025

dosubot bot commented Mar 9, 2025

Hey @tz1112! I'm here to help you with any bugs, questions, or contributions. Let's tackle this issue together!

I found a similar closed issue that might be relevant to your problem: "IP resources are not reclaimed; stale subnet IP allocations remain". This issue was resolved by a pull request that addressed the deletion of IP resources based on the pod name [1].

To resolve subnet removal and IPAM allocation issues with Kube-OVN, particularly regarding orphaned subnet resources and incorrect IP allocation in the PostStartHook, you can follow these steps:

  1. Ensure Subnet Validity: Validate the CIDR when adding or updating a subnet to prevent invalid configurations that can cause orphaned resources.

  2. Handle IP Conflicts: Check for conflicts with existing IPs when assigning static IPs to prevent incorrect IP allocation.

  3. Release Addresses Properly: Ensure IP addresses are released correctly when pods are deleted or updated to prevent orphaned resources.

  4. Update Excluded IPs: When updating a subnet's excluded IPs, ensure released addresses are not reused if they fall within the new exclusion range.

  5. Check for No Available IPs: Handle scenarios where no available IPs are left due to exclusions or full allocation, and return appropriate errors.

  6. Test Subnet Operations: Use test cases to ensure subnet operations such as adding, updating, and removing IP pools are functioning correctly.

  7. Use IPAM Methods Correctly: Leverage IPAM methods like AddOrUpdateSubnet, DeleteSubnet, and ReleaseAddressByPod to manage subnets and IP allocations effectively.

  8. Manage IP Pools: Use methods like AddOrUpdateIPPool and RemoveIPPool to manage IP pools within a subnet, ensuring IPs are correctly allocated and released [2][3][4][5].

Additionally, the issue with the PostStartHook failing due to incorrect IP allocation might be related to the sequence of resource deployment when using Helm charts. Helm may not deploy CRDs in the expected sequence, leading to these issues. Manually deploying NetworkAttachmentDefinitions and subnets first, as you've noted, seems to resolve the problem, indicating a potential sequence issue with Helm.


@oilbeater
Collaborator

Your guess is correct: the result is affected by the resource order. When a Pod is created and its subnet is not ready yet, an IP address is allocated from the default subnet. I don't think this can be resolved on the kube-ovn side, as the order seen by kube-ovn-controller is not deterministic: when the pod is created, the controller cannot know whether a subnet or namespace change will follow. Therefore, it's better to ensure the order by using Helm hooks, for example as in the sketch below.
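
A minimal sketch (not an official recommendation; the hook weight is arbitrary) that annotates the Subnet so Helm creates it in a pre-install hook, before ordinary chart resources such as the Deployment; the NetworkAttachmentDefinition would need the same treatment:

apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: r1pool
  annotations:
    # Run this manifest as a pre-install hook so the Subnet exists
    # before Helm creates the ordinary chart resources (the Deployment)
    "helm.sh/hook": pre-install
    # Among hooks, lower weights are created first
    "helm.sh/hook-weight": "-10"
    # Only delete a leftover copy before re-creating the hook resource
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  protocol: IPv4
  provider: r1pool.backend.ovn
  cidrBlock: 10.1.0.0/16
  excludeIps:
  - 10.1.0.0..10.1.0.10
  namespaces:
  - r1pool

Note that hook resources are not managed as part of the release, so helm uninstall will not delete them; a pre-delete hook or manual cleanup is needed for removal.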

It seems like a bug related to the deletion, but I can't reproduce it in my environment. Could you please attach the kube-ovn-controller log to help debug the issue?
