Deleting a FIP triggers a reset of the associated EIP. Resetting an EIP sets status.ready: true even if the EIP is not yet ready. The EIP update handler then does not program the EIP on the NAT GW.
We hit this after a NAT GW was rescheduled to a new node. Rescheduling marks all FIPs and EIPs as not ready so that they can be programmed on the new pod. A FIP was deleted and recreated while this was in progress, and the associated EIP was never added to the new gateway pod.
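The sequence described above can be sketched as a small standalone model. This is only an illustration of the reported race, not Kube-OVN code: the `Eip` and `Gateway` types and the functions `onGatewayRescheduled`, `resetEip`, `handleEipUpdate`, and `simulate` are hypothetical names, and the only behavior assumed is what the report states (reset sets ready to true, and the update handler skips EIPs that are already ready).

```go
package main

import "fmt"

// Eip is a hypothetical stand-in for the EIP resource; only the ready
// flag from status matters for this illustration.
type Eip struct {
	Name  string
	Ready bool
}

// Gateway is a hypothetical stand-in for the NAT gateway pod. It records
// which EIPs have been programmed on it.
type Gateway struct {
	programmed map[string]bool
}

// onGatewayRescheduled models the gateway pod moving to a new node:
// every EIP is marked not ready and the new pod starts with no addresses.
func onGatewayRescheduled(eips []*Eip) *Gateway {
	for _, e := range eips {
		e.Ready = false
	}
	return &Gateway{programmed: map[string]bool{}}
}

// resetEip models the reset triggered by deleting a FIP, as described in
// the report: it sets ready to true regardless of the current state.
func resetEip(e *Eip) {
	e.Ready = true
}

// handleEipUpdate models the update handler: an EIP that already looks
// ready is skipped, otherwise it is programmed on the gateway.
func handleEipUpdate(e *Eip, gw *Gateway) {
	if e.Ready {
		return
	}
	gw.programmed[e.Name] = true
	e.Ready = true
}

// simulate reports whether the EIP ends up programmed on the new pod.
func simulate(deleteFipDuringReschedule bool) bool {
	eip := &Eip{Name: "eip-1", Ready: true}
	gw := onGatewayRescheduled([]*Eip{eip})
	if deleteFipDuringReschedule {
		resetEip(eip) // FIP deleted before the EIP was reprogrammed
	}
	handleEipUpdate(eip, gw)
	return gw.programmed[eip.Name]
}

func main() {
	fmt.Println("no FIP deletion, EIP programmed:", simulate(false))
	fmt.Println("FIP deleted mid-reschedule, EIP programmed:", simulate(true))
}
```

In this model the second case leaves the EIP marked ready but absent from the gateway, which matches the symptom seen on net1.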
Steps To Reproduce
It's a race condition so it can be difficult to consistently recreate this bug.
1. Create vpc nat gw
2. Create a lot of EIPs and FIPs on vpc nat gw
3. Delete vpc nat gw pod
4. Delete all FIPs
5. Exec onto vpc nat gw pod and check ip addresses on net1
Current Behavior
Deleting a FIP while an EIP is not yet programmed prevents the EIP from ever being programmed.
Expected Behavior
Deleting a FIP while an EIP is not yet programmed has no impact on the EIP being programmed.
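One way to picture the expected behavior is a reset that clears the NAT association but leaves the ready flag alone, so the update handler still programs the EIP. This is a sketch under assumptions, not the project's actual fix: the `Eip` fields and the functions `resetEip`, `handleEipUpdate`, and `programmedAfterFipDelete` are hypothetical.

```go
package main

import "fmt"

// Eip is a hypothetical stand-in for the EIP resource. Nat records which
// kind of NAT rule uses the EIP, Ready mirrors the ready flag in status.
type Eip struct {
	Name  string
	Nat   string
	Ready bool
}

// resetEip is an assumed variant of the reset: it clears the NAT usage
// and deliberately does not touch Ready.
func resetEip(e *Eip) {
	e.Nat = ""
}

// handleEipUpdate models the update handler: it programs any EIP that is
// not yet ready and then marks it ready.
func handleEipUpdate(e *Eip, programmed map[string]bool) {
	if e.Ready {
		return
	}
	programmed[e.Name] = true
	e.Ready = true
}

// programmedAfterFipDelete deletes a FIP while the EIP is still waiting
// to be programmed and reports whether the EIP ends up in a good state.
func programmedAfterFipDelete() bool {
	// Not ready: the gateway pod was just rescheduled.
	eip := &Eip{Name: "eip-1", Nat: "fip", Ready: false}
	programmed := map[string]bool{}

	resetEip(eip)                    // FIP deleted
	handleEipUpdate(eip, programmed) // update handler runs afterwards

	return programmed[eip.Name] && eip.Ready && eip.Nat == ""
}

func main() {
	fmt.Println("EIP programmed after FIP deletion:", programmedAfterFipDelete())
}
```

The design choice here is that only the code path that actually programs the gateway is allowed to set the ready flag to true.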
Deleting a FIP in Kube-OVN v1.13.4 affects the ready state of an associated EIP because the EIP's status is updated based on the NAT rules that use it. When a FIP is deleted, the NAT rules associated with that FIP are also removed. This change is detected by the isOvnEipInUse function, which checks for the presence of NAT rules (DNAT, FIP, SNAT) using the EIP. If no NAT rules are found, the EIP's status is updated to reflect that it is no longer in use, which can prevent it from being programmed on the NAT Gateway [1][2].
The issue you're encountering seems to be related to a race condition where the EIP's status is incorrectly set to ready: true even when it is not yet ready, preventing it from being programmed on the NAT Gateway. This could be due to the timing of the FIP deletion and the NAT Gateway rescheduling. The system should ideally handle such race conditions more gracefully to ensure that EIPs are correctly programmed after such events.
Kube-OVN Version
v1.13.4
Kubernetes Version
v1.28.6
Operation-system/Kernel Version
"Ubuntu 22.04.5 LTS" 6.8.0-47-generic