Fix to prevent consecutive reboots in case of reboot delay. #23
Signed-off-by: hlts2 <[email protected]>
Hey @hlts2, if we are storing LastRebootCmdTime, I think we can remove LastTransitionTime.
Hi, thank you for your comments. I believe both LastTransitionTime and LastRebootCmdTime are needed. For example, in the following case, handling it with just LastRebootCmdTime would not be enough:
K8s nodes can temporarily become unhealthy. What really matters is how long the node has been in that state, which is what LastTransitionTime tracks.
Additionally, after a reboot command is issued, the reboot might be delayed for various reasons. To prevent the same reboot command from being triggered again, it is important to track when the reboot command was last sent. This is where LastRebootCmdTime comes in.
In summary, we use both LastTransitionTime and LastRebootCmdTime.
I got it. Below I have listed the cases to make it clearer.
@jokestax Thank you for your review 🙇 I will merge this PR 🚀
WHAT
This PR contains the following changes:
WHY
There was a scenario where a reboot was delayed for over 60 minutes. During that time the node's LastTransitionTime remained unchanged, so every pass of the node-agent's check loop still found the reboot condition satisfied and sent another reboot command. As a result, multiple reboot commands were issued for a single delayed reboot.
To address this, I considered it necessary to use both the node's LastTransitionTime and LastRebootCmdTime (the actual time the reboot command was sent) to handle such cases. This dual comparison ensures that reboot commands are not repeatedly triggered in the event of delays, effectively preventing unnecessary reboots.
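The dual comparison can be illustrated with a small simulation. This is only a sketch written in Python for illustration, not the node-agent's actual code: the names `NodeState`, `should_send_reboot`, and `simulate`, the two-hour threshold, and the rule "suppress a new command while the previous one is younger than the threshold" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class NodeState:
    # When the node's condition last changed (stands in for LastTransitionTime).
    last_transition_time: datetime
    # When a reboot command was last sent; None if none was sent yet
    # (stands in for LastRebootCmdTime).
    last_reboot_cmd_time: Optional[datetime] = None


def should_send_reboot(state: NodeState, now: datetime, threshold: timedelta) -> bool:
    """Assumed rule: both timestamps must be at least `threshold` old."""
    # The node has not been in its current condition long enough.
    if now - state.last_transition_time < threshold:
        return False
    # A reboot command was sent recently; the reboot may simply be delayed.
    if (state.last_reboot_cmd_time is not None
            and now - state.last_reboot_cmd_time < threshold):
        return False
    return True


def simulate(checks: int, interval: timedelta, threshold: timedelta,
             use_cmd_time: bool) -> int:
    """Count reboot commands sent while the reboot stays delayed."""
    start = datetime(2024, 1, 1)
    # The node already crossed the threshold when the first check runs.
    state = NodeState(last_transition_time=start - threshold)
    sent = 0
    for i in range(checks):
        now = start + i * interval
        if use_cmd_time:
            fire = should_send_reboot(state, now, threshold)
        else:
            # Old behaviour: only the transition time is consulted.
            fire = now - state.last_transition_time >= threshold
        if fire:
            sent += 1
            state.last_reboot_cmd_time = now
    return sent
```

Calling `simulate(7, timedelta(minutes=10), timedelta(hours=2), use_cmd_time=False)` models seven checks across a 60-minute delay with only the transition time consulted, while `use_cmd_time=True` adds the second comparison. Under these assumptions the first variant sends a command on every check and the second sends only one.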