NTH queue handler behaviour broken by PR #940 #990

stevehipwell · 2024-04-16T12:44:31Z

Describe the bug
PR #940 removed the behaviour where NTH completes the ASG lifecycle action with CONTINUE. This breaks the expected NTH behaviour, which is to signal to the ASG that the node can be terminated once it has been drained.
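For context, the expected flow is that the queue processor drains the node and then completes the ASG termination lifecycle action with a `CONTINUE` result, so the ASG can terminate the instance instead of waiting for the hook to time out. The sketch below illustrates that post-drain step in Python. It is not NTH's actual implementation (NTH is written in Go). It assumes a boto3-style Auto Scaling client exposing `complete_lifecycle_action`, and `drain_node` is a hypothetical callable standing in for the Kubernetes drain.

```python
from typing import Any, Callable


def post_drain_complete(
    asg_client: Any,
    drain_node: Callable[[str], None],
    asg_name: str,
    hook_name: str,
    instance_id: str,
) -> dict:
    """Drain the node, then tell the ASG it may continue terminating the instance.

    `asg_client` is assumed to behave like boto3's Auto Scaling client;
    `drain_node` is a hypothetical stand-in for the Kubernetes drain logic.
    """
    # Drain first: if this raises, the lifecycle action stays pending and
    # the ASG only proceeds once the hook times out.
    drain_node(instance_id)

    # Signal the ASG; skipping this call is what leaves instances running
    # until the lifecycle hook timeout.
    return asg_client.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        LifecycleActionResult="CONTINUE",
        InstanceId=instance_id,
    )
```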

If the PR author wanted to disable this functionality for their use case, they should have made it configurable, and changed the default behaviour only in a major version (if at all).

Steps to reproduce
Replace an ASG instance using an instance refresh; the instance is not terminated until the lifecycle hook times out.

Expected outcome
I'd expect the instance to be terminated by the ASG once NTH has completed draining it.

Application Logs
The log output when experiencing the issue.

Environment

NTH App Version: v1.21.0
NTH Mode (IMDS/Queue processor): Queue processor
OS/Arch: n/a
Kubernetes version: n/a
Installation method: Helm

stevehipwell · 2024-04-16T12:44:48Z

CC @cjerad @bwagner5

der-eismann · 2024-04-25T15:39:13Z

Can @GavinBurris42 say if this change was intended or a mistake?

stevehipwell · 2024-05-08T11:19:05Z

Could someone from @aws/ec2-guacamole please respond to this? This issue carries a significant cost overhead: nodes that should be terminated keep running for the whole grace period (and because the grace period is cluster scoped, it has to be set to the maximum permitted grace period).

LikithaVemulapalli · 2024-05-08T16:55:51Z

Hello @stevehipwell, apologies for the late response and thank you for noticing the broken NTH behaviour. This issue is being treated as a priority and we will take measures to mitigate it ASAP. Thank you.

stevehipwell · 2024-05-09T14:09:51Z

@LikithaVemulapalli I've opened PR #999 to solve this issue.

It looks like the problem the original PR author was trying to solve would be better handled by setting the CompleteLifecycleActionDelaySeconds value. That adds a delay to all terminations, but it is user controlled, rather than breaking NTH for everyone who uses it for other patterns (such as instance refresh).
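The delay-based alternative can be sketched as follows: rather than skipping the lifecycle completion, the post-drain step optionally waits a user-configured number of seconds before completing it. This is a hypothetical Python illustration of the idea, not NTH's code. `delay_seconds` and `complete` are illustrative names, and treating a missing, zero, or negative delay as "complete immediately" is an assumption rather than documented NTH semantics.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def complete_with_delay(
    complete: Callable[[], T],
    delay_seconds: Optional[float],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `complete` (e.g. a CompleteLifecycleAction call) after an optional delay.

    Assumption: None, zero, or a negative value means "no delay"; the delay is
    an opt-in, user-controlled setting rather than a way to skip completion.
    """
    if delay_seconds is not None and delay_seconds > 0:
        # Opt-in delay: keeps the instance around a little longer before termination.
        sleep(delay_seconds)
    # Completion always happens, so the ASG never has to wait for the hook timeout.
    return complete()
```

The design point is that the delay only changes *when* the ASG is signalled, never *whether* it is, so patterns such as instance refresh keep working.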

LikithaVemulapalli · 2024-05-09T14:59:02Z

Hello @stevehipwell, thanks for opening a PR to resolve this issue. Yes, our team discussed it and identified the root cause as the post-drain task that was modified as part of the latest changes. We appreciate the effort and will review the PR you opened. Thank you.

stevehipwell · 2024-05-14T11:33:53Z

@LikithaVemulapalli the build failed, so the Helm chart for the NTH release hasn't been published.

LikithaVemulapalli · 2024-05-14T17:00:35Z

Hello @stevehipwell, yes, the build failed while adding the Windows binaries to the release assets. We are aware of the issue and are currently working on a fix. Thank you.

stevehipwell mentioned this issue May 9, 2024

fix: Reverted the removal of the lifecycle hook completion #999

Merged

LikithaVemulapalli self-assigned this May 9, 2024

LikithaVemulapalli closed this as completed in #999 May 10, 2024

LikithaVemulapalli added and removed the Pending-Release label (Pending an NTH or eks-charts release) May 10, 2024
