diff --git a/OWNERS_ALIASES b/OWNERS_ALIASES
index eac30035171..061142f13b7 100644
--- a/OWNERS_ALIASES
+++ b/OWNERS_ALIASES
@@ -142,6 +142,11 @@ aliases:
- jeremyrickard
- liggitt
- micahhausler
+ wg-node-lifecycle-leads:
+ - atiratree
+ - fabriziopandini
+ - humblec
+ - rthallisey
wg-policy-leads:
- JimBugwadia
- poonam-lamba
diff --git a/communication/slack-config/channels.yaml b/communication/slack-config/channels.yaml
index be7dfe88d44..805b9fbd123 100644
--- a/communication/slack-config/channels.yaml
+++ b/communication/slack-config/channels.yaml
@@ -584,6 +584,7 @@ channels:
- name: wg-multitenancy
- name: wg-naming
archived: true
+ - name: wg-node-lifecycle
- name: wg-onprem
archived: true
- name: wg-policy
diff --git a/liaisons.md b/liaisons.md
index 42a3c54f5b6..f43a7a205b5 100644
--- a/liaisons.md
+++ b/liaisons.md
@@ -59,6 +59,7 @@ members will assume one of the departing members groups.
| [WG Device Management](wg-device-management/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) |
| [WG etcd Operator](wg-etcd-operator/README.md) | Maciej Szulik (**[@soltysh](https://github.com/soltysh)**) |
| [WG LTS](wg-lts/README.md) | Sascha Grunert (**[@saschagrunert](https://github.com/saschagrunert)**) |
+| [WG Node Lifecycle](wg-node-lifecycle/README.md) | TBD (**[@TBD](https://github.com/TBD)**) |
| [WG Policy](wg-policy/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) |
| [WG Serving](wg-serving/README.md) | Maciej Szulik (**[@soltysh](https://github.com/soltysh)**) |
| [WG Structured Logging](wg-structured-logging/README.md) | Sascha Grunert (**[@saschagrunert](https://github.com/saschagrunert)**) |
diff --git a/sig-apps/README.md b/sig-apps/README.md
index ba5e073d7b3..fa2e645ea70 100644
--- a/sig-apps/README.md
+++ b/sig-apps/README.md
@@ -59,6 +59,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
The following [working groups][working-group-definition] are sponsored by sig-apps:
* [WG Batch](/wg-batch)
* [WG Data Protection](/wg-data-protection)
+* [WG Node Lifecycle](/wg-node-lifecycle)
* [WG Serving](/wg-serving)
diff --git a/sig-architecture/README.md b/sig-architecture/README.md
index 2013d2b772d..649a7128794 100644
--- a/sig-architecture/README.md
+++ b/sig-architecture/README.md
@@ -58,6 +58,7 @@ The Chairs of the SIG run operations and processes governing the SIG.
The following [working groups][working-group-definition] are sponsored by sig-architecture:
* [WG Device Management](/wg-device-management)
* [WG LTS](/wg-lts)
+* [WG Node Lifecycle](/wg-node-lifecycle)
* [WG Policy](/wg-policy)
* [WG Serving](/wg-serving)
* [WG Structured Logging](/wg-structured-logging)
diff --git a/sig-autoscaling/README.md b/sig-autoscaling/README.md
index 79d95480628..6c5a132ded9 100644
--- a/sig-autoscaling/README.md
+++ b/sig-autoscaling/README.md
@@ -48,6 +48,7 @@ The Chairs of the SIG run operations and processes governing the SIG.
The following [working groups][working-group-definition] are sponsored by sig-autoscaling:
* [WG Batch](/wg-batch)
* [WG Device Management](/wg-device-management)
+* [WG Node Lifecycle](/wg-node-lifecycle)
* [WG Serving](/wg-serving)
diff --git a/sig-cli/README.md b/sig-cli/README.md
index f28cac88771..3fe661cb7ea 100644
--- a/sig-cli/README.md
+++ b/sig-cli/README.md
@@ -60,6 +60,12 @@ subprojects, and resolve cross-subproject technical issues and decisions.
- [@kubernetes/sig-cli-test-failures](https://github.com/orgs/kubernetes/teams/sig-cli-test-failures) - Test Failures and Triage
- Steering Committee Liaison: Paco Xu 徐俊杰 (**[@pacoxu](https://github.com/pacoxu)**)
+## Working Groups
+
+The following [working groups][working-group-definition] are sponsored by sig-cli:
+* [WG Node Lifecycle](/wg-node-lifecycle)
+
## Subprojects
The following [subprojects][subproject-definition] are owned by sig-cli:
diff --git a/sig-cloud-provider/README.md b/sig-cloud-provider/README.md
index dceabfd51ae..d694a1abce4 100644
--- a/sig-cloud-provider/README.md
+++ b/sig-cloud-provider/README.md
@@ -58,6 +58,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
## Working Groups
The following [working groups][working-group-definition] are sponsored by sig-cloud-provider:
+* [WG Node Lifecycle](/wg-node-lifecycle)
* [WG Structured Logging](/wg-structured-logging)
diff --git a/sig-cluster-lifecycle/README.md b/sig-cluster-lifecycle/README.md
index afc4e9a431f..aeb59b569cb 100644
--- a/sig-cluster-lifecycle/README.md
+++ b/sig-cluster-lifecycle/README.md
@@ -52,6 +52,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
The following [working groups][working-group-definition] are sponsored by sig-cluster-lifecycle:
* [WG LTS](/wg-lts)
+* [WG Node Lifecycle](/wg-node-lifecycle)
* [WG etcd Operator](/wg-etcd-operator)
diff --git a/sig-list.md b/sig-list.md
index a45672f9536..fb47bd3e591 100644
--- a/sig-list.md
+++ b/sig-list.md
@@ -66,6 +66,7 @@ When the need arises, a [new SIG can be created](sig-wg-lifecycle.md)
|[Device Management](wg-device-management/README.md)|[device-management](https://github.com/kubernetes/kubernetes/labels/wg%2Fdevice-management)|* Architecture
* Autoscaling
* Network
* Node
* Scheduling
|* [John Belamaric](https://github.com/johnbelamaric), Google
* [Kevin Klues](https://github.com/klueska), NVIDIA
* [Patrick Ohly](https://github.com/pohly), Intel
|* [Slack](https://kubernetes.slack.com/messages/wg-device-management)
* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-device-management)|* Regular WG Meeting: [Tuesdays at 8:30 PT (Pacific Time) (biweekly)](TBD)
|[etcd Operator](wg-etcd-operator/README.md)|[etcd-operator](https://github.com/kubernetes/kubernetes/labels/wg%2Fetcd-operator)|* Cluster Lifecycle
* etcd
|* [Benjamin Wang](https://github.com/ahrtr), VMware
* [Ciprian Hacman](https://github.com/hakman), Microsoft
* [Josh Berkus](https://github.com/jberkus), Red Hat
* [James Blair](https://github.com/jmhbnz), Red Hat
* [Justin Santa Barbara](https://github.com/justinsb), Google
|* [Slack](https://kubernetes.slack.com/messages/wg-etcd-operator)
* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-etcd-operator)|* Regular WG Meeting: [Tuesdays at 11:00 PT (Pacific Time) (bi-weekly)](https://zoom.us/my/cncfetcdproject)
|[LTS](wg-lts/README.md)|[lts](https://github.com/kubernetes/kubernetes/labels/wg%2Flts)|* Architecture
* Cluster Lifecycle
* K8s Infra
* Release
* Security
* Testing
|* [Jeremy Rickard](https://github.com/jeremyrickard), Microsoft
* [Jordan Liggitt](https://github.com/liggitt), Google
* [Micah Hausler](https://github.com/micahhausler), Amazon
|* [Slack](https://kubernetes.slack.com/messages/wg-lts)
* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-lts)|* Regular WG Meeting: [Tuesdays at 07:00 PT (Pacific Time) (biweekly)](https://zoom.us/j/92480197536?pwd=dmtSMGJRQmNYYTIyZkFlQ25JRngrdz09)
+|[Node Lifecycle](wg-node-lifecycle/README.md)|[node-lifecycle](https://github.com/kubernetes/kubernetes/labels/wg%2Fnode-lifecycle)|* Apps
* Architecture
* Autoscaling
* CLI
* Cloud Provider
* Cluster Lifecycle
* Network
* Node
* Scheduling
* Storage
|* [Filip Křepinský](https://github.com/atiratree), Red Hat
* [Fabrizio Pandini](https://github.com/fabriziopandini), VMware
* [Humble Chirammal](https://github.com/humblec), VMware
* [Ryan Hallisey](https://github.com/rthallisey), NVIDIA
|* [Slack](https://kubernetes.slack.com/messages/wg-node-lifecycle)
* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-node-lifecycle)|* WG Node Lifecycle Weekly Meeting: [TBDs at TBD TBD (weekly)]()
|[Policy](wg-policy/README.md)|[policy](https://github.com/kubernetes/kubernetes/labels/wg%2Fpolicy)|* Architecture
* Auth
* Multicluster
* Network
* Node
* Scheduling
* Storage
|* [Jim Bugwadia](https://github.com/JimBugwadia), Kyverno/Nirmata
* [Poonam Lamba](https://github.com/poonam-lamba), Google
* [Andy Suderman](https://github.com/sudermanjr), Fairwinds
|* [Slack](https://kubernetes.slack.com/messages/wg-policy)
* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-policy)|* Regular WG Meeting: [Wednesdays at 8:00 PT (Pacific Time) (semimonthly)](https://zoom.us/j/7375677271)
|[Serving](wg-serving/README.md)|[serving](https://github.com/kubernetes/kubernetes/labels/wg%2Fserving)|* Apps
* Architecture
* Autoscaling
* Instrumentation
* Network
* Node
* Scheduling
* Storage
|* [Eduardo Arango](https://github.com/ArangoGutierrez), NVIDIA
* [Jiaxin Shan](https://github.com/Jeffwan), Bytedance
* [Sergey Kanzhelev](https://github.com/SergeyKanzhelev), Google
* [Yuan Tang](https://github.com/terrytangyuan), Red Hat
|* [Slack](https://kubernetes.slack.com/messages/wg-serving)
* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-serving)|* WG Serving Weekly Meeting ([calendar](https://calendar.google.com/calendar/embed?src=e896b769743f3877edfab2d4c6a14132b2aa53287021e9bbf113cab676da54ba%40group.calendar.google.com)): [Wednesdays at 9:00 PT (Pacific Time) (weekly)](https://zoom.us/j/92615874244?pwd=VGhxZlJjRTNRWTZIS0dQV2MrZUJ5dz09)
|[Structured Logging](wg-structured-logging/README.md)|[structured-logging](https://github.com/kubernetes/kubernetes/labels/wg%2Fstructured-logging)|* API Machinery
* Architecture
* Cloud Provider
* Instrumentation
* Network
* Node
* Scheduling
* Storage
|* [Mengjiao Liu](https://github.com/mengjiao-liu), Independent
* [Patrick Ohly](https://github.com/pohly), Intel
|* [Slack](https://kubernetes.slack.com/messages/wg-structured-logging)
* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-structured-logging)|
diff --git a/sig-network/README.md b/sig-network/README.md
index 494bc7a0866..09c4ea1c830 100644
--- a/sig-network/README.md
+++ b/sig-network/README.md
@@ -70,6 +70,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
The following [working groups][working-group-definition] are sponsored by sig-network:
* [WG Device Management](/wg-device-management)
+* [WG Node Lifecycle](/wg-node-lifecycle)
* [WG Policy](/wg-policy)
* [WG Serving](/wg-serving)
* [WG Structured Logging](/wg-structured-logging)
diff --git a/sig-node/README.md b/sig-node/README.md
index fdc411e48e9..3c5e7539833 100644
--- a/sig-node/README.md
+++ b/sig-node/README.md
@@ -55,6 +55,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
The following [working groups][working-group-definition] are sponsored by sig-node:
* [WG Batch](/wg-batch)
* [WG Device Management](/wg-device-management)
+* [WG Node Lifecycle](/wg-node-lifecycle)
* [WG Policy](/wg-policy)
* [WG Serving](/wg-serving)
* [WG Structured Logging](/wg-structured-logging)
diff --git a/sig-scheduling/README.md b/sig-scheduling/README.md
index b760a57182f..d667b17df1f 100644
--- a/sig-scheduling/README.md
+++ b/sig-scheduling/README.md
@@ -67,6 +67,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
The following [working groups][working-group-definition] are sponsored by sig-scheduling:
* [WG Batch](/wg-batch)
* [WG Device Management](/wg-device-management)
+* [WG Node Lifecycle](/wg-node-lifecycle)
* [WG Policy](/wg-policy)
* [WG Serving](/wg-serving)
* [WG Structured Logging](/wg-structured-logging)
diff --git a/sig-storage/README.md b/sig-storage/README.md
index 9847e62f299..ba854e7b8e2 100644
--- a/sig-storage/README.md
+++ b/sig-storage/README.md
@@ -59,6 +59,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
The following [working groups][working-group-definition] are sponsored by sig-storage:
* [WG Data Protection](/wg-data-protection)
+* [WG Node Lifecycle](/wg-node-lifecycle)
* [WG Policy](/wg-policy)
* [WG Serving](/wg-serving)
* [WG Structured Logging](/wg-structured-logging)
diff --git a/sigs.yaml b/sigs.yaml
index 8417ef70d90..7de0bb5c5a9 100644
--- a/sigs.yaml
+++ b/sigs.yaml
@@ -3697,6 +3697,58 @@ workinggroups:
liaison:
github: saschagrunert
name: Sascha Grunert
+- dir: wg-node-lifecycle
+ name: Node Lifecycle
+ mission_statement: >
+ Explore and improve node and pod lifecycle in Kubernetes. This should result in
+ better node drain/maintenance support and better pod disruption/termination. It
+ should also improve node and pod autoscaling, application migration and
+ availability, load balancing, de/scheduling, node shutdown, and cloud provider
+ integrations, and support other new scenarios and integrations.
+
+ charter_link: charter.md
+ stakeholder_sigs:
+ - Apps
+ - Architecture
+ - Autoscaling
+ - CLI
+ - Cloud Provider
+ - Cluster Lifecycle
+ - Network
+ - Node
+ - Scheduling
+ - Storage
+ label: node-lifecycle
+ leadership:
+ chairs:
+ - github: atiratree
+ name: Filip Křepinský
+ company: Red Hat
+ email: atiratree@gmail.com
+ - github: fabriziopandini
+ name: Fabrizio Pandini
+ company: VMware
+ email: fabrizio.pandini@gmail.com
+ - github: humblec
+ name: Humble Chirammal
+ company: VMware
+ email: humble.devassy@gmail.com
+ - github: rthallisey
+ name: Ryan Hallisey
+ company: NVIDIA
+ email: rhallisey@nvidia.com
+ meetings:
+ - description: WG Node Lifecycle Weekly Meeting
+ day: TBD
+ time: TBD
+ tz: TBD
+ frequency: weekly
+ contact:
+ slack: wg-node-lifecycle
+ mailing_list: https://groups.google.com/a/kubernetes.io/g/wg-node-lifecycle
+ liaison:
+ github: TBD
+ name: TBD
- dir: wg-policy
name: Policy
mission_statement: >
diff --git a/wg-node-lifecycle/OWNERS b/wg-node-lifecycle/OWNERS
new file mode 100644
index 00000000000..1a6563e77fe
--- /dev/null
+++ b/wg-node-lifecycle/OWNERS
@@ -0,0 +1,8 @@
+# See the OWNERS docs at https://go.k8s.io/owners
+
+reviewers:
+ - wg-node-lifecycle-leads
+approvers:
+ - wg-node-lifecycle-leads
+labels:
+ - wg/node-lifecycle
diff --git a/wg-node-lifecycle/README.md b/wg-node-lifecycle/README.md
new file mode 100644
index 00000000000..919d42269d0
--- /dev/null
+++ b/wg-node-lifecycle/README.md
@@ -0,0 +1,45 @@
+# Node Lifecycle Working Group
+
+Explore and improve node and pod lifecycle in Kubernetes. This should result in better node drain/maintenance support and better pod disruption/termination. It should also improve node and pod autoscaling, application migration and availability, load balancing, de/scheduling, node shutdown, and cloud provider integrations, and support other new scenarios and integrations.
+
+The [charter](charter.md) defines the scope and governance of the Node Lifecycle Working Group.
+
+## Stakeholder SIGs
+* [SIG Apps](/sig-apps)
+* [SIG Architecture](/sig-architecture)
+* [SIG Autoscaling](/sig-autoscaling)
+* [SIG CLI](/sig-cli)
+* [SIG Cloud Provider](/sig-cloud-provider)
+* [SIG Cluster Lifecycle](/sig-cluster-lifecycle)
+* [SIG Network](/sig-network)
+* [SIG Node](/sig-node)
+* [SIG Scheduling](/sig-scheduling)
+* [SIG Storage](/sig-storage)
+
+## Meetings
+*Joining the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-node-lifecycle) for the group will typically add invites for the following meetings to your calendar.*
+* WG Node Lifecycle Weekly Meeting: [TBDs at TBD TBD]() (weekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=TBD&tz=TBD).
+
+## Organizers
+
+* Filip Křepinský (**[@atiratree](https://github.com/atiratree)**), Red Hat
+* Fabrizio Pandini (**[@fabriziopandini](https://github.com/fabriziopandini)**), VMware
+* Humble Chirammal (**[@humblec](https://github.com/humblec)**), VMware
+* Ryan Hallisey (**[@rthallisey](https://github.com/rthallisey)**), NVIDIA
+
+## Contact
+- Slack: [#wg-node-lifecycle](https://kubernetes.slack.com/messages/wg-node-lifecycle)
+- [Mailing list](https://groups.google.com/a/kubernetes.io/g/wg-node-lifecycle)
+- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/wg%2Fnode-lifecycle)
+- Steering Committee Liaison: TBD (**[@TBD](https://github.com/TBD)**)
+
diff --git a/wg-node-lifecycle/charter.md b/wg-node-lifecycle/charter.md
new file mode 100644
index 00000000000..90671be8361
--- /dev/null
+++ b/wg-node-lifecycle/charter.md
@@ -0,0 +1,163 @@
+# WG Node Lifecycle Charter
+
+This charter adheres to the conventions described in the [Kubernetes Charter README] and uses
+the Roles and Organization Management outlined in [wg-governance].
+
+[Kubernetes Charter README]: /committee-steering/governance/README.md
+
+## Scope
+
+The Kubernetes ecosystem currently faces challenges in node maintenance scenarios, with multiple
+projects independently addressing similar issues. The goal of this working group is to develop
+unified APIs that the entire ecosystem can depend on, reducing the maintenance burden across
+projects and addressing scenarios that impede node drain or cause improper pod termination. Our
+objective is to create easily configurable, out-of-the-box solutions that seamlessly integrate with
+existing APIs and behaviors. We will strive to make these solutions minimalistic and extensible to
+support advanced use cases across the ecosystem.
+
+To properly solve node drain, we must first understand the node lifecycle. This includes
+provisioning/sunsetting of nodes, PodDisruptionBudgets, API-initiated eviction, and node
+shutdown. These in turn affect node and pod autoscaling, de/scheduling, load balancing, and
+the applications running in the cluster. All of these areas have issues and would benefit from a
+unified approach.
+
+### In scope
+
+- Explore a unified way of draining nodes and managing node maintenance by introducing new APIs
+  and extending the current ones. This includes exploring extensions to, or interactions with, the
+  Node object.
+- Analyze the node lifecycle, the Node API, and possible interactions. We want to explore augmenting
+ the Node API to expose additional state or status in order to coalesce other core Kubernetes and
+ community APIs around node lifecycle management.
+- Improve the disruption model that is currently implemented by the API-initiated Eviction API and
+  PDBs. Improve the descheduling, availability, and migration capabilities of today's application
+  workloads. Also explore the interactions with other eviction mechanisms.
+- Coordinate pod termination and issues around de/scheduling, preemption and eviction.
+- Improve Graceful/Non-Graceful Node Shutdown and consider how it affects the node lifecycle. The
+  goal is to graduate the [Graceful Node Shutdown](https://github.com/kubernetes/enhancements/issues/2000)
+  feature to GA and resolve the associated node shutdown issues.
+- Improve scheduling and pod/node autoscaling to take into account ongoing node maintenance and
+  the new disruption model/evictions. This includes balancing pods according to scheduling
+  constraints.
+- Consider improving the pod lifecycle of DaemonSets and static pods during node maintenance.
+- Explore cloud provider use cases and how providers can hook into the node lifecycle, so that
+  users can use the same APIs or configurations across the board.
+- Migrate users of the eviction-based kubectl-like drain (kubectl, cluster autoscaler, karpenter,
+  ...) and other scenarios to the new unified node draining approach.
+- Explore the possible reasons why a node was terminated/drained/killed and how to track and
+  react to each of them. Consider past discussions and the historical perspective
+  (e.g. "tombstones").
+
+### Out of scope
+
+- Implementing cloud-provider-specific logic; the goal is to have a high-level API that providers
+  can use, hook into, or extend.
+- Infrastructure provisioning/deprovisioning solutions or physical infrastructure lifecycle
+  management solutions.
+
+## Stakeholders
+
+- SIG Apps
+- SIG Architecture
+- SIG Autoscaling
+- SIG CLI
+- SIG Cloud Provider
+- SIG Cluster Lifecycle
+- SIG Network
+- SIG Node
+- SIG Scheduling
+- SIG Storage
+
+Stakeholders range from multiple SIGs to a broad set of end users,
+public and private cloud providers, and Kubernetes distribution providers.
+Here are some user stories:
+
+- As a cluster admin, I want a simple interface to initiate a node drain/maintenance without
+  any manual intervention required. I also want to be able to observe the node drain via the API
+  and check on its progress, and to discover workloads that are blocking the node
+  drain.
+- To support the new features, node maintenance, the scheduler, the descheduler, pod autoscaling,
+  the kubelet, and other actors want to use a new eviction API to gracefully remove pods. This
+  would enable new migration strategies that prefer to surge (upscale) pods first rather than
+  downscale them. It would also allow other users/components to monitor pods that are gracefully
+  removed/terminated and provide better behavior in terms of de/scheduling, scaling, and availability.
+- As a cluster admin, I want to be able to perform arbitrary actions after the node drain is
+ complete, such as resetting GPU drivers, resetting NICs, performing software updates or shutting
+ down the machine.
+- As an end user, I would like more alternatives to blue-green upgrades, especially with special
+ hardware accelerators; it's far too expensive. I would like to choose a strategy on how to
+ coordinate the node drain and the upgrade to achieve better cost-effectiveness.
+- As a cloud provider, I need to perform regular maintenance on the hardware in my fleet. Enhancing
+ Kubernetes to help CSPs safely remove hardware will reduce operational costs.
+- The cost of accelerator maintenance in today's world can be massive. And since hardware
+  accelerators tend to need more love and care, having software support to coordinate
+  maintenance will reduce operational costs.
+- As a cluster admin, I would like to use a mixture of on-demand and temporary spot instances in my
+ clusters to reduce cloud expenditure. Having more reliable lifecycle and drain mechanisms for
+ nodes will improve cluster stability in scenarios where instances may be terminated by the cloud
+ provider due to cost-related thresholds.
+- As a user, I want to prevent any disruption to my pet or expensive workloads (VMs, ML with
+ accelerators) and either prevent termination altogether or have a reliable migration path.
+ Features like `terminationGracePeriodSeconds` are not sufficient as the termination/migration can
+ take hours if not days.
+- As a user, I want my application to finish all network and storage operations before terminating a
+ pod. This includes closing pod connections, removing pods from endpoints, writing cached writes
+ to the underlying storage and completing storage cleanup routines.
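+
+The last story above maps to the pod-lifecycle knobs available today, which the WG considers
+insufficient on their own for long migrations; a sketch of how an application currently signals
+its termination needs (image and command are hypothetical):
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: graceful-app
+spec:
+  # Upper bound on graceful termination; long migrations can exceed any fixed value.
+  terminationGracePeriodSeconds: 120
+  containers:
+  - name: app
+    image: registry.example.com/app:latest
+    lifecycle:
+      preStop:
+        # Runs before SIGTERM is sent, e.g. to flush writes and drain connections.
+        exec:
+          command: ["/bin/sh", "-c", "/app/flush-and-drain"]
+```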
+
+## Deliverables
+
+The WG will coordinate requirement gathering and design, eventually leading to
+KEP(s) and code associated with the ideas.
+
+Areas we expect to explore:
+
+- An API to express node drain/maintenance.
+ Currently tracked in https://github.com/kubernetes/enhancements/issues/4212.
+- An API to solve the problems with the API-initiated Eviction API and PDBs.
+ Currently tracked in https://github.com/kubernetes/enhancements/issues/4563.
+- An API/mechanism to gracefully terminate pods during a node shutdown.
+ Graceful node shutdown feature tracked in https://github.com/kubernetes/enhancements/issues/2000.
+- An API to deschedule pods that use DRA devices.
+ DRA: device taints and tolerations feature tracked in https://github.com/kubernetes/enhancements/issues/5055.
+- An API to remove pods from endpoints before they terminate.
+ Currently tracked in https://docs.google.com/document/d/1t25jgO_-LRHhjRXf4KJ5xY_t8BZYdapv7MDAxVGY6R8/edit?tab=t.0#heading=h.i4lwa7rdng7y.
+- Introduce enhancements across multiple Kubernetes SIGs to add support for the new APIs to solve a
+  wide range of issues.
+
+We expect to provide reference implementations of the new APIs, including but not limited to
+controllers, API validation, integration with existing core components, and extension points for
+the ecosystem. This should be accompanied by E2E/conformance tests.
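+
+To make the deliverables concrete, a node maintenance object might look roughly like the sketch
+below. This is purely illustrative: the API group, kind, and fields are invented here, and the
+real shape is being designed in the KEPs referenced above.
+
+```yaml
+apiVersion: example.k8s.io/v1alpha1  # hypothetical group/version
+kind: NodeMaintenance
+metadata:
+  name: worker-1-kernel-upgrade
+spec:
+  nodeSelector:
+    kubernetes.io/hostname: worker-1
+  reason: kernel upgrade
+status:
+  phase: Draining          # observable drain progress
+  blockingPods: 3          # workloads still preventing completion
+```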
+
+## Relevant Projects
+
+This is a list of known projects that solve similar problems in the ecosystem or would benefit from
+the efforts of this WG:
+
+- https://github.com/aws/aws-node-termination-handler
+- https://github.com/foriequal0/pod-graceful-drain
+- https://github.com/jukie/karpenter-deprovision-controller
+- https://github.com/kubereboot/kured
+- https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
+- https://github.com/kubernetes-sigs/karpenter
+- https://github.com/kubevirt/kubevirt
+- https://github.com/medik8s/node-maintenance-operator
+- https://github.com/Mellanox/maintenance-operator
+- https://github.com/openshift/machine-config-operator
+- https://github.com/planetlabs/draino
+- https://github.com/strimzi/drain-cleaner
+
+There are also internal custom solutions that companies use.
+
+## Roles and Organization Management
+
+This WG adheres to the Roles and Organization Management outlined in [wg-governance]
+and opts-in to updates and modifications to [wg-governance].
+
+[wg-governance]: /committee-steering/governance/wg-governance.md
+
+## Timelines and Disbanding
+
+The working group will disband once the features and core APIs defined in the KEPs have reached a
+stable state (GA) and ongoing maintenance ownership is established within the relevant SIGs. We will
+review whether the working group should disband if appropriate SIG ownership
+can't be reached.