Node Termination handler may still be necessary

The current README states this handler is deprecated in favor of the new Graceful Node Shutdown:
> ⚠️ Deprecation Notice
> As of Kubernetes 1.20, Graceful Node Shutdown replaces the need for GCP Node termination handler. GKE on versions 1.20+ enables Graceful Node Shutdown by default. Refer to the GKE documentation and Kubernetes documentation for more info about Graceful Node Shutdown (docs, blog post).

I have been using the Node Termination handler on GKE versions below 1.20, with pre-emptible GPU nodes. The handler was needed to avoid a race condition on node restart that sometimes left pods unable to recognize the GPU.
 
I have moved to GKE 1.21.1-gke.2200 and see the same error I used to get on versions below 1.20 without the Node Termination handler. The error happens only occasionally, so it looks like it may be the same race condition.

> ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
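
For anyone trying to reproduce or temporarily work around this, here is a minimal sketch of a startup check that waits until `libcuda.so.1` can be loaded before the GPU framework is imported. The function name `wait_for_library`, the timeout values, and the idea of polling at container start are my own assumptions for illustration. This is not something the handler or GKE provides, and it does not fix the underlying race.

```python
import ctypes
import time


def wait_for_library(name="libcuda.so.1", timeout=60.0, interval=2.0, loader=ctypes.CDLL):
    """Return True once `name` can be loaded, False if `timeout` seconds pass first.

    `loader` is injectable only so the retry logic can be exercised without a GPU
    (hypothetical parameter, defaults to ctypes.CDLL).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # ctypes.CDLL raises OSError while the shared object cannot be found.
            loader(name)
            return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_library()` at the top of the entrypoint and exiting non-zero when it returns `False` would at least let the pod restart instead of failing later with the `ImportError` above.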

I filed the following GKE issue.
https://issuetracker.google.com/issues/192809336

For the moment, I would ask that this repo not be deprecated.
