Node Termination handler may still be necessary #43
Description
The current README states this handler is deprecated in favor of the new Graceful Node Shutdown:
⚠️ Deprecation Notice
As of Kubernetes 1.20, Graceful Node Shutdown replaces the need for GCP Node termination handler. GKE on versions 1.20+ enables Graceful Node Shutdown by default. Refer to the GKE documentation and Kubernetes documentation for more info about Graceful Node Shutdown (docs, blog post).
I have been using the Node Termination handler with GKE versions below 1.20, on preemptible nodes with GPUs. The handler was needed to avoid a race condition on node restart that sometimes caused pods not to recognize the GPU correctly.
I have since moved to GKE 1.21.1-gke.2200 and hit the same error I used to get on versions below 1.20 without the Node Termination handler. The error occurs only occasionally, so it looks like it may be the same race condition:
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
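For anyone trying to confirm they are hitting the same condition, here is a minimal diagnostic sketch, not a fix. It assumes the workload is Python and that the symptom is exactly the error above, i.e. the dynamic loader cannot find `libcuda.so.1` inside the container. The function name `libcuda_available` and its `load` parameter are hypothetical and only for illustration.

```python
import ctypes


def libcuda_available(load=ctypes.CDLL, name="libcuda.so.1"):
    """Return True if the CUDA driver library can be loaded, else False.

    `load` defaults to ctypes.CDLL, which raises OSError when the shared
    object cannot be opened. It is a parameter only so the check can be
    exercised without a GPU (hypothetical helper, not part of any library).
    """
    try:
        load(name)
        return True
    except OSError:
        return False
```

Calling this at container startup and exiting with a non-zero status when it returns `False` would make the failure explicit before the framework import fails. Whether restarting the pod is enough to recover on an affected node is an assumption I have not verified.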
I have filed the following GKE issue for this:
https://issuetracker.google.com/issues/192809336
For the moment, I would ask that this repo not be deprecated.