Support tolerateFailuresUntilDeadline for Helm deployments #9809

esabdull · 2025-05-05T12:51:32Z

Could you consider adding support for the tolerateFailuresUntilDeadline field for Helm deployments in Skaffold?

Context
In GKE Autopilot clusters, Helm deployments sometimes fail in Skaffold due to delays caused by node autoscaling. For example, if a node is deleted during a deployment, the associated pod needs to be recreated on a new node. This process can take some time.

Kubernetes eventually recreates the pod and the deployment completes successfully, but by then Skaffold may already have reported the deployment as failed.

Why this is needed
Currently, there’s no mechanism for Helm deployments in Skaffold to tolerate temporary scheduling issues. Supporting tolerateFailuresUntilDeadline for Helm, similar to what was introduced for Cloud Run in v2.16.0, would allow Skaffold to keep waiting until the status check deadline before marking the deployment as failed.
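
For illustration, the requested configuration might look like the sketch below. It assumes the existing top-level deploy fields (statusCheckDeadlineSeconds and tolerateFailuresUntilDeadline) would simply be honored for Helm releases. The apiVersion, release name, chart path, and deadline value are placeholders, not taken from a real project.

```yaml
# Hypothetical skaffold.yaml sketch of the requested behavior.
apiVersion: skaffold/v4beta11          # assumption: use the schema version your Skaffold release supports
kind: Config
deploy:
  statusCheckDeadlineSeconds: 600      # placeholder: total time to wait for the rollout
  tolerateFailuresUntilDeadline: true  # requested: do not fail on transient errors before the deadline
  helm:
    releases:
      - name: my-app                   # placeholder release name
        chartPath: charts/my-app       # placeholder chart location
```

With something like this, a pod that is rescheduled onto a new Autopilot node would only cause a failure if it is still unhealthy when the deadline expires.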

This would make deployments in Autopilot environments more resilient and improve reliability in GitHub Actions and other CI/CD setups.
