Operator does not cleanly allow node drains to complete #1701
Comments
I found the problem. When HA is set to 3, the operator creates a PDB where |
@braunsonm Thanks for reporting the issue. Do you have any suggestions on how the operator could change or improve to avoid this issue? |
@houshengbo I think the operator should set |
Is |
Yes and no. Max unavailable would be set for serving, I think. But if you're configuring HA, it would make sense for the operator to create a PDB so that HA is actually guaranteed. Otherwise you could still have an outage if the pods are evicted at the same time. I do agree with allowing overrides, though, as you currently do. |
This issue is stale because it has been open for 90 days with no |
/remove-lifecycle stale |
This issue is stale because it has been open for 90 days with no |
Still a problem |
This issue is stale because it has been open for 90 days with no |
/remove-lifecycle stale |
This issue is stale because it has been open for 90 days with no |
The following works for me in a setup with 2 HA replicas:

apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
...
spec:
  podDisruptionBudgets:
  - name: activator-pdb
    minAvailable: 40%
  - name: 3scale-kourier-gateway-pdb
    minAvailable: 40%
  - name: webhook-pdb
    minAvailable: 40%
... |
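To see why a percentage-based minAvailable can allow or block a drain, here is a minimal Python sketch of the arithmetic. It assumes every replica is healthy and relies on the documented Kubernetes behavior of rounding a percentage minAvailable up to the nearest whole pod. The function names are hypothetical and used only for illustration; this is not the operator's or the disruption controller's actual code.

```python
def min_available_pods(replicas: int, min_available_percent: int) -> int:
    """Pods that must stay available; a percentage is rounded up to a whole pod."""
    # Ceiling division done in integer arithmetic to avoid float rounding surprises.
    return -((-replicas * min_available_percent) // 100)


def allowed_disruptions(replicas: int, min_available_percent: int) -> int:
    """Voluntary evictions the budget permits, assuming all replicas are healthy."""
    return max(0, replicas - min_available_pods(replicas, min_available_percent))


if __name__ == "__main__":
    # 40% is the value from the workaround above; 2 and 3 are the HA sizes
    # mentioned in this thread.
    for replicas in (2, 3):
        print(
            f"replicas={replicas} "
            f"must_stay_up={min_available_pods(replicas, 40)} "
            f"allowed_disruptions={allowed_disruptions(replicas, 40)}"
        )
```

With 40%, both 2 and 3 replicas leave one pod free to be evicted at a time, so a drain can make progress. As a purely hypothetical comparison (a value not taken from this thread), 80% with 3 replicas rounds up to 3 pods that must stay available, which leaves zero allowed disruptions and would block eviction indefinitely.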
Describe the bug
On AWS EKS, nodes are set to SchedulingDisabled and pods are evicted in batches (not cordoned). With Knative Serving deployed using the operator, some workloads will never drain when HA is set to 3.
Expected behavior
The Knative Operator should allow these components to drain without user interaction.
To Reproduce
Knative release version
1.13.0
Additional context
I have enough nodes that the PDB shouldn't be violated.