mutator: default evictionStrategy to None on ARM64 clusters #3380
if *hc.Status.InfrastructureHighlyAvailable {
	value = kubevirtcorev1.EvictionStrategyLiveMigrate
	workerNodes := &corev1.NodeList{}
	err := cli.List(ctx, workerNodes, client.MatchingLabels{"node-role.kubernetes.io/worker": ""})
Not sure this label is enough. In case the workloads node placement is set, I think we will need to use it instead.
@orenc1 - WDYT?
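One way to read the suggestion above is sketched below: prefer the workloads node selector when one is configured, and fall back to the worker role label otherwise. This is only an illustration of the reviewer's idea, not what the PR implements. The helper names `workerSelector` and `matchesSelector` are hypothetical, the selector is modeled as a plain label map, and affinity rules are ignored.

```go
package main

import "fmt"

// workerRoleLabel is the label used in the diff above to list worker nodes.
const workerRoleLabel = "node-role.kubernetes.io/worker"

// workerSelector is a hypothetical helper: it returns the workloads node
// selector when one is set, and the worker role label otherwise.
func workerSelector(workloadsNodeSelector map[string]string) map[string]string {
	if len(workloadsNodeSelector) > 0 {
		return workloadsNodeSelector
	}
	return map[string]string{workerRoleLabel: ""}
}

// matchesSelector reports whether nodeLabels contains every key/value pair
// of selector (the usual equality-based label matching).
func matchesSelector(nodeLabels, selector map[string]string) bool {
	for key, want := range selector {
		if got, ok := nodeLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	nodeLabels := map[string]string{workerRoleLabel: "", "pool": "virt"}

	// No node placement configured: fall back to the worker role label.
	fmt.Println(matchesSelector(nodeLabels, workerSelector(nil)))

	// Node placement configured: only nodes in that pool are considered.
	placement := map[string]string{"pool": "virt"}
	fmt.Println(matchesSelector(nodeLabels, workerSelector(placement)))
}
```

In a controller-runtime client, the resulting map could be passed as `client.MatchingLabels` in place of the hard-coded worker label.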
Added some inline comments. You can ignore the nit one if you don't find it helpful.
As a general comment, I would prefer to reuse the node controller, but that would probably require an API change in the status field. We'll need that change anyway to support heterogeneous clusters in the future, and that is not designed yet, so for now we'll have to do it this way.
ccf1cbf
to
952006e
Compare
I also adjusted the unit tests.
952006e
to
5488e1c
Compare
hco-e2e-operator-sdk-sno-azure lane succeeded. |
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure, ci/prow/hco-e2e-operator-sdk-azure, ci/prow/hco-e2e-operator-sdk-sno-aws, ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
hco-e2e-upgrade-operator-sdk-azure lane succeeded. |
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-operator-sdk-aws In response to this:
hco-e2e-kv-smoke-gcp lane succeeded. |
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure In response to this:
var value = kubevirtcorev1.EvictionStrategyNone
if *hc.Status.InfrastructureHighlyAvailable {
	value = kubevirtcorev1.EvictionStrategyLiveMigrate
	workerNodes := &corev1.NodeList{}
This makes a webhook depend on external resources, which is something we try to avoid as much as possible...
Also this will happen just once and ignore new nodes added later.
Not sure what this PR fixes TBH, and the linked issue doesn't really help.
> This makes a webhook depend on external resources, which is something we try to avoid as much as possible...
You are correct. I don't like it either. Also, we do have a node controller to watch the nodes, but that controller runs in the operator, in a different pod. We will fix this behavior when implementing multi-arch cluster support. For now, I can't see a better option.
> Also this will happen just once and ignore new nodes added later.
This code only sets the default value of the spec.evictionStrategy field, so it must only run once, and that's OK, assuming that this is the case (setting the field on creation of the HyperConverged CR) for ARM clusters, as it wasn't supported until now.
Thanks for the PR @dasionov!
We can simplify the unit tests; see the inline comments.
- Ensure that newly created HyperConverged resources default to EvictionStrategyNone when all worker nodes in the cluster are ARM64. This prevents unexpected live migrations on architectures where it may not be supported. The mutation only applies if the eviction strategy is unset, allowing users to override it later.
- Update unit tests to verify evictionStrategy defaults based on node architecture. Test cases now cover all-ARM64 clusters (None), mixed or non-ARM64 clusters with high availability (LiveMigrate), and user overrides, ensuring accurate mutation behavior.
Signed-off-by: Daniel Sionov <[email protected]>
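The cases listed in the commit message can be sketched as a small decision function with a table of inputs. This is a minimal illustration under stated assumptions, not the PR's code: `EvictionStrategy` and its constants are local stand-ins for the KubeVirt types, and `defaultEvictionStrategy` is a hypothetical helper that takes the high-availability flag and the all-ARM64 result as plain booleans.

```go
package main

import "fmt"

// EvictionStrategy is a local stand-in for the KubeVirt type of the same name.
type EvictionStrategy string

const (
	EvictionStrategyNone        EvictionStrategy = "None"
	EvictionStrategyLiveMigrate EvictionStrategy = "LiveMigrate"
)

// defaultEvictionStrategy is a hypothetical helper. It keeps a value the user
// already set, and otherwise picks LiveMigrate only for highly available
// clusters whose workers are not all ARM64.
func defaultEvictionStrategy(current *EvictionStrategy, highlyAvailable, allWorkersARM64 bool) EvictionStrategy {
	if current != nil {
		return *current // user override: never mutate an explicit value
	}
	if highlyAvailable && !allWorkersARM64 {
		return EvictionStrategyLiveMigrate
	}
	return EvictionStrategyNone
}

func main() {
	userChoice := EvictionStrategyLiveMigrate

	cases := []struct {
		name            string
		current         *EvictionStrategy
		highlyAvailable bool
		allWorkersARM64 bool
	}{
		{"all-ARM64, highly available", nil, true, true},
		{"mixed or non-ARM64, highly available", nil, true, false},
		{"non-ARM64, not highly available", nil, false, false},
		{"user override on all-ARM64", &userChoice, true, true},
	}

	for _, tc := range cases {
		got := defaultEvictionStrategy(tc.current, tc.highlyAvailable, tc.allWorkersARM64)
		fmt.Printf("%-40s -> %s\n", tc.name, got)
	}
}
```

Keeping the decision free of cluster access makes each case testable without a fake client; only the all-ARM64 input needs the node list.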
5488e1c
to
2549a1a
Compare
hco-e2e-upgrade-operator-sdk-aws lane succeeded. |
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-azure, ci/prow/hco-e2e-upgrade-operator-sdk-azure, ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure In response to this:
hco-e2e-consecutive-operator-sdk-upgrades-aws lane succeeded. |
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure In response to this:
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: nunnatsa The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
hco-e2e-operator-sdk-sno-azure lane succeeded. |
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-sno-aws In response to this:
@dasionov: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What this PR does / why we need it:
Live migration is not supported on ARM64 clusters because nodes might be missing some CPU types.
This PR ensures that newly created HyperConverged resources default to EvictionStrategyNone when all worker nodes in the cluster are ARM64. This prevents unexpected live migrations on architectures where they may not be supported.
The mutation only applies if the eviction strategy is unset, allowing users to override it later.
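The "all worker nodes are ARM64" condition described above can be sketched as a pure function over the listed nodes. This is a hedged illustration, not the PR's implementation: `Node` is a simplified stand-in for `corev1.Node`, its `Architecture` field mirrors `node.Status.NodeInfo.Architecture`, and `allWorkersARM64` is a hypothetical name. Treating an empty worker list as "not all ARM64" is an assumption made here; the PR may handle that case differently.

```go
package main

import "fmt"

// Node is a simplified stand-in for corev1.Node. Architecture mirrors
// node.Status.NodeInfo.Architecture (for example "amd64" or "arm64").
type Node struct {
	Name         string
	Architecture string
}

// allWorkersARM64 is a hypothetical helper that reports whether every worker
// node is ARM64. An empty list returns false (an assumption of this sketch),
// so a cluster with no visible workers keeps the non-ARM default.
func allWorkersARM64(workers []Node) bool {
	if len(workers) == 0 {
		return false
	}
	for _, node := range workers {
		if node.Architecture != "arm64" {
			return false
		}
	}
	return true
}

func main() {
	armOnly := []Node{{"worker-0", "arm64"}, {"worker-1", "arm64"}}
	mixed := []Node{{"worker-0", "arm64"}, {"worker-1", "amd64"}}

	fmt.Println("arm-only:", allWorkersARM64(armOnly))
	fmt.Println("mixed:   ", allWorkersARM64(mixed))
	fmt.Println("empty:   ", allWorkersARM64(nil))
}
```

Because a mixed cluster returns false, it keeps the existing LiveMigrate default when the infrastructure is highly available, which matches the behavior described in the PR.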
Reviewer Checklist
Jira Ticket:
Release note: