mutator: default evictionStrategy to None on ARM64 clusters #3380

@dasionov dasionov commented Apr 3, 2025

What this PR does / why we need it:
Live migration is not supported on ARM64 clusters because some CPU types might be missing on the nodes.

This PR ensures that newly created HyperConverged resources default to EvictionStrategyNone when all worker nodes in the cluster are ARM64.
This prevents unexpected live migrations on architectures where they may not be supported.
The mutation applies only if the eviction strategy is unset, so users can still override it later.
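
The defaulting rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Node`, `allARM64`, and `defaultEvictionStrategy` are hypothetical names, the `Node` type is a stand-in for `corev1.Node`, and treating an empty node list as "not all ARM64" is an assumption. The sketch reads the well-known `kubernetes.io/arch` node label; the actual change may detect the architecture differently.

```go
package main

// EvictionStrategy mirrors the string values used by KubeVirt's eviction strategy field.
type EvictionStrategy string

const (
	EvictionStrategyNone        EvictionStrategy = "None"
	EvictionStrategyLiveMigrate EvictionStrategy = "LiveMigrate"
)

// archLabel is the well-known Kubernetes node label that carries the CPU architecture.
const archLabel = "kubernetes.io/arch"

// Node is a hypothetical stand-in for corev1.Node that keeps only the labels.
type Node struct {
	Name   string
	Labels map[string]string
}

// allARM64 reports whether the list is non-empty and every node is labeled arm64.
// Treating an empty list as "not all ARM64" is an assumption of this sketch.
func allARM64(nodes []Node) bool {
	if len(nodes) == 0 {
		return false
	}
	for _, n := range nodes {
		if n.Labels[archLabel] != "arm64" {
			return false
		}
	}
	return true
}

// defaultEvictionStrategy returns the value to store in spec.evictionStrategy.
// A non-nil current value is returned untouched, so a user's choice always wins.
func defaultEvictionStrategy(current *EvictionStrategy, highlyAvailable bool, workers []Node) EvictionStrategy {
	if current != nil {
		return *current
	}
	// Without a highly available infrastructure, or on an all-ARM64 cluster,
	// live migration is not a usable default.
	if !highlyAvailable || allARM64(workers) {
		return EvictionStrategyNone
	}
	return EvictionStrategyLiveMigrate
}
```

Calling `defaultEvictionStrategy(nil, true, workers)` with only ARM64 workers yields `None`, while a mixed or non-ARM64 list yields `LiveMigrate`; passing a non-nil strategy returns that strategy unchanged.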

Reviewer Checklist

  • PR Message
  • Commit Messages
  • How to test
  • Unit Tests
  • Functional Tests
  • User Documentation
  • Developer Documentation
  • Upgrade Scenario
  • Uninstallation Scenario
  • Backward Compatibility
  • Troubleshooting Friendly

Jira Ticket:

https://issues.redhat.com/browse/CNV-58613

Release note:

none

@kubevirt-bot kubevirt-bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Apr 3, 2025
if *hc.Status.InfrastructureHighlyAvailable {
value = kubevirtcorev1.EvictionStrategyLiveMigrate
workerNodes := &corev1.NodeList{}
err := cli.List(ctx, workerNodes, client.MatchingLabels{"node-role.kubernetes.io/worker": ""})

Not sure this label is enough. In case the workloads node placement is set, I think we will need to use it instead.

@orenc1 - WDYT?
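
One way to read this suggestion, sketched under assumptions: pick the candidate nodes by the workloads node selector when one is configured, and fall back to the worker role label otherwise. `Node`, `matches`, and `workloadNodes` are hypothetical names, `Node` is a stand-in for `corev1.Node`, and the sketch only handles a plain label selector; affinity and tolerations in a node placement are ignored here.

```go
package main

// workerRoleLabel is the label key used in the snippet under review.
const workerRoleLabel = "node-role.kubernetes.io/worker"

// Node is a hypothetical stand-in for corev1.Node that keeps only the labels.
type Node struct {
	Name   string
	Labels map[string]string
}

// matches reports whether the node carries every key/value pair in selector.
func matches(n Node, selector map[string]string) bool {
	for k, v := range selector {
		if got, ok := n.Labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// workloadNodes returns the nodes expected to run workloads: those matching the
// configured node selector when one is set, otherwise those with the worker role label.
func workloadNodes(all []Node, nodeSelector map[string]string) []Node {
	selector := nodeSelector
	if len(selector) == 0 {
		selector = map[string]string{workerRoleLabel: ""}
	}
	var out []Node
	for _, n := range all {
		if matches(n, selector) {
			out = append(out, n)
		}
	}
	return out
}
```

The architecture check would then run over the result of `workloadNodes` rather than over every node that carries the worker label.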

@nunnatsa nunnatsa left a comment

Added some inline comments. You can ignore the nit one if you don't find it helpful.

As a general comment, I would prefer to reuse the node controller, but that would probably require an API change in the status field. We'll need that change anyway to support heterogeneous clusters in the future, and that is not designed yet, so for now we'll have to do it this way.

@dasionov dasionov force-pushed the default_eviction_strategy_none_for_arm_clusters branch from ccf1cbf to 952006e Compare April 3, 2025 12:40

dasionov commented Apr 3, 2025

I also adjusted the unit tests.

@dasionov dasionov force-pushed the default_eviction_strategy_none_for_arm_clusters branch from 952006e to 5488e1c Compare April 3, 2025 12:45
@kubevirt-bot kubevirt-bot added size/L and removed size/M labels Apr 3, 2025

hco-bot commented Apr 3, 2025

hco-e2e-operator-sdk-sno-azure lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-sno-aws
hco-e2e-consecutive-operator-sdk-upgrades-aws lane succeeded.
/override ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure
hco-e2e-upgrade-prev-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure
hco-e2e-upgrade-prev-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure
hco-e2e-operator-sdk-gcp, hco-e2e-operator-sdk-aws lanes succeeded.
/override ci/prow/hco-e2e-operator-sdk-azure
hco-e2e-upgrade-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure

@kubevirt-bot
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure, ci/prow/hco-e2e-operator-sdk-azure, ci/prow/hco-e2e-operator-sdk-sno-aws, ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

hco-bot commented Apr 3, 2025

hco-e2e-upgrade-operator-sdk-azure lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-aws

@kubevirt-bot
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-operator-sdk-aws

hco-bot commented Apr 3, 2025

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure

hco-bot commented Apr 3, 2025

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure

var value = kubevirtcorev1.EvictionStrategyNone
if *hc.Status.InfrastructureHighlyAvailable {
value = kubevirtcorev1.EvictionStrategyLiveMigrate
workerNodes := &corev1.NodeList{}

This makes a webhook depend on external resources, which is something we try to avoid as much as possible...
Also this will happen just once and ignore new nodes added later.
Not sure what this PR fixes TBH, and the linked issue doesn't really help.

> This makes a webhook depend on external resources, which is something we try to avoid as much as possible...

You are correct. I don't like it either. Also, we do have a node controller to watch the nodes, but that controller runs in the operator, in a different pod. We will fix this behavior when implementing multi-arch cluster support. For now, I can't see a better option.

> Also this will happen just once and ignore new nodes added later.

This code only sets the default value of the spec.evictionStrategy field, so it must run only once, and that's OK, assuming the field is set on creation of the HyperConverged CR, which is the case for ARM clusters because they weren't supported until now.

@nunnatsa nunnatsa left a comment

Thanks for the PR @dasionov!

We can simplify the unit tests; see the inline comments.

- Ensure that newly created HyperConverged resources default to
EvictionStrategyNone when all worker nodes in the cluster are
ARM64. This prevents unexpected live migrations on architectures
where it may not be supported. The mutation only applies if the
eviction strategy is unset, allowing users to override it later.

- Update unit tests to verify evictionStrategy defaults based on
node architecture. Test cases now cover all-ARM64 clusters (None),
mixed or non-ARM64 clusters with high availability (LiveMigrate),
and user overrides, ensuring accurate mutation behavior.

Signed-off-by: Daniel Sionov <[email protected]>
@dasionov dasionov force-pushed the default_eviction_strategy_none_for_arm_clusters branch from 5488e1c to 2549a1a Compare April 7, 2025 07:29

sonarqubecloud bot commented Apr 7, 2025

hco-bot commented Apr 7, 2025

hco-e2e-upgrade-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-azure
hco-e2e-upgrade-prev-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure
hco-e2e-operator-sdk-gcp, hco-e2e-operator-sdk-aws lanes succeeded.
/override ci/prow/hco-e2e-operator-sdk-azure
hco-e2e-upgrade-prev-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure
hco-e2e-upgrade-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure

@kubevirt-bot
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-azure, ci/prow/hco-e2e-upgrade-operator-sdk-azure, ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure

hco-bot commented Apr 7, 2025

hco-e2e-consecutive-operator-sdk-upgrades-aws lane succeeded.
/override ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure

@kubevirt-bot
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure

nunnatsa commented Apr 7, 2025

/approve
/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 7, 2025
@kubevirt-bot
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nunnatsa

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 7, 2025

hco-bot commented Apr 7, 2025

hco-e2e-operator-sdk-sno-azure lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-sno-aws

@kubevirt-bot
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-sno-aws

openshift-ci bot commented Apr 7, 2025

@dasionov: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/hco-e2e-kv-smoke-azure | 5488e1c | link | true | /test hco-e2e-kv-smoke-azure |
| ci/prow/hco-e2e-operator-sdk-sno-aws | 2549a1a | link | false | /test hco-e2e-operator-sdk-sno-aws |
| ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure | 2549a1a | link | true | /test hco-e2e-consecutive-operator-sdk-upgrades-azure |

hco-bot commented Apr 7, 2025

hco-e2e-consecutive-operator-sdk-upgrades-aws lane succeeded.
/override ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure

@kubevirt-bot
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure

hco-bot commented Apr 7, 2025

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot kubevirt-bot merged commit a878f63 into kubevirt:main Apr 7, 2025
32 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L