
Consider implementing pod (anti)affinity checks in NodeFit #1279

Closed
logyball opened this issue Oct 30, 2023 · 7 comments · Fixed by #1356
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

logyball commented Oct 30, 2023

Is your feature request related to a problem? Please describe.
Hello, we have been running into a situation that was somewhat surprising until I dug into the code and learned that NodeFit does not take into account all scheduler predicates, notably pod (anti)affinity. At our organization, we've had multiple occasions where using the descheduler HighNodeUtilization profile + GKE's optimize-utilization autoscaling profile has bitten us, because we assumed that NodeFit and the kube-scheduler would agree.

The basic situation is:

  • Node A contains Pod X
  • Node B contains Pod Y
  • Pod X and Pod Y have a pod anti-affinity rule that prevents them from being scheduled on the same node (see the sketch after this list)
  • Node A is considered "underutilized" by the descheduler
  • Node B is not considered "underutilized"
  • NodeFit algorithm is run and determines that Pod X can schedule on Node B
  • descheduler evicts Pod X
  • kube-scheduler determines Pod X cannot schedule on Node B because of Pod X/Y's anti-affinity
  • kube-scheduler schedules Pod X on Node A
  • The cycle continues with each loop of the descheduler
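
For concreteness, here is a minimal sketch (not from the original report; the label key/value and helper name are hypothetical) of the kind of required anti-affinity rule this scenario assumes on Pod X and Pod Y, expressed with the Kubernetes Go API types: both pods carry the label app=example and refuse to share a node with any other pod carrying that label.

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// exampleAntiAffinity builds the hard (required) anti-affinity rule assumed above:
// no two pods labeled app=example may land on the same node (kubernetes.io/hostname).
func exampleAntiAffinity() *v1.Affinity {
	return &v1.Affinity{
		PodAntiAffinity: &v1.PodAntiAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: []v1.PodAffinityTerm{
				{
					LabelSelector: &metav1.LabelSelector{
						MatchLabels: map[string]string{"app": "example"},
					},
					TopologyKey: "kubernetes.io/hostname",
				},
			},
		},
	}
}

func main() {
	// Pod X and Pod Y would both set Spec.Affinity to this value (and carry the app=example label).
	fmt.Println(exampleAntiAffinity().PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution[0].TopologyKey)
}

Because the rule is a required (hard) constraint, the kube-scheduler will never co-locate the two pods, while NodeFit currently ignores it, which is what produces the evict/reschedule loop above.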

Describe the solution you'd like
I suggest that we look into adding pod (anti)affinity checks to the NodeFit calculation, and I would be happy to look into contributing it. It probably adds a decent amount of complexity, but it would likely make the descheduler's decisions more closely resemble those of the kube-scheduler. What do you think?

Describe alternatives you've considered

  • Do nothing: perhaps call out in the documentation that the NodeFit arg does not take into account pod (anti)affinity

What version of descheduler are you using?

descheduler version: v0.27.x

logyball commented Oct 30, 2023

Also, as a bit of follow-up: after digging into how the scheduler implements these checks, I'm finding them hard to parse. I'd love some guidance here if this is an avenue we'd like to pursue. It looks like in the past (last seen in v1.9.11 of k8s) there was a predicate checker that we could probably have used to good effect; the ecosystem now looks a bit more complex.

@gyuvaraj10

We have also run into this issue; in our case the pod anti-affinity is a preferred (soft) rule. We noticed that after the pod eviction the node still remained, and the pod was scheduled back onto the same node because the cluster autoscaler did not scale the node down immediately.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 11, 2024
@logyball

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 11, 2024

nikimanoledaki commented Feb 12, 2024

Hi! 👋 I am interested in picking up this issue and contributing a PR to add this feature.

In the scenario outlined above by @logyball, NodeFit() runs and determines that Pod X can be scheduled on Node B when it should not be. The happy path would be for NodeFit() to determine that Pod X cannot be scheduled on Node B.

@a7i mentioned here that this could be implemented as a patch.

The solution would be to add a predicate to NodeFit() so that it considers pod anti-affinity. Something like this:

func NodeFit(nodeIndexer podutil.GetPodsAssignedToNodeFunc, pod *v1.Pod, node *v1.Node) []error {

	// ... existing NodeFit checks ...

	// Check if the pod matches a pod anti-affinity rule of another pod on the node
	if ok, err := utils.PodMatchesAntiAffinityRule(pod, node); err != nil { // helper to be added to pkg/utils/predicates.go
		errors = append(errors, err)
	} else if ok { // ok means an anti-affinity rule would be violated
		errors = append(errors, fmt.Errorf("pod violates the pod anti-affinity rule of another pod on the node"))
	}

	return errors
}

Implementation detail: refactor the RemovePodsViolatingInterPodAntiAffinity plugin, which already contains logic for fetching and comparing pods' anti-affinity rules in checkPodsWithAntiAffinityExist. That logic could be moved to pkg/utils/predicates.go and reused for NodeFit().
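
As a rough illustration only: the function name, signature, and placement below are assumptions, not the merged implementation. The sketch takes the node's pods directly (which NodeFit could fetch via its nodeIndexer) rather than the node itself, and it only compares required-term label selectors, ignoring namespace and topology-key handling for brevity.

package utils

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// PodMatchesAntiAffinityRule reports whether scheduling candidate onto the node
// that currently hosts nodePods would violate a required inter-pod anti-affinity
// term declared by any of those pods, or by the candidate against them.
func PodMatchesAntiAffinityRule(candidate *v1.Pod, nodePods []*v1.Pod) (bool, error) {
	// check reports whether owner's required anti-affinity terms select other's labels.
	check := func(owner, other *v1.Pod) (bool, error) {
		if owner.Spec.Affinity == nil || owner.Spec.Affinity.PodAntiAffinity == nil {
			return false, nil
		}
		for _, term := range owner.Spec.Affinity.PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution {
			selector, err := metav1.LabelSelectorAsSelector(term.LabelSelector)
			if err != nil {
				return false, err
			}
			// Both pods would share the node being evaluated, so any matching
			// required term counts as a violation here (topology keys ignored).
			if selector.Matches(labels.Set(other.Labels)) {
				return true, nil
			}
		}
		return false, nil
	}

	for _, existing := range nodePods {
		// A required anti-affinity rule can be declared on either pod.
		if match, err := check(existing, candidate); err != nil || match {
			return match, err
		}
		if match, err := check(candidate, existing); err != nil || match {
			return match, err
		}
	}
	return false, nil
}

The real descheduler code would presumably also need to filter by namespaces and honor each term's topology key, as checkPodsWithAntiAffinityExist already does.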

Happy to assign myself to this issue and get started when I get the green light from SIG Scheduling/descheduler maintainers :)

@logyball

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 12, 2024
@nikimanoledaki

/assign
