CA DRA: handle device taints and tolerations (KEP-5055) #7947
Labels
area/cluster-autoscaler
area/core-autoscaler
kind/feature
wg/device-management
Which component are you using?:
/area cluster-autoscaler
Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
KEP-5055 adds support for Device taints to DRA. This means that the individual Devices exposed in ResourceSlices can be tainted similarly to how Nodes can be tainted today (with tolerations, taint-based eviction, etc.). The feature is behind its own feature gate and went alpha in Kubernetes 1.33.
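For illustration only: assuming device tolerations follow the same matching rules as Node tolerations, the per-device check that decides whether a tainted Device can be allocated might look roughly like the Go sketch below. `DeviceTaint`, `DeviceToleration`, `toleratesTaint` and `deviceAllocatable` are hypothetical, simplified stand-ins, not the real `k8s.io/api/resource` types or the scheduler plugin's code.

```go
package main

// Hypothetical, simplified mirrors of the DRA device taint/toleration types.
// The real API types have more fields (e.g. TimeAdded, TolerationSeconds).
type DeviceTaint struct {
	Key    string
	Value  string
	Effect string // e.g. "NoSchedule" or "NoExecute"
}

type DeviceToleration struct {
	Key      string
	Operator string // "Exists" or "Equal"; "" is treated as "Equal"
	Value    string
	Effect   string // "" matches every effect
}

// toleratesTaint applies the matching rules used for Node tolerations
// (assumption: device tolerations behave the same way).
func toleratesTaint(tol DeviceToleration, taint DeviceTaint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	// An empty toleration key matches any taint key.
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == taint.Value
	default:
		return false
	}
}

// deviceAllocatable reports whether a device with the given taints may be
// allocated for a request carrying the given tolerations: every taint must
// be tolerated by at least one toleration.
func deviceAllocatable(taints []DeviceTaint, tols []DeviceToleration) bool {
	for _, taint := range taints {
		tolerated := false
		for _, tol := range tols {
			if toleratesTaint(tol, taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}
```

The sketch treats every effect as blocking allocation when untolerated; eviction timing for `NoExecute` is out of scope here.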
As part of the KEP, admins can also create patch objects (`DeviceTaintRule`) that automatically add taints to all Devices matching certain conditions. Cluster Autoscaler needs to apply these patches to ResourceSlices before exposing them to the DRA scheduler plugin via the scheduler framework.
Describe the solution you'd like.:
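A minimal sketch of the patching step, assuming rules select Devices only by driver, pool and device name (the real `DeviceTaintRule` selector also supports device classes and CEL expressions). All types and the `applyTaintRules` function are hypothetical stand-ins, not Cluster Autoscaler or Kubernetes API code. The function returns patched copies, so the same step can run over both real slices and the fake ResourceSlices CA creates, without mutating the snapshot's originals.

```go
package main

// Hypothetical, simplified stand-ins for the DRA API types.
type DeviceTaint struct {
	Key, Value, Effect string
}

type Device struct {
	Name   string
	Taints []DeviceTaint
}

type ResourceSlice struct {
	Driver  string
	Pool    string
	Devices []Device
}

// DeviceTaintRule selects devices by driver, pool and device name;
// an empty field matches any value.
type DeviceTaintRule struct {
	Driver, Pool, Device string
	Taint                DeviceTaint
}

func (r DeviceTaintRule) matches(slice ResourceSlice, dev Device) bool {
	return (r.Driver == "" || r.Driver == slice.Driver) &&
		(r.Pool == "" || r.Pool == slice.Pool) &&
		(r.Device == "" || r.Device == dev.Name)
}

// applyTaintRules returns copies of the slices with the taints of all
// matching rules appended to each device. The inputs are left untouched.
func applyTaintRules(slices []ResourceSlice, rules []DeviceTaintRule) []ResourceSlice {
	out := make([]ResourceSlice, 0, len(slices))
	for _, slice := range slices {
		patched := slice
		patched.Devices = make([]Device, len(slice.Devices))
		for i, dev := range slice.Devices {
			// Copy the taints so appends never alias the input slice.
			taints := append([]DeviceTaint(nil), dev.Taints...)
			for _, rule := range rules {
				if rule.matches(slice, dev) {
					taints = append(taints, rule.Taint)
				}
			}
			dev.Taints = taints
			patched.Devices[i] = dev
		}
		out = append(out, patched)
	}
	return out
}
```

Copying instead of patching in place keeps the snapshot's source objects stable, which matters if the rules change between simulations.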
We should update `autoscaler/cluster-autoscaler/simulator/dynamicresources/snapshot/snapshot_slice_lister.go` (line 25 at commit 9937f8f) to use `k8s.io/dynamic-resource-allocation/resourceslice/tracker` if possible. We need to make sure that the patches are applied to the fake ResourceSlices created by CA as well, before they're used for anything else.
Furthermore, we should extend the existing Node taint handling to Device taints:
- `DeviceTaintRules` shouldn't be filtered out from template NodeInfos if we can detect that they will apply to the new Node (e.g. if they apply to all Devices from a driver, regardless of the pool or other attributes).
Additional context.:
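One way to sketch the detection described in the template NodeInfo item above: only treat a rule as predictable when it constrains nothing beyond the driver, on the assumption that pool and device names in template ResourceSlices are derived from the template Node and may differ from those of the real Node. `simpleTaintSelector` and `willApplyToNewNode` are hypothetical names; `CELSelectors` merely stands in for attribute-based selection, which this conservative version does not try to evaluate.

```go
package main

// Hypothetical, simplified DeviceTaintRule selector. An empty field means
// "no constraint"; CELSelectors stands in for attribute-based selectors.
type simpleTaintSelector struct {
	Driver       string
	Pool         string
	Device       string
	CELSelectors []string
}

// willApplyToNewNode reports whether a rule can be predicted to taint the
// devices of a Node created from a template. drivers is the set of drivers
// whose ResourceSlices are present on the template node.
func willApplyToNewNode(sel simpleTaintSelector, drivers map[string]bool) bool {
	if sel.Pool != "" || sel.Device != "" || len(sel.CELSelectors) > 0 {
		// Conservative: the match depends on pool/device names or attributes
		// that the template may not reproduce faithfully.
		return false
	}
	if sel.Driver == "" {
		// The rule selects every device, so it applies if there are any.
		return len(drivers) > 0
	}
	return drivers[sel.Driver]
}
```

Returning `false` when unsure keeps the current behavior (the rule is filtered out), so the sketch only changes the outcome for rules whose effect on the new Node is unambiguous.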