Assume nvdp (the NVIDIA device plugin) cannot start, for example because its image is not found. When a GPU node then joins the node group, `p.nodeInfoCache` caches a node without `nvidia.com/gpu`. If the node group is scaled down to 0 at this moment, the cache item still exists in cluster-autoscaler.
When the next scale-up is triggered, even though nvdp is healthy by then, the stale cache item prevents the scale-up. This is visible when describing the pending pod.
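The sequence above can be reproduced in miniature. The following is only a sketch of the suspected mechanism, not cluster-autoscaler code: `templateCache`, `observe`, and `fits` are hypothetical names standing in for `p.nodeInfoCache` and the scale-up fit check, and the resource values are made up for illustration.

```go
package main

import "fmt"

// templateNode is a hypothetical stand-in for the node info that
// cluster-autoscaler keeps per node group.
type templateNode struct {
	allocatable map[string]int64
}

// templateCache is a hypothetical stand-in for p.nodeInfoCache,
// keyed by node group ID.
type templateCache map[string]templateNode

// observe caches a template copied from a live node, exactly as seen.
func (c templateCache) observe(group string, allocatable map[string]int64) {
	copied := make(map[string]int64, len(allocatable))
	for name, qty := range allocatable {
		copied[name] = qty
	}
	c[group] = templateNode{allocatable: copied}
}

// fits reports whether the cached template could satisfy the pod's requests.
func (c templateCache) fits(group string, requests map[string]int64) bool {
	tmpl, ok := c[group]
	if !ok {
		return false
	}
	for name, want := range requests {
		if tmpl.allocatable[name] < want {
			return false
		}
	}
	return true
}

func main() {
	cache := templateCache{}
	pod := map[string]int64{"nvidia.com/gpu": 1}

	// 1. A GPU node joins while the device plugin is down, so the node
	//    does not advertise nvidia.com/gpu yet.
	cache.observe("gpu-group", map[string]int64{"cpu": 8})

	// 2. The group scales down to 0. Nothing removes the cache entry.

	// 3. A GPU pod is pending. The stale template has no GPU, so the
	//    group is judged unable to help.
	fmt.Println("stale entry fits:", cache.fits("gpu-group", pod))

	// Assumed remedy for this sketch: drop the entry and cache a
	// template that advertises the GPU.
	delete(cache, "gpu-group")
	cache.observe("gpu-group", map[string]int64{"cpu": 8, "nvidia.com/gpu": 1})
	fmt.Println("fresh entry fits:", cache.fits("gpu-group", pod))
}
```

In this model the entry outlives the nodes it was built from, which is the behavior reported here. Whether the right fix is to invalidate the entry when the group reaches 0 or to skip caching nodes that are missing expected resources is left open.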
suqinglee changed the title from "gpu ndoegroup may cant trigger scale-up from 0" to "gpu nodegroup may cant trigger scale-up from 0" on May 13, 2025
The relevant code is the `p.nodeInfoCache` handling in cluster-autoscaler 1.26.6.