What would you like to be added:
I think the default, out-of-the-box behavior when NFD is installed should be that each NodeFeature CR has its owner reference set to the corresponding v1.Node object.
Why is this needed:
The rationale is summarized at https://ahmet.im/blog/nfd-incident/. In short, every alternative is worse:
- Owner is the DaemonSet Pod (current default): your node labels get cleared during a rolling update of the DaemonSet. That's a no-go for a lot of installations that want to guarantee the labels will always be there.
- No owner (configured via a CLI flag): you'll leak NodeFeature CRs (though the controller could clean these up during periodic resyncs if nfd-gc has that logic).
Parenting to v1.Node has the following advantages:
- The NodeFeature resource doesn't get randomly deleted by the controller (causing incidents like the one linked above); its lifespan is tied to the Node itself.
- It eliminates the need for nfd-gc, as the Kubernetes garbage collector in kube-controller-manager would handle the removal.
I can't think of any downsides to having a single ownerReference set to the Node object.
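To make the proposal concrete, here is a minimal sketch of the single owner reference a NodeFeature CR would carry. It is not NFD's actual code: `OwnerReference` and `Node` are local stand-in types that mirror a few fields of `metav1.OwnerReference` and `v1.Node` so the example compiles without dependencies, and `nodeOwnerRef`, the node name, and the UID are hypothetical.

```go
package main

import "fmt"

// OwnerReference is a local stand-in mirroring a subset of the fields of
// metav1.OwnerReference (k8s.io/apimachinery). Assumption: real code would
// use the upstream type instead.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
}

// Node is a hypothetical minimal view of a v1.Node: only the fields an
// owner reference needs.
type Node struct {
	Name string
	UID  string
}

// nodeOwnerRef (hypothetical helper) builds the owner reference proposed
// in this issue: the NodeFeature is owned by the Node it describes.
func nodeOwnerRef(n Node) OwnerReference {
	return OwnerReference{
		APIVersion: "v1",   // core API group
		Kind:       "Node", // cluster-scoped owner
		Name:       n.Name,
		UID:        n.UID, // the garbage collector matches owners by UID
	}
}

func main() {
	// Illustrative values only.
	ref := nodeOwnerRef(Node{Name: "worker-1", UID: "example-uid"})
	fmt.Printf("%+v\n", ref)
}
```

Node is cluster-scoped and NodeFeature is namespaced, which Kubernetes allows for owner references (a namespaced dependent may have a cluster-scoped owner). With this reference in place, deleting the Node lets the built-in garbage collector remove the NodeFeature.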