Skip to content

Allow disabling a Kustomization's finalizer #662

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
julienduchesne opened this issue May 25, 2022 · 15 comments · Fixed by #1314
Closed

Allow disabling a Kustomization's finalizer #662

julienduchesne opened this issue May 25, 2022 · 15 comments · Fixed by #1314

Comments

@julienduchesne
Copy link

julienduchesne commented May 25, 2022

Discussed here: https://cloud-native.slack.com/archives/CLAJ40HV3/p1653481382666929 but opening this for further discussion

The prune setting of Kustomizations has a two-fold behavior. It will delete resources which are removed from source, as is expected from a GitOps tool and it also deletes all resources managed by the Kustomization through its finalizer, when it is deleted.

We'd like to be able to keep pruning but disable the finalizer for the following reasons:

  1. Prevent a cascading deletion in case someone deletes the Kustomizationresource from the source (git). Flux manages itself on our clusters and it can be unexpected for users that deleting or modifying a single resource will cascade to others
  2. Allow easier management of Kustomizations. They currently need to be suspended to be interacted with. Our usecase: we have a "global" Kustomization that manages other Kustomizations, which are located in each namespace. The global Kustomization is set to NOT prune to avoid a cascade deletion of the whole cluster on its deletion. If we got rid of the finalizer, we could manage everything with prune. This would allow us to do operations like renaming all Kustomizations in the cluster without any manual intervention
  3. Safeguard against accidental deletion. I understand that production clusters should have access restrictions in place. However, in dev for example, a user may wish to disable Flux and delete the Kustomization, not knowing that this will delete all of the namespace's resources

I'd propose a new Kustomization setting: finalizerMode with three modes: none (disabled), delabel (remove flux labels on managed resources, prune (current mode)

@makkes
Copy link
Member

makkes commented May 25, 2022

@julienduchesne
Copy link
Author

Please read https://fluxcd.io/docs/components/kustomize/kustomization/#garbage-collection.

Thanks, this seems to match my understanding of how GC works. Did I get anything wrong in my issue description?

@stefanprodan
Copy link
Member

Having annotations as a bandaid for not setting proper RBAC is not something I would consider. If users don't understand how Flux works, a better way to protect the cluster from destructive actions, would be to remove write & delete actions in RBAC, so expect for cluster admins, no other user could edit or delete Flux objects.

@julienduchesne
Copy link
Author

All of our manifests are generated from a monorepo. We have CODEOWNERs set up for the paths that generate Kustomizations, but even a code owner can make a mistake. We'd like to have Flux be a little less destructive, and we have no use for the finalizer

Also, point 2, this would allow us to do more with GitOps, less manual actions.

And I didn't think things through, I meant a Kustomization setting, not an annotation. Updated the description.

Sorry, I don't mean to be pushy. I just want to be sure I am fully understood before dropping this, if you still don't agree 🙂

@zxbyoyoyo
Copy link

I also encountered the same problem, when setting prune=true there are two behaviors,

  1. deleting a single resource in Git will delete it from K8s;
  2. when the Kustomize object is deleted, all managed resources will be deleted from K8s.

If I want to use the first behavior and don't need the second behavior, I need to add annotation kustomize.toolkit.fluxcd.io/prune: disabled to each resource to avoid not being recycled, which is very unfriendly , I also recommend providing a new field for the second behavior to support global control at the fluxcd level

@erikgb
Copy link
Contributor

erikgb commented Nov 23, 2024

I have a different use case, but I am also hit by the twofold behavior of prune described in this issue. In my case, I am struggling with consistently tearing down dynamic review environments in multi-tenant clusters. I posed a question about this on Slack, but just got non-helpful responses. 😉

In a multi-tenant (shared) cluster, end-users cannot have RBAC with full permissions to namespaces, as that would break the multi-tenant setup. To work around this, multiple custom solutions exist. OpenShift and Rancher name resources controlling namespaces projects. In the wider open source community, there is (at least) HNC, Accurate and Capsule. We run on OpenShift, but their Project API is really horrible, so we have been looking for alternatives. After trying all of the tools listed above, we landed on Accurate, and we are quite happy with it.

Let me first explain my setup. To make it simple/clear, I will omit some details, but feel free to ask for more details! For a dynamic review environment we create the following resources:

  • SubNamespace
  • GitRepository
  • Kustomization

The source resource is not so interesting in this context, so I will focus on other two (omitting details not relevant):

apiVersion: accurate.cybozu.com/v2
kind: SubNamespace
metadata:
  name: review-namespace-for-branch
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: review-namespace-for-branch
spec:
  prune: true # or false
  serviceAccountName: review-service-account
  targetNamespace: review-namespace-for-branch

The SubNamespace will eventually create the review-namespace-for-branch namespace. Accurate is configured to propagate the required review-service-account RBAC to the sub-namespace, so this works well until the review environment is about to be deleted (by deleting the resources). Since the SubNamespace and Kustomization is deleted "at the same time", this initiates a race between Accurate (and Kubernetes garbage collection) and Flux:

  • If Flux wins, this work well - since the impersonated SA still has access to delete resources inside the review namespace.
  • But if Flux loses the race, the Kustomization is doomed in a non-finalized state, requiring intervention from a cluster-admin.

If I set prune to false, there is no race, but that makes the review environment almost useless for the end user - since changes on their branch is not fully reflected in the environment.

What I am proposing to fix this, is something similar to what has already been suggested. Since we want this to be a non-breaking change, I would like to introduce a new optional field in Kustomization: deletionPolicy. This is inspired by how Crossplane (and others model this). The default value will be the current behavior of Flux pruning: Delete. But to solve this issue, the alternative value would be Orphan. I think this could be a clean solution not creating confusion for users, but I would be interested in your opinions! 🙏

I would be happy to contribute this feature if we can agree on an acceptable API enhancement.

@stefanprodan
Copy link
Member

@erikgb we can't deprecate nor drop .spec.prune, so I guess .spec.deletionPolicy will work only when pruning is enable?

@stefanprodan
Copy link
Member

Maybe a better naming would be finalizePolicy: prune|keep?

@erikgb
Copy link
Contributor

erikgb commented Nov 23, 2024

@erikgb we can't deprecate nor drop .spec.prune, so I guess .spec.deletionPolicy will work only when pruning is enable?

No, both could be useful! Setting prune: false and deletionPolicy: Delete sounds like a strange thing to do, but the behavior should be clear: Flux should finalize/delete the resources it thinks are controlled by the Kustomization when doing garbage collection (when removing the finalizer).

@erikgb
Copy link
Contributor

erikgb commented Nov 23, 2024

Maybe a better naming would be finalizePolicy: prune|keep?

Works for me! Are you on-board with the use cases here? 🤔

@stefanprodan
Copy link
Member

No, both could be useful! Setting prune: false and deletionPolicy: Delete sounds like a strange thing to do, but the behavior should be clear: Flux should finalize/delete the resources it thinks are controlled by the Kustomization when doing garbage collection (removing the finalizer).

That's a major breaking change, people set prune: false and after we add the new field, things will get wiped on finalization, which I find unacceptable.

@erikgb
Copy link
Contributor

erikgb commented Nov 23, 2024

That's a major breaking change, people set prune: false and after we add the new field, things will get wiped on finalization, which I find unacceptable.

No doubt! I agree totally, of course. Hoping we can find something that makes sense and works. But the new field actually has 3 different values: unset, Delete, Orphan. The current behavior should be kept if field unset. Does that work for you?

@stefanprodan
Copy link
Member

We need to come up with 3 polices and not just have unset as a silent things: the default one that would follow prune and keep the behaviour unchanged, one that forces GC even if prune is false, and one that skips the GC even if prune is true.

@stefanprodan
Copy link
Member

Maybe something like finalizePolicy: InheritPrune, ForcePrune, SkipPrune.

@erikgb
Copy link
Contributor

erikgb commented Nov 23, 2024

Maybe something like finalizePolicy: InheritPrune, ForcePrune, SkipPrune.

I have already invited my team to a "Flux naming-party" on Monday, but your suggestions seems good to me. If we ever are going to have Kustomization v2, this could be further improved by merging prune and this new field. 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants