Commit 0444d84

Support filtering out objects with owner refs
There's generally no great value in archiving objects that have an owner reference (i.e. pods generated from cronjobs/daemonsets/replicasets/...) when their owner is already archived (and able to regenerate them). This should reduce the commit churn on busy clusters, or on clusters with many cronjobs running at high frequency.

While at it, group the exclusion filters so we don't have to pass an ever-increasing number of function arguments around.

Closes #88
1 parent b5dc26a commit 0444d84

10 files changed: +169 -71 lines changed

README.md

+16-14
@@ -25,20 +25,21 @@ To continuously push changes to a remote git repository:
 katafygio --git-url https://user:[email protected]/myorg/myrepos.git --local-dir /tmp/kfdump
 ```

-Filtering out irrelevant objects (esp. ReplicaSets and Pods) with `-x` or `-y`
-will help to keep resources usage low, and a concise git history. Eg.:
-
+Filtering out irrelevant objects (esp. ReplicaSets and Pods) with `-w`, `-x`, `-y`
+and `-z` is useful to keep a concise git history.

 ```bash
-# Filtering out replicasets and pods since they are generated by Deployments
-# (already archived), endpoints (managed by Services), secrets (to keep them
-# confidential), events and node (irrelevant), and the leader-elector
-# configmap that has low value and changes a lot, causing commits churn.
-
-katafygio -e /tmp/kfdump \
-  -g https://user:[email protected]/myorg/myrepos.git \
-  -x secret,pod,event,replicaset,node,endpoint \
-  -y configmap:kube-system/leader-elector
+# Filtering out objects having an owner reference (eg. managed pods or replicasets,
+# from Deployments, Daemonsets etc that we already archive), secrets (confidential),
+# events and nodes (irrelevant), and a configmap named "leader-elector" that has
+# low value and is causing commits churn:
+
+katafygio \
+  --local-dir /tmp/kfdump \
+  --git-url https://user:[email protected]/myorg/myrepos.git \
+  --exclude-having-owner-ref \
+  --exclude-kind secrets,events,nodes,endpoints \
+  --exclude-object configmap:kube-system/leader-elector
 ```

 You can also use the [docker image](https://hub.docker.com/r/bpineau/katafygio/).
@@ -47,8 +48,8 @@ You can also use the [docker image](https://hub.docker.com/r/bpineau/katafygio/)

 ```
 Backup Kubernetes cluster as yaml files in a git repository.
---exclude-kind (-x) and --exclude-object (-y) may be specified several times,
-or once with several comma separated values.
+--exclude-kind (-x), --exclude-object (-y) and --exclude-namespaces (-z)
+may be specified several times, or once with several comma separated values.

 Usage:
   katafygio [flags]
@@ -64,6 +65,7 @@ Flags:
   -q, --context string               Kubernetes configuration context
   -d, --dry-run                      Dry-run mode: don't store anything
   -m, --dump-only                    Dump mode: dump everything once and exit
+  -w, --exclude-having-owner-ref     Exclude all objects having an Owner Reference
   -x, --exclude-kind strings         Ressource kind to exclude. Eg. 'deployment'
   -z, --exclude-namespaces strings   Namespaces to exclude. Eg. 'temp.*' as regexes. This collects all namespaces and then filters them. Don't use it with the namespace flag.
   -y, --exclude-object strings       Object to exclude. Eg. 'configmap:kube-system/kube-dns'

assets/helm-chart/katafygio/Chart.yaml

+1-1
@@ -5,7 +5,7 @@ name: katafygio
 home: https://github.com/bpineau/katafygio
 sources:
   - https://github.com/bpineau/katafygio
-version: 0.4.3
+version: 0.5.0
 keywords:
   - backup
   - dump

assets/helm-chart/katafygio/README.md

+3-1
@@ -18,7 +18,7 @@ This chart installs a [Katafygio](https://github.com/bpineau/katafygio) deployme

 ## Chart Details

-If your backups are flooded by commits from uninteresting changes, you may filter out irrelevant objects using the `excludeKind` and `excludeObject` options.
+If your backups are flooded by commits from uninteresting changes, you may filter out irrelevant objects using the `excludeKind`, `excludeObject`, `excludeNamespaces`, and `excludeHavingOwnerRef` options.

 By default, the chart will dump (and version) the clusters content in /tmp/kf-dump (configurable with `localDir`).
 This can be useful as is, to keep a local and ephemeral changes history. To benefit from long term, out of cluster, and centrally reachable persistence, you may provide the address of a remote git repository (with `gitUrl`), where all changes will be pushed.
@@ -59,6 +59,8 @@ The following table lists the configurable parameters of the Katafygio chart and
 | `healthcheckPort` | The port Katafygio will listen for health checks requests | `8080` |
 | `excludeKind` | Object kinds to ignore | `{"replicaset","endpoints","event"}` |
 | `excludeObject` | Specific objects to ignore (eg. "configmap:default/foo") | `nil` |
+| `excludeNamespaces` | List of regexps matching namespaces names to ignore | `nil` |
+| `excludeHavingOwnerRef` | Ignore all objects having an Owner Reference | `false` |
 | `rbac.create` | Enable or disable RBAC roles and bindings | `true` |
 | `rbac.apiVersion` | RBAC API version | `v1` |
 | `serviceAccount.create` | Whether a ServiceAccount should be created | `true` |

assets/helm-chart/katafygio/templates/deployment.yaml

+8
@@ -62,6 +62,14 @@ spec:
             - --exclude-object={{ . }}
             {{- end }}
             {{- end }}
+            {{- if .Values.excludeNamespaces }}
+            {{- range .Values.excludeNamespaces }}
+            - --exclude-namespaces={{ . }}
+            {{- end }}
+            {{- end }}
+            {{- if .Values.excludeHavingOwnerRef }}
+            - --exclude-having-owner-ref
+            {{- end }}
           ports:
             - name: http
               containerPort: {{ .Values.healthcheckPort }}

assets/helm-chart/katafygio/values.yaml

+8-2
@@ -38,15 +38,21 @@ logOutput: stdout

 # excludeKind is an array of excluded (not backuped) Kubernetes objects kinds.
 excludeKind:
-  - replicaset
+  - replicasets
   - endpoints
-  - event
+  - events

 # excludeObject is an array of specific Kubernetes objects to exclude from dumps
 # (the format is: objectkind:namespace/objectname).
 # excludeObject:
 # - "configmap:kube-system/leader-elector"

+# excludeNamespaces is an array of regexp matching excluded namespaces (v0.8.2+)
+#excludeNamespaces: []
+
+# excludeHavingOwnerRef defines wether we should filter out objects having an owner reference (v0.8.2+).
+excludeHavingOwnerRef: false
+
 # resyncInterval is the interval (in seconds) between full catch-up resyncs
 # (to catch possibly missed events). Set to 0 to disable resyncs.
 resyncInterval: 300

assets/katafygio.yaml

+15-6
@@ -33,22 +33,31 @@ resync-interval: 900
 # or daemonsets (which are already dumped), endpoints (managed by services,
 # already dumped), and noisy stuff (events, nodes...).
 #exclude-kind:
-# - secret
-# - pod
-# - replicaset
-# - node
-# - event
+# - secrets
+# - pods
+# - replicasets
+# - nodes
+# - events
 # - endpoints

 # Example exclusion for specific objects:
 #exclude-object:
 # - configmap:kube-system/datadog-leader-elector
 # - deployment:default/testdeploy

+# Exclude objects (like pods, replicasets) generated/managed by other
+# objects we already archive:
+#exclude-having-owner-ref: true
+
+# Exclude namespaces matching some regular expressions:
+# exclude-namespaces:
+# - jenkins.*
+# - temp-.*
+
 # Only dump objects belonging to a specific namespace
 #namespace:

-# Set to true o dump once and exit (instead of continuously dumping new changes)
+# Set to true to dump once and exit (instead of continuously dumping new changes)
 dump-only: false

 # Set to true to disable git versionning

cmd/execute.go

+15-3
@@ -5,6 +5,7 @@ import (
 	"os"
 	"os/signal"
 	"path/filepath"
+	"regexp"
 	"syscall"

 	"github.com/spf13/afero"
@@ -31,8 +32,8 @@ var (
 		Use: appName,
 		Short: "Backup Kubernetes cluster as yaml files",
 		Long: "Backup Kubernetes cluster as yaml files in a git repository.\n" +
-			"--exclude-kind (-x) and --exclude-object (-y) may be specified several times,\n" +
-			"or once with several comma separated values.",
+			"--exclude-kind (-x), --exclude-object (-y) and --exclude-namespaces (-z)\n" +
+			"may be specified several times, or once with several comma separated values.",
 		SilenceUsage: true,
 		SilenceErrors: true,
 		PreRun: bindConf,
@@ -69,8 +70,19 @@ func runE(cmd *cobra.Command, args []string) (err error) {
 		return fmt.Errorf("failed to start git repo handler: %v", err)
 	}

+	exclnsre := make([]*regexp.Regexp, 0, len(exclnamespaces))
+	for _, ns := range exclnamespaces {
+		exclnsre = append(exclnsre, regexp.MustCompile(ns))
+	}
+
+	exclusions := &controller.Exclusions{
+		Names:      exclobj,
+		Namespaces: exclnsre,
+		NoOwnerRef: noOwnerRef,
+	}
+
 	evts := event.New()
-	fact := controller.NewFactory(logger, filter, resyncInt, exclobj, exclnamespaces)
+	fact := controller.NewFactory(logger, filter, resyncInt, exclusions)
 	reco := recorder.New(logger, evts, localDir, resyncInt*2, dryRun).Start()
 	obsv := observer.New(logger, restcfg, evts, fact, exclkind, namespace).Start()
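
The loop added in `runE` compiles each `--exclude-namespaces` value with `regexp.MustCompile`, which panics on an invalid pattern. A minimal sketch of an alternative, assuming one would rather report a bad pattern as a regular error: `compileNamespaces` and `excluded` are hypothetical helpers for illustration, not functions from this commit.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileNamespaces is a hypothetical helper (not part of the commit): it
// compiles every pattern and returns an error instead of panicking when one
// of them is not a valid regular expression.
func compileNamespaces(patterns []string) ([]*regexp.Regexp, error) {
	res := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid namespace pattern %q: %v", p, err)
		}
		res = append(res, re)
	}
	return res, nil
}

// excluded reports whether the namespace matches any compiled pattern.
// MatchString is unanchored, so a pattern may match anywhere in the name.
func excluded(ns string, res []*regexp.Regexp) bool {
	for _, re := range res {
		if re.MatchString(ns) {
			return true
		}
	}
	return false
}

func main() {
	res, err := compileNamespaces([]string{"jenkins.*", "temp-.*"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(excluded("temp-build", res))  // true
	fmt.Println(excluded("kube-system", res)) // false

	// An unbalanced parenthesis is rejected with an error, not a panic.
	_, err = compileNamespaces([]string{"temp-("})
	fmt.Println(err != nil) // true
}
```

Because the match is unanchored, `temp-.*` also matches a namespace such as `my-temp-ns`; a pattern like `^temp-` restricts the match to a prefix.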

cmd/flags.go

+5
@@ -29,6 +29,7 @@ var (
 	exclkind []string
 	exclobj []string
 	noGit bool
+	noOwnerRef bool
 )

 func bindPFlag(key string, cmd string) {
@@ -89,6 +90,9 @@ func init() {
 	RootCmd.PersistentFlags().StringSliceVarP(&exclobj, "exclude-object", "y", nil, "Object to exclude. Eg. 'configmap:kube-system/kube-dns'")
 	bindPFlag("exclude-object", "exclude-object")

+	RootCmd.PersistentFlags().BoolVarP(&noOwnerRef, "exclude-having-owner-ref", "w", false, "Exclude all objects having an Owner Reference")
+	bindPFlag("exclude-having-owner-ref", "exclude-having-owner-ref")
+
 	RootCmd.PersistentFlags().StringVarP(&filter, "filter", "l", "", "Label filter. Select only objects matching the label")
 	bindPFlag("filter", "filter")

@@ -123,4 +127,5 @@ func bindConf(cmd *cobra.Command, args []string) {
 	exclkind = viper.GetStringSlice("exclude-kind")
 	exclobj = viper.GetStringSlice("exclude-object")
 	noGit = viper.GetBool("no-git")
+	noOwnerRef = viper.GetBool("exclude-having-owner-ref")
 }

pkg/controller/controller.go

+44-43
@@ -42,28 +42,33 @@ type logger interface {
 	Errorf(format string, args ...interface{})
 }

+// Exclusions groups filters used to ignore objects
+type Exclusions struct {
+	Names      []string
+	Namespaces []*regexp.Regexp
+	NoOwnerRef bool
+}
+
 // Factory generate controllers
 type Factory struct {
-	logger      logger
-	filter      string
-	resyncIntv  time.Duration
-	excludedobj []string
-	excludedns  []string
+	logger     logger
+	filter     string
+	resyncIntv time.Duration
+	exclusions *Exclusions
 }

 // Controller is a generic kubernetes controller
 type Controller struct {
-	name        string
-	stopCh      chan struct{}
-	doneCh      chan struct{}
-	syncCh      chan struct{}
-	notifier    event.Notifier
-	queue       workqueue.RateLimitingInterface
-	informer    cache.SharedIndexInformer
-	logger      logger
-	resyncIntv  time.Duration
-	excludedobj []string
-	excludedns  []*regexp.Regexp
+	name       string
+	stopCh     chan struct{}
+	doneCh     chan struct{}
+	syncCh     chan struct{}
+	notifier   event.Notifier
+	queue      workqueue.RateLimitingInterface
+	informer   cache.SharedIndexInformer
+	logger     logger
+	resyncIntv time.Duration
+	exclusions *Exclusions
 }

 // New return a kubernetes controller using the provided client
@@ -73,8 +78,7 @@ func New(client cache.ListerWatcher,
 	name string,
 	filter string,
 	resync time.Duration,
-	excludedobj []string,
-	excludednamespace []string,
+	exclusions *Exclusions,
 ) *Controller {

 	selector := metav1.ListOptions{LabelSelector: filter, ResourceVersion: "0", AllowWatchBookmarks: true}
@@ -117,23 +121,17 @@ func New(client cache.ListerWatcher,
 		},
 	})

-	exclnsre := make([]*regexp.Regexp, 0)
-	for _, ns := range excludednamespace {
-		exclnsre = append(exclnsre, regexp.MustCompile(ns))
-	}
-
 	return &Controller{
-		stopCh:      make(chan struct{}),
-		doneCh:      make(chan struct{}),
-		syncCh:      make(chan struct{}, 1),
-		notifier:    notifier,
-		name:        name,
-		queue:       queue,
-		informer:    informer,
-		logger:      log,
-		resyncIntv:  resync,
-		excludedobj: excludedobj,
-		excludedns:  exclnsre,
+		stopCh:     make(chan struct{}),
+		doneCh:     make(chan struct{}),
+		syncCh:     make(chan struct{}, 1),
+		notifier:   notifier,
+		name:       name,
+		queue:      queue,
+		informer:   informer,
+		logger:     log,
+		resyncIntv: resync,
+		exclusions: exclusions,
 	}
 }

@@ -208,7 +206,7 @@ func (c *Controller) processItem(key string) error {
 		return fmt.Errorf("error fetching %s from store: %v", key, err)
 	}

-	for _, obj := range c.excludedobj {
+	for _, obj := range c.exclusions.Names {
 		if strings.Compare(strings.ToLower(obj), strings.ToLower(c.name+":"+key)) == 0 {
 			return nil
 		}
@@ -231,7 +229,7 @@ func (c *Controller) processItem(key string) error {
 	}

 	if namespace, ok := md["namespace"].(string); ok {
-		for _, nsre := range c.excludedns {
+		for _, nsre := range c.exclusions.Namespaces {
 			if nsre.MatchString(namespace) {
 				// Rely on the background sync to delete these excluded files if
 				// we previously had acquired them
@@ -240,6 +238,10 @@ func (c *Controller) processItem(key string) error {
 		}
 	}

+	if _, ok := md["ownerReferences"]; ok && c.exclusions.NoOwnerRef {
+		return nil
+	}
+
 	yml, err := yaml.Marshal(obj)
 	if err != nil {
 		return fmt.Errorf("failed to marshal %s: %v", key, err)
@@ -254,17 +256,16 @@ func (c *Controller) enqueue(notif *event.Notification) {
 }

 // NewFactory create a controller factory
-func NewFactory(logger logger, filter string, resync int, excludedobj []string, excludedns []string) *Factory {
+func NewFactory(logger logger, filter string, resync int, exclusions *Exclusions) *Factory {
 	return &Factory{
-		logger:      logger,
-		filter:      filter,
-		resyncIntv:  time.Duration(resync) * time.Second,
-		excludedobj: excludedobj,
-		excludedns:  excludedns,
+		logger:     logger,
+		filter:     filter,
+		resyncIntv: time.Duration(resync) * time.Second,
+		exclusions: exclusions,
 	}
 }

 // NewController create a controller.Controller
 func (f *Factory) NewController(client cache.ListerWatcher, notifier event.Notifier, name string) Interface {
-	return New(client, notifier, f.logger, name, f.filter, f.resyncIntv, f.exclusions)
+	return New(client, notifier, f.logger, name, f.filter, f.resyncIntv, f.exclusions)
 }
