controller: recovered from panic when workloadRef does not exist #4193

Open
2 tasks done
onematchfox opened this issue Mar 17, 2025 · 1 comment · May be fixed by #4208
Labels
bug (Something isn't working), bug-reproduced

Comments

@onematchfox

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

When a Rollout uses workloadRef and the referenced workload does not exist, the controller hits a nil pointer dereference and recovers from the resulting panic on every sync.

Recovered from panic: runtime error: invalid memory address or nil pointer dereference
goroutine 436 [running]:
    runtime/debug.Stack()
    /usr/local/go/src/runtime/debug/stack.go:26 +0x5e
github.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1.1.1()
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:151 +0x49
panic({0x2c7d280?, 0x51d0410?})
    /usr/local/go/src/runtime/panic.go:785 +0x132
k8s.io/kubernetes/pkg/util/labels.CloneSelectorAndAddLabel(0x0, {0x31c7be3, 0x1a}, {0xc0011090b0, 0xa})
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/labels/labels.go:81 +0xa3
github.com/argoproj/argo-rollouts/rollout.(*rolloutContext).createDesiredReplicaSet(0xc0013bb808)
    /go/src/github.com/argoproj/argo-rollouts/rollout/sync.go:148 +0x318
github.com/argoproj/argo-rollouts/rollout.(*Controller).newRolloutContext(0xc0006fb340, 0xc0009f3808)
    /go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:561 +0xa4c
github.com/argoproj/argo-rollouts/rollout.(*Controller).syncHandler(0xc0006fb340, {0x37d5a30, 0xc00069a820}, {0xc002950a22, 0x7})
    /go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:410 +0x3b8
github.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1.1()
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:155 +0x6f
github.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1({0x37e9560, 0xc000ba0840}, {0x3194bd8, 0x7}, 0xc002b5feb0, {0x37d5a30, 0xc00069a820}, 0xc000b8b9e0, {0x2abd860, 0xc001484500})
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:159 +0x38a
github.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem({0x37d5a30, 0xc00069a820}, {0x37e9560, 0xc000ba0840}, {0x3194bd8, 0x7}, 0xc002b5feb0, 0xc000b8b9e0)
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:178 +0xb6
github.com/argoproj/argo-rollouts/utils/controller.RunWorker(...)
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:106
github.com/argoproj/argo-rollouts/rollout.(*Controller).Run.func1()
    /go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:352 +0xa6
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00144e930, {0x37a39a0, 0xc0015312f0}, 0x1, 0xc0006fecb0)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00144e930, 0x3b9aca00, 0x0, 0x1, 0xc0006fecb0)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161
created by github.com/argoproj/argo-rollouts/rollout.(*Controller).Run in goroutine 362
    /go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:351 +0x98

Because of the panic, the status on the Rollout is never updated.
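A minimal sketch of why a recovered panic leaves the status untouched, assuming the status write happens after the code that panics. All names here (`rolloutStatus`, `labelSelector`, `syncWithRecover`, `reconcile`) are hypothetical stand-ins and not the controller's actual code.

```go
package main

import "fmt"

// rolloutStatus is a hypothetical stand-in for the Rollout's status.
type rolloutStatus struct {
	Phase   string
	Message string
}

// labelSelector is a hypothetical stand-in for a label selector.
type labelSelector struct {
	MatchLabels map[string]string
}

// syncWithRecover runs fn and converts a panic into an error, roughly the
// way a worker loop might guard each work item.
func syncWithRecover(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return fn()
}

// reconcile dereferences the selector before it writes the status, so a nil
// selector panics and the status write below never runs.
func reconcile(status *rolloutStatus, selector *labelSelector) error {
	n := len(selector.MatchLabels) // nil pointer dereference when selector is nil
	status.Phase = "Progressing"
	status.Message = fmt.Sprintf("selector has %d labels", n)
	return nil
}

func main() {
	status := &rolloutStatus{}
	err := syncWithRecover(func() error { return reconcile(status, nil) })
	fmt.Println("error:", err)
	fmt.Printf("status after sync: %+v\n", *status)
}
```

The worker survives and the item is requeued, but each retry panics at the same point, so the status stays empty.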

To Reproduce

Apply the following YAML to a cluster of your choice.

apiVersion: v1
kind: Namespace
metadata:
  name: foo
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: foo
  namespace: foo
spec:
  strategy:
    canary:
      steps:
        - setWeight: 50
        - pause:
            duration: "1m"
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo

Monitor the controller logs to observe the "Recovered from panic" message (see Logs below).

Describe rollout

$ kubectl describe rollout foo   
Name:         foo
Namespace:    foo
Labels:       <none>
Annotations:  <none>
API Version:  argoproj.io/v1alpha1
Kind:         Rollout
Metadata:
  Creation Timestamp:  2025-03-17T13:43:39Z
  Generation:          1
  Resource Version:    273422247
  UID:                 0689cfe6-b2ea-4ab6-a99a-5121919868be
Spec:
  Strategy:
    Canary:
      Steps:
        Set Weight:  50
        Pause:
          Duration:  1m
  Workload Ref:
    API Version:  apps/v1
    Kind:         Deployment
    Name:         foo
Events:
  Type    Reason                  Age    From                 Message
  ----    ------                  ----   ----                 -------
  Normal  RolloutAddedToInformer  6m11s  rollouts-controller  Rollout resource added to informer: foo/foo

Expected behavior

The controller should handle this situation gracefully and set an appropriate status (it could reuse InvalidSpec) with a message detailing the problem.
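One possible shape for this behaviour, as a sketch only: check the workload lookup first and record the failure as a condition instead of continuing. Apart from the `InvalidSpec` name taken from the text above, every type and function here (`rollout`, `condition`, `resolveWorkloadRef`, `syncRollout`) is a hypothetical illustration and not the Argo Rollouts API.

```go
package main

import (
	"errors"
	"fmt"
)

// condition is a hypothetical stand-in for a Rollout status condition.
type condition struct {
	Type    string
	Message string
}

// rollout is a hypothetical, heavily simplified Rollout.
type rollout struct {
	Name            string
	WorkloadRefName string
	Conditions      []condition
}

var errWorkloadNotFound = errors.New("referenced workload not found")

// resolveWorkloadRef looks the referenced Deployment up in a set of known
// names; a real controller would query its informer cache instead.
func resolveWorkloadRef(r *rollout, deployments map[string]bool) error {
	if !deployments[r.WorkloadRefName] {
		return fmt.Errorf("deployment %q: %w", r.WorkloadRefName, errWorkloadNotFound)
	}
	return nil
}

// syncRollout handles the resolve error before doing anything that depends
// on the resolved workload, and surfaces it on the status.
func syncRollout(r *rollout, deployments map[string]bool) error {
	if err := resolveWorkloadRef(r, deployments); err != nil {
		r.Conditions = append(r.Conditions, condition{
			Type:    "InvalidSpec",
			Message: err.Error(),
		})
		return nil // status recorded; nothing more to reconcile for now
	}
	// Normal reconciliation would continue here.
	return nil
}

func main() {
	r := &rollout{Name: "foo", WorkloadRefName: "foo"}
	if err := syncRollout(r, map[string]bool{}); err != nil {
		fmt.Println("sync error:", err)
	}
	fmt.Printf("conditions: %+v\n", r.Conditions)
}
```

With this ordering the user sees the problem in `kubectl describe rollout` rather than only in the controller logs.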

Screenshots

Version

v1.8.0

Logs

# Paste the logs from the rollout controller

# Logs for the entire controller:
time="2025-03-17T13:50:45Z" level=info msg="Started syncing rollout" generation=1 namespace=foo resourceVersion=273422247 rollout=foo
time="2025-03-17T13:50:45Z" level=info msg="Reconciliation completed" generation=1 namespace=foo resourceVersion=273422247 rollout=foo time_ms=0.964304
time="2025-03-17T13:50:45Z" level=error msg="Recovered from panic: runtime error: invalid memory address or nil pointer dereference\ngoroutine 437 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1.1.1()\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:151 +0x49\npanic({0x2c7d280?, 0x51d0410?})\n\t/usr/local/go/src/runtime/panic.go:785 +0x132\nk8s.io/kubernetes/pkg/util/labels.CloneSelectorAndAddLabel(0x0, {0x31c7be3, 0x1a}, {0xc0013c9360, 0xa})\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/labels/labels.go:81 +0xa3\ngithub.com/argoproj/argo-rollouts/rollout.(*rolloutContext).createDesiredReplicaSet(0xc00083cc08)\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/sync.go:148 +0x318\ngithub.com/argoproj/argo-rollouts/rollout.(*Controller).newRolloutContext(0xc0006fb340, 0xc0013c2308)\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:561 +0xa4c\ngithub.com/argoproj/argo-rollouts/rollout.(*Controller).syncHandler(0xc0006fb340, {0x37d5a30, 0xc00069a820}, {0xc002950a22, 0x7})\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:410 +0x3b8\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1.1()\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:155 +0x6f\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1({0x37e9560, 0xc000ba0840}, {0x3194bd8, 0x7}, 0xc002d43eb0, {0x37d5a30, 0xc00069a820}, 0xc000b8b9e0, {0x2abd860, 0xc002045ec0})\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:159 +0x38a\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem({0x37d5a30, 0xc00069a820}, {0x37e9560, 0xc000ba0840}, {0x3194bd8, 0x7}, 0xc002d43eb0, 0xc000b8b9e0)\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:178 
+0xb6\ngithub.com/argoproj/argo-rollouts/utils/controller.RunWorker(...)\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:106\ngithub.com/argoproj/argo-rollouts/rollout.(*Controller).Run.func1()\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:352 +0xa6\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x33\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00144e960, {0x37a39a0, 0xc001531320}, 0x1, 0xc0006fecb0)\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xaf\nk8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00144e960, 0x3b9aca00, 0x0, 0x1, 0xc0006fecb0)\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x7f\nk8s.io/apimachinery/pkg/util/wait.Until(...)\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161\ncreated by github.com/argoproj/argo-rollouts/rollout.(*Controller).Run in goroutine 362\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:351 +0x98\n" namespace=foo rollout=foo
time="2025-03-17T13:50:45Z" level=error msg="rollout syncHandler error: Recovered from Panic" namespace=foo rollout=foo
time="2025-03-17T13:50:45Z" level=info msg="rollout syncHandler queue retries: 58 : key \"foo/foo\"" namespace=foo rollout=foo
time="2025-03-17T13:50:45Z" level=error msg="Recovered from Panic" error="<nil>"

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@onematchfox onematchfox added the bug Something isn't working label Mar 17, 2025
@onematchfox
Author

This behaviour is ultimately caused by rollout.Spec.Selector being nil when labelsutil.CloneSelectorAndAddLabel is called. It seems to stem from the logic in controller.go, where any error returned by c.refResolver.Resolve(r) is initially ignored and only handled later.
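To illustrate the mechanism, here is a self-contained sketch. `cloneSelectorAndAddLabel` is a simplified local stand-in that, like the function in the stack trace, reads from the selector without a nil check; `safeClone` is a hypothetical guard, not a proposed patch. The label key in `main` is only an example value.

```go
package main

import (
	"errors"
	"fmt"
)

// labelSelector is a simplified local stand-in for a Kubernetes label selector.
type labelSelector struct {
	MatchLabels map[string]string
}

// cloneSelectorAndAddLabel copies the selector and adds one label. It reads
// selector.MatchLabels without checking selector, so a nil selector panics.
func cloneSelectorAndAddLabel(selector *labelSelector, key, value string) *labelSelector {
	out := &labelSelector{MatchLabels: map[string]string{}}
	for k, v := range selector.MatchLabels { // nil pointer dereference when selector is nil
		out.MatchLabels[k] = v
	}
	out.MatchLabels[key] = value
	return out
}

// safeClone is a hypothetical guard: it returns an error for a nil selector
// instead of letting the call panic.
func safeClone(selector *labelSelector, key, value string) (*labelSelector, error) {
	if selector == nil {
		return nil, errors.New("selector is nil; the workloadRef may not have been resolved")
	}
	return cloneSelectorAndAddLabel(selector, key, value), nil
}

func main() {
	if _, err := safeClone(nil, "example-hash-key", "abc123"); err != nil {
		fmt.Println("guarded:", err)
	}
	sel, _ := safeClone(&labelSelector{MatchLabels: map[string]string{"app": "foo"}}, "example-hash-key", "abc123")
	fmt.Println(sel.MatchLabels)
}
```

A guard like this only masks the symptom; handling the resolve error at the point where it is returned would keep the nil selector from reaching this call at all.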
