
workloadRef with scaleDown: progressively continues to scale deployment after migration complete #4321


Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

When using workloadRef with scaleDown: progressively, the Argo Rollouts controller continues to actively scale the referenced deployment even after the progressive migration is complete and the rollout is healthy. This causes deployment pods to be created and terminated when external scalers (like KEDA/HPA) adjust the rollout's replica count.

To Reproduce

  1. Create a deployment with 50 replicas

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
    spec:
      replicas: 50
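      # selector and pod template omitted (a complete Deployment manifest also needs these)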
  2. Create a rollout with workloadRef pointing to the deployment

    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    metadata:
      name: myapp
    spec:
      replicas: 50
      workloadRef:
        apiVersion: apps/v1
        kind: Deployment
        name: myapp
        scaleDown: progressively
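      # selector and strategy omitted (also needed in a complete Rollout manifest)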
  3. Wait for progressive migration to complete (deployment reaches 0 replicas, rollout becomes healthy; see the verification commands after this list)

  4. Configure KEDA/HPA to scale the rollout

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: myapp
    spec:
      scaleTargetRef:
        apiVersion: argoproj.io/v1alpha1
        kind: Rollout
        name: myapp
      minReplicaCount: 30
      maxReplicaCount: 100
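      # triggers omitted (a ScaledObject also needs at least one trigger)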
  5. Trigger scaling events (e.g., load changes that cause KEDA to scale the rollout up/down)
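
To verify step 3 and to observe step 5, commands along these lines can be used (the names match the manifests above; the status command needs the Argo Rollouts kubectl plugin):

    # After migration: the deployment should sit at 0 and the rollout report Healthy
    kubectl get deployment myapp -o jsonpath='{.spec.replicas}'
    kubectl argo rollouts status myapp

    # Watch the deployment's replica count while KEDA scales the rollout
    kubectl get deployment myapp -w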

Expected behavior

Once the progressive migration is complete and the rollout is healthy:

  • The deployment should remain at 0 replicas
  • Scaling events (KEDA/HPA) should only affect the rollout
  • The deployment should not be scaled by the controller

Actual behavior

  • The deployment gets scaled up and down repeatedly (e.g., 0→26→0→10→0)
  • When KEDA scales the rollout, the controller recalculates and scales the deployment
  • Deployment pods are created and terminated even though it should stay at 0
  • This causes unnecessary resource consumption and pod churn

Version
1.8.2

Logs

# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts

# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME>
time="2025-06-12T23:31:04Z" level=info msg="Scaling deployment simpleapp-instance-7 to 26 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:25Z" level=info msg="Scaling deployment simpleapp-instance-7 to 25 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:25Z" level=info msg="Scaling deployment simpleapp-instance-7 to 24 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:26Z" level=info msg="Scaling deployment simpleapp-instance-7 to 21 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:45Z" level=info msg="Scaling deployment simpleapp-instance-7 to 20 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:45Z" level=info msg="Scaling deployment simpleapp-instance-7 to 19 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:55Z" level=info msg="Scaling deployment simpleapp-instance-7 to 6 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:55Z" level=info msg="Scaling deployment simpleapp-instance-7 to 7 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:55Z" level=info msg="Scaling deployment simpleapp-instance-7 to 8 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:55Z" level=info msg="Scaling deployment simpleapp-instance-7 to 9 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:59Z" level=info msg="Scaling deployment simpleapp-instance-7 to 10 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:31Z" level=info msg="Scaling deployment simpleapp-instance-7 to 9 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:31Z" level=info msg="Scaling deployment simpleapp-instance-7 to 8 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:32Z" level=info msg="Scaling deployment simpleapp-instance-7 to 7 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:32Z" level=info msg="Scaling deployment simpleapp-instance-7 to 6 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:33Z" level=info msg="Scaling deployment simpleapp-instance-7 to 5 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:33Z" level=info msg="Scaling deployment simpleapp-instance-7 to 4 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:35Z" level=info msg="Scaling deployment simpleapp-instance-7 to 3 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:35Z" level=info msg="Scaling deployment simpleapp-instance-7 to 2 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:36Z" level=info msg="Scaling deployment simpleapp-instance-7 to 1 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:40Z" level=info msg="Scaling deployment simpleapp-instance-7 to 0 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7

Additional Context

The root cause appears to be in reconcileNewReplicaSet in replicaset.go. The controller cannot distinguish between:

  1. Rollout scaling down due to failure/rollback (deployment should scale up)
  2. Rollout scaling down due to HPA/KEDA after migration is complete (deployment should stay at 0)

The workloadRef with scaleDown: progressively is intended for migration scenarios, but the controller continues to manage the deployment's replica count even after the migration is complete and the rollout is healthy.
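
To make the ambiguity concrete, the sketch below shows the kind of gate the controller could apply. It is a minimal, self-contained Go example, not the actual reconcileNewReplicaSet code; the function name and parameters are hypothetical:

    package main

    import "fmt"

    // desiredDeploymentReplicas sketches the gating this issue argues for.
    // It is NOT the actual argo-rollouts code; the name and parameters are
    // hypothetical. While the migration is in flight, the deployment
    // backfills whatever capacity the rollout cannot serve yet; once the
    // migration has completed, replica changes on the rollout (KEDA/HPA)
    // are no longer mirrored onto the deployment.
    func desiredDeploymentReplicas(rolloutDesired, rolloutAvailable int32, migrationComplete bool) int32 {
        if migrationComplete {
            // Case 2: the rollout is Healthy and the deployment has already
            // drained to 0; keep it there regardless of later scaling.
            return 0
        }
        // Case 1: migration (or a rollback) in flight; scale the deployment
        // progressively so total available capacity stays roughly constant.
        if backfill := rolloutDesired - rolloutAvailable; backfill > 0 {
            return backfill
        }
        return 0
    }

    func main() {
        // Mid-migration: rollout wants 50 but only 24 pods are available,
        // so the deployment keeps 26 replicas.
        fmt.Println(desiredDeploymentReplicas(50, 24, false)) // 26

        // After migration: KEDA scales the rollout to 30; the deployment
        // stays at 0 instead of being scaled back up.
        fmt.Println(desiredDeploymentReplicas(30, 30, true)) // 0
    }

With such a gate, case 1 (a rollback while the migration is still in flight) would still scale the deployment up, while case 2 (KEDA/HPA activity after a completed migration) would leave it at 0.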


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.
