**Description**

Checklist:
- [x] I've included steps to reproduce the bug.
- [x] I've included the version of argo rollouts.
**Describe the bug**

When using `workloadRef` with `scaleDown: progressively`, the Argo Rollouts controller continues to actively scale the referenced deployment even after the progressive migration is complete and the rollout is healthy. As a result, deployment pods are created and terminated whenever external scalers (such as KEDA or HPA) adjust the rollout's replica count.
**To Reproduce**

1. Create a deployment with 50 replicas:

   ```yaml
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: myapp
   spec:
     replicas: 50
   ```

2. Create a rollout with `workloadRef` pointing to the deployment:

   ```yaml
   apiVersion: argoproj.io/v1alpha1
   kind: Rollout
   metadata:
     name: myapp
   spec:
     replicas: 50
     workloadRef:
       apiVersion: apps/v1
       kind: Deployment
       name: myapp
       scaleDown: progressively
   ```

3. Wait for the progressive migration to complete (deployment reaches 0 replicas, rollout becomes healthy).

4. Configure KEDA/HPA to scale the rollout:

   ```yaml
   apiVersion: keda.sh/v1alpha1
   kind: ScaledObject
   metadata:
     name: myapp
   spec:
     scaleTargetRef:
       apiVersion: argoproj.io/v1alpha1
       kind: Rollout
       name: myapp
     minReplicaCount: 30
     maxReplicaCount: 100
   ```

5. Trigger scaling events (e.g., load changes that cause KEDA to scale the rollout up/down).
**Expected behavior**

Once the progressive migration is complete and the rollout is healthy:
- The deployment should remain at 0 replicas
- Scaling events (KEDA/HPA) should affect only the rollout
- The controller should not scale the deployment
**Actual behavior**

- The deployment gets scaled up and down repeatedly (e.g., 0→26→0→10→0)
- When KEDA scales the rollout, the controller recalculates and scales the deployment
- Deployment pods are created and terminated even though the deployment should stay at 0
- This causes unnecessary resource consumption and pod churn
**Version**

1.8.2
**Logs**

```shell
# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts

# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME>
```
time="2025-06-12T23:31:04Z" level=info msg="Scaling deployment simpleapp-instance-7 to 26 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:25Z" level=info msg="Scaling deployment simpleapp-instance-7 to 25 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:25Z" level=info msg="Scaling deployment simpleapp-instance-7 to 24 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:26Z" level=info msg="Scaling deployment simpleapp-instance-7 to 21 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:45Z" level=info msg="Scaling deployment simpleapp-instance-7 to 20 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:45Z" level=info msg="Scaling deployment simpleapp-instance-7 to 19 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:55Z" level=info msg="Scaling deployment simpleapp-instance-7 to 6 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:55Z" level=info msg="Scaling deployment simpleapp-instance-7 to 7 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:55Z" level=info msg="Scaling deployment simpleapp-instance-7 to 8 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:55Z" level=info msg="Scaling deployment simpleapp-instance-7 to 9 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:31:59Z" level=info msg="Scaling deployment simpleapp-instance-7 to 10 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:31Z" level=info msg="Scaling deployment simpleapp-instance-7 to 9 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:31Z" level=info msg="Scaling deployment simpleapp-instance-7 to 8 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:32Z" level=info msg="Scaling deployment simpleapp-instance-7 to 7 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:32Z" level=info msg="Scaling deployment simpleapp-instance-7 to 6 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:33Z" level=info msg="Scaling deployment simpleapp-instance-7 to 5 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:33Z" level=info msg="Scaling deployment simpleapp-instance-7 to 4 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:35Z" level=info msg="Scaling deployment simpleapp-instance-7 to 3 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:35Z" level=info msg="Scaling deployment simpleapp-instance-7 to 2 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:36Z" level=info msg="Scaling deployment simpleapp-instance-7 to 1 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
time="2025-06-12T23:32:40Z" level=info msg="Scaling deployment simpleapp-instance-7 to 0 replicas" namespace=simpleapp-ns-7 rollout=simpleapp-instance-7
**Additional Context**

The root cause appears to be in `reconcileNewReplicaSet` in `replicaset.go`. The controller cannot distinguish between:
- The rollout scaling down due to failure/rollback (the deployment should scale up)
- The rollout scaling down due to HPA/KEDA after migration is complete (the deployment should stay at 0)

`workloadRef` with `scaleDown: progressively` is intended for migration scenarios, but the controller continues to manage the deployment's replica count even after the migration is complete and the rollout is healthy.
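To make the missing distinction concrete, a fix would likely need a guard of roughly this shape before the controller touches the referenced deployment. This is only an illustrative sketch: the function name, parameters, and placement are hypothetical and do not reflect the actual `reconcileNewReplicaSet` signature.

```go
package main

import "fmt"

// shouldScaleWorkloadRef sketches the guard the controller lacks: once the
// progressive migration has completed (referenced deployment at 0 replicas,
// rollout healthy), later replica-count changes on the rollout (e.g. from
// KEDA/HPA) should no longer be propagated to the deployment.
func shouldScaleWorkloadRef(migrationComplete, rolloutHealthy bool, deploymentReplicas int32) bool {
	if migrationComplete && rolloutHealthy && deploymentReplicas == 0 {
		// Deployment stays at 0; external scalers affect only the rollout.
		return false
	}
	// Mid-migration (or during failure/rollback) the controller still
	// manages the deployment's replica count.
	return true
}

func main() {
	fmt.Println(shouldScaleWorkloadRef(true, true, 0))   // post-migration: false
	fmt.Println(shouldScaleWorkloadRef(false, false, 26)) // mid-migration: true
}
```

With a guard like this, the KEDA-driven scale events in the logs above would leave the deployment at 0 instead of repeatedly recreating and terminating its pods.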
---

**Message from the maintainers:**

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.