argo-rollouts: Rollout failing with msg "the object has been modified; please apply your changes to the latest version"
Checklist:
- [x] I’ve included steps to reproduce the bug.
- [x] I’ve included the version of argo rollouts.
Describe the bug
Updates to services in Argo Rollouts suddenly started failing with this message. The only change we made was to the image tag of the Rollout.
To Reproduce
It fails and gets into this state when multiple Rollout image tags are updated at once. If we then do a rollout retry one service at a time, each service succeeds.
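The reproduction steps above can be sketched with the Argo Rollouts kubectl plugin. This is only an illustration: the helper function names, the Rollout names, the container name, and the image are hypothetical placeholders, and the commands run against whatever namespace the current kubectl context selects.

```shell
# Hypothetical helper: update the image tag of several Rollouts back to back.
# Arguments: <container>=<image:tag> followed by one or more Rollout names.
set_image_all() {
  image_arg="$1"
  shift
  for name in "$@"; do
    kubectl argo rollouts set image "$name" "$image_arg"
  done
}

# Hypothetical helper: retry Rollouts one at a time, as described above.
retry_each() {
  for name in "$@"; do
    kubectl argo rollouts retry rollout "$name"
  done
}

# Example (names and image are placeholders, not taken from the report):
#   set_image_all app=registry.example.com/app:v2 svc-a svc-b svc-c
#   retry_each svc-a svc-b svc-c
```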
Expected behavior
The Rollout should succeed. It has no reason to fail, since the only thing changed is the image tag.
Screenshots
Version 1.6.0
Logs
roCtx.reconcile err Operation cannot be fulfilled on replicasets.apps "pg-query-65bc4849f5": the object has been modified; please apply your changes to the latest version and try again
# Paste the logs from the rollout controller
# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts
# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME>
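For context, the error in the log above is the Kubernetes API server's optimistic-concurrency conflict: a write was sent with a stale resourceVersion, so the server rejected it. Clients usually handle this by re-reading the object and retrying. The sketch below shows that retry pattern for a command run from a shell. The function name and the RETRY_DELAY variable are hypothetical, and this does not change how the controller itself behaves.

```shell
# Hypothetical helper: rerun a command while it fails with a conflict error.
# Usage: retry_on_conflict <max_attempts> <command> [args...]
retry_on_conflict() {
  max="$1"
  shift
  attempt=1
  while :; do
    # Capture stdout and stderr together, and record the exit status.
    output="$("$@" 2>&1)" && status=0 || status=$?
    if [ "$status" -eq 0 ]; then
      printf '%s\n' "$output"
      return 0
    fi
    case "$output" in
      *"the object has been modified"*)
        # Conflict: fall through and retry if attempts remain.
        ;;
      *)
        # Any other failure is reported immediately.
        printf '%s\n' "$output" >&2
        return "$status"
        ;;
    esac
    if [ "$attempt" -ge "$max" ]; then
      printf '%s\n' "$output" >&2
      return "$status"
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-1}"
  done
}

# Example (Rollout name is a placeholder):
#   retry_on_conflict 5 kubectl argo rollouts retry rollout svc-a
```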
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.
About this issue
- Original URL
- State: closed
- Created 9 months ago
- Reactions: 12
- Comments: 33 (8 by maintainers)
I just want to comment that I think we are also seeing some issues related to this in one of our clusters, so I am spending some time looking into it.
I have faced similar problems in our cluster.
1. Rollout is stuck during canary update
When a Rollout is updated, both the old and new ReplicaSets are running and then the Rollout is stuck. I could see the following message in the status of the Rollout.
Here is a snippet of kubectl get replicaset. Hash 676f9f555d is new, and 7b7cdd9847 is old.
I deleted the old ReplicaSet and then the Rollout status became Healthy.
2. Rollout status becomes Degraded even if pods are running
When a Rollout is updated, its status becomes Degraded even if all new pods are running. I could see the following message in the status of the Rollout:
I could refresh the status of the Rollout by restarting the argo-rollouts-controller.
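A controller restart like the one described above can be sketched as follows. It assumes the controller runs as deployment/argo-rollouts in the argo-rollouts namespace, as in the log commands from the issue template. The function name and the 120s timeout are hypothetical choices; adjust them if your install differs.

```shell
# Hypothetical helper: restart the Argo Rollouts controller and wait for it.
# Arguments (both optional): <namespace> <deployment name>
restart_rollouts_controller() {
  ns="${1:-argo-rollouts}"
  deploy="${2:-argo-rollouts}"
  # Trigger a rolling restart, then block until the new pod is ready.
  kubectl -n "$ns" rollout restart "deployment/$deploy" &&
    kubectl -n "$ns" rollout status "deployment/$deploy" --timeout=120s
}

# Example:
#   restart_rollouts_controller
```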
Hi. We are still seeing this in 1.6.4. In this case, a new rollout was triggered and showed up in the UI, but did not start rolling out. I clicked the promote button and then it went ahead.
Do any of you use notifications within your rollouts specs? Trying to see if there is a correlation between notifications updating the replicate spec.
We are also seeing this happen a lot more. Yesterday the HPA increased the number of replicas, but the Rollout did not bring up more pods. The Rollout object itself had the correct number set; the new pods just weren’t coming up. Killing the Argo Rollouts controller always fixes these stuck cases.
It’s definitely happening a lot more with the 1.6 version than before.