argo-rollouts: Rollout failing with msg "the object has been modified; please apply your changes to the latest version"

Checklist:

  • [x] I’ve included steps to reproduce the bug.
  • [x] I’ve included the version of argo rollouts.

Describe the bug Updates to services managed by Argo Rollouts suddenly started failing with this message for no apparent reason. The only change we made was to the image tag of the Rollout.

To Reproduce The failure occurs, and the Rollouts end up in this state, when multiple Rollout image tags are updated at once. If we then run a rollout retry one service at a time, each service succeeds.
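
The one-at-a-time retry workaround can be sketched as a small shell function. This is only a sketch: it assumes the kubectl-argo-rollouts plugin is installed, and the function name, the KUBECTL override, and the example namespace and Rollout names are hypothetical.

```shell
# Sketch: retry Rollouts sequentially rather than all at once.
# KUBECTL can be overridden (for example KUBECTL=echo) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

retry_rollouts_sequentially() {
  local namespace="$1"; shift
  local name
  for name in "$@"; do
    # Retry this Rollout via the kubectl-argo-rollouts plugin.
    $KUBECTL argo rollouts retry rollout "$name" -n "$namespace" || return 1
    # Watch this Rollout until it settles before touching the next one.
    $KUBECTL argo rollouts status "$name" -n "$namespace" || return 1
  done
}

# Example (hypothetical names):
#   retry_rollouts_sequentially my-namespace pg-query worker poller
```

Running it with KUBECTL=echo prints the commands that would be executed without touching the cluster.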

Expected behavior The Rollout should succeed. There is no reason for it to fail, since the only change is an updated image tag.

Screenshots Screenshot 2023-10-05 at 8 01 19 PM

Screenshot 2023-10-05 at 8 05 02 PM

Version 1.6.0

Logs

roCtx.reconcile err Operation cannot be fulfilled on replicasets.apps "pg-query-65bc4849f5": the object has been modified; please apply your changes to the latest version and try again

# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts

# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME>
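
To see which ReplicaSets hit the conflict error most often, the controller logs can be filtered for the message quoted above. This is a rough sketch: the function name is hypothetical, and it assumes the log lines contain the `replicasets.apps "<name>"` fragment exactly as in the line reported in this issue.

```shell
# Sketch: read controller log lines on stdin and print a count of
# "object has been modified" conflicts per ReplicaSet, most frequent first.
count_conflicts() {
  grep 'the object has been modified' \
    | grep -o 'replicasets\.apps "[^"]*"' \
    | sort | uniq -c | sort -rn
}

# Usage:
#   kubectl logs -n argo-rollouts deployment/argo-rollouts | count_conflicts
```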

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

About this issue

  • Original URL
  • State: closed
  • Created 9 months ago
  • Reactions: 12
  • Comments: 33 (8 by maintainers)

Most upvoted comments

I just want to comment that I think we are also seeing this issue in one of our clusters, so I am spending some time looking into it.

I have faced similar problems in our cluster.

1. Rollout is stuck during a canary update

When a Rollout is updated, both the old and new ReplicaSets keep running and the Rollout gets stuck. I could see the following message in the status of the Rollout.

old replicas are pending termination

Here is a snippet of kubectl get pods. The pods with hash 676f9f555d are new, and the one with 7b7cdd9847 is old.

worker-676f9f555d-25xd9                      1/1     Running     0          3h22m
worker-676f9f555d-dkmlt                      1/1     Running     0          3h22m
worker-7b7cdd9847-d4xl6                      1/1     Running     0          5h8m

I deleted the old ReplicaSet and then the Rollout status became Healthy.
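
A rough sketch of that manual cleanup follows. It assumes the ReplicaSet is named `<rollout>-<hash>`, as the names in this thread suggest; the function name, the KUBECTL override, and the example namespace are hypothetical. Deleting a ReplicaSet also removes its pods, so check the hash first.

```shell
# Sketch: confirm the old ReplicaSet exists, then delete it.
# KUBECTL can be overridden (for example KUBECTL=echo) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

delete_old_replicaset() {
  local namespace="$1" rollout="$2" old_hash="$3"
  local rs="${rollout}-${old_hash}"
  # Fail early if the ReplicaSet name is wrong or it is already gone.
  $KUBECTL get replicaset "$rs" -n "$namespace" || return 1
  # Remove the stale ReplicaSet (and, with it, its pods).
  $KUBECTL delete replicaset "$rs" -n "$namespace"
}

# Example (hypothetical namespace):
#   delete_old_replicaset my-namespace worker 7b7cdd9847
```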

2. Rollout status becomes Degraded even if pods are running

When a Rollout is updated, its status becomes Degraded even though all new pods are running. I could see the following message in the status of the Rollout:

ProgressDeadlineExceeded: ReplicaSet "poller-64d95bc44b" has timed out progressing.

I could refresh the status of the Rollout by restarting the argo-rollouts-controller.
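
A minimal sketch of restarting the controller, assuming it runs as the `argo-rollouts` Deployment in the `argo-rollouts` namespace (the names used in the log commands of this issue; your install may differ). The function name and the KUBECTL override are hypothetical.

```shell
# Sketch: rolling-restart the Argo Rollouts controller and wait for it.
# KUBECTL can be overridden (for example KUBECTL=echo) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

restart_rollouts_controller() {
  local namespace="${1:-argo-rollouts}"
  local deployment="${2:-argo-rollouts}"
  # Trigger a rolling restart of the controller Deployment.
  $KUBECTL rollout restart "deployment/${deployment}" -n "$namespace" || return 1
  # Wait until the restarted controller has finished rolling out.
  $KUBECTL rollout status "deployment/${deployment}" -n "$namespace"
}

# Usage:
#   restart_rollouts_controller
```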

Hi. We are still seeing this in 1.6.4. In this case, a new rollout was triggered and showed up in the UI, but did not start rolling out. I clicked the promote button and then it went ahead.

Do any of you use notifications within your Rollout specs? I am trying to see if there is a correlation with notifications updating the ReplicaSet spec.

We are also seeing this happen a lot more. Yesterday the HPA increased the number of replicas, but the Rollout did not bring up more pods. The Rollout object itself had the correct number set, but the new pods weren’t coming up. Killing the Argo Rollouts controller always fixes these stuck cases.

It’s definitely happening a lot more with the 1.6 version than before.