kapp: Conflict on weird fields
Hi, I don't really understand some conflict errors, maybe someone can help.
Here's an example; these fields appear in the diff:
- `kapp.k14s.io/nonce`: sounds legit as it's a new version
- `initialDelaySeconds` and `cpu`: I guess these have been “rewritten” by the kube API
These changes look legit but make kapp fail; any idea how to prevent this?
Updating resource deployment/app-strapi (apps/v1) namespace: env-1000jours-sre-kube-workflow-4y3w36:
API server says:
Operation cannot be fulfilled on deployments.apps "app-strapi": the object has been modified; please apply your changes to the latest version and try again (reason: Conflict):
Recalculated diff:
11, 10 - kapp.k14s.io/nonce: "1660057353002414185"
12, 10 + kapp.k14s.io/nonce: "1660062422261409721"
223,222 - progressDeadlineSeconds: 600
225,223 - revisionHistoryLimit: 10
230,227 - strategy:
231,227 - rollingUpdate:
232,227 - maxSurge: 25%
233,227 - maxUnavailable: 25%
234,227 - type: RollingUpdate
237,229 - creationTimestamp: null
269,260 - image: something/strapi:sha-3977fb22378f2debdcacf4eeb6dd6f26dab24377
270,260 - imagePullPolicy: IfNotPresent
271,260 + image: something/strapi:sha-4ed2921f2fac053671f80fa02b72d124a23fa8c0
276,266 - scheme: HTTP
279,268 - successThreshold: 1
285,273 - protocol: TCP
291,278 - scheme: HTTP
292,278 + initialDelaySeconds: 0
297,284 - cpu: "1"
298,284 + cpu: 1
300,287 - cpu: 500m
301,287 + cpu: 0.5
307,294 - scheme: HTTP
309,295 - successThreshold: 1
310,295 - timeoutSeconds: 1
311,295 - terminationMessagePath: /dev/termination-log
312,295 - terminationMessagePolicy: File
316,298 - dnsPolicy: ClusterFirst
317,298 - restartPolicy: Always
318,298 - schedulerName: default-scheduler
319,298 - securityContext: {}
320,298 - terminationGracePeriodSeconds: 30
Hey!!
I finally resolved this issue. It was caused by several factors (but in the end only one was decisive).
Hypothesis #1: The Bad
First, Rancher was adding `metadata.annotations."field.cattle.io/publicEndpoints"`, and the fix you gave us, using a rebase rule, works for this issue. This is now patched in kube-workflow (legacy) and kontinuous. @revolunet here is the fix (you could also put this content in the file created here: https://github.com/SocialGouv/1000jours/commit/a81b816b71dc995690b64012d5bad9be02108983; the format I use is meant to be consumed by the CLI, the other is meant to be consumed by the kapp kube controller, which we don't use):
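Roughly, the rule looks like this (a sketch; the `allMatcher` scoping is an assumption, adjust the `resourceMatchers` to your setup):

```yaml
apiVersion: kapp.k14s.io/v1alpha1
kind: Config
rebaseRules:
# keep the annotation that Rancher adds server-side, so kapp
# does not try to remove it again on every deploy
- path: [metadata, annotations, "field.cattle.io/publicEndpoints"]
  type: copy
  sources: [new, existing]
  resourceMatchers:
  - allMatcher: {}
```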
Hypothesis #2: The Ugly

kapp + sealed-secret + reloader: the other thing that was breaking everything was the combination of sealed-secret + reloader. These tools are compatible with each other, but their behavior combined with kapp is not; the process that breaks things is (roughly) that reloader restarts the Deployment whenever a Secret changes, so the Deployment gets modified outside of kapp in the middle of a deploy.
Hypothesis #3: The Good One
Finally, there was a thing I didn't understand: the link between the command in the job and the deployment. When the job ran `pg_restore` it was failing, but when we replaced it with `sleep 240` (matching the time `pg_restore` takes to run) it was working. At first I thought it was related to the resources used, so I reserved large resources for the job. But that was impacting even the Rancher annotations (maybe the network usage had a side effect on the operator, modifying the global behavior; very weird, I thought). Then, after disabling reloader, the deployment didn't seem to reboot anymore, so I thought it was resolved, but a few tries later the deployment started to reboot on `kapp deploy` before the job ended (the job is in a change group that is required by a change rule on the deployment, as sketched below). Sorry for the unbearable suspense (it took me tens of hours)… It was the pod that was crashing. I didn't know how this service was supposed to work, but there was a poll every few seconds that interacted with the DB, and while `pg_restore` was running, inconsistent data made it crash and restart. This restart, done by kube-controller-manager, was making changes to the manifests. I don't know if this is an issue that can (and should) be treated at the kapp level, but for now we can resolve it on our side.

Sorry for the big mess (and excuse my poor English). Thanks for your help and patience, and big up for developing this great tool that is kapp, we are using it every day!
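(For reference, since the change-group / change-rule wiring matters here, a minimal sketch of those annotations; the group name `db-restore` is illustrative:)

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-restore            # illustrative name
  annotations:
    kapp.k14s.io/change-group: "db-restore"
# ...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-strapi
  annotations:
    # kapp waits for the db-restore group before upserting this resource
    kapp.k14s.io/change-rule: "upsert after upserting db-restore"
# ...
```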
Thank you! Trimming the extra stuff and keeping the necessary details. Just like the first comment, it's the identity annotation that is causing the issue. As @100mik mentioned previously, we definitely need to find out and fix the issue with this annotation. I will bump the priority for this. Meanwhile, I will also look for a short-term solution.
Heyo! Sorry for the delay, I was verifying a few options.
For the time being you could add the following kapp Config to your manifests:
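(Something along these lines; a sketch that assumes the problematic field is the `kapp.k14s.io/identity` annotation discussed above:)

```yaml
apiVersion: kapp.k14s.io/v1alpha1
kind: Config
diffAgainstLastAppliedFieldExclusionRules:
# ignore this annotation when calculating the diff
- path: [metadata, annotations, "kapp.k14s.io/identity"]
  resourceMatchers:
  - allMatcher: {}
```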
This would exclude the problematic field while diffing altogether.
If you already have a `kapp` Config you can just amend it with:
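(Assuming the same exclusion rule, the stanza to add would be:)

```yaml
diffAgainstLastAppliedFieldExclusionRules:
- path: [metadata, annotations, "kapp.k14s.io/identity"]
  resourceMatchers:
  - allMatcher: {}
```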
Thank you so much for sharing. We will definitely take a look at it and let you know our next steps 😃
This is what I was about to suggest when you mentioned you are using reloader! This would ensure that every part of the update is handled by `kapp`. It might reduce some overhead as well!

No worries! Happy to hack through this with you.
Trying to process all the information, but two thoughts come to mind. Are you using versioned resources to update the deployment now?
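(For context, this is roughly what a versioned resource looks like in kapp; the ConfigMap name and data are illustrative:)

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  annotations:
    # kapp creates app-config-ver-1, app-config-ver-2, ... on each change
    # and rewrites references to it, so a config change rolls the
    # Deployment without needing reloader
    kapp.k14s.io/versioned: ""
data:
  key: value
```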
We are glad it helps!
@revolunet We will drop a ping on this issue when we have a release which resolves this.
Thanks for the prompt replies!
Gonna take a closer look at this; it is definitely not expected. However, I cannot reproduce the exact issue y'all have been running into 😦
The closest I could get was over here in the similar reproduction I posted, where `kapp` shows that the `identity` annotation is being removed when it is not.

Marking this as a bug for now, since it looks like the metadata on the deployment is as expected (assuming that `env-xxx-5dc5hx` is the ns you are working with).

Ok, here's the top of the diff for that deployment:
Note: `1b7c24b0876fdb5c244aa3ada4d96329eb72e1a4` is the sha of the image currently running in the namespace.

Yeah, comparing the original diff with the recalculated diff would give us an idea of the fields that are getting updated in the background, and we could then try to figure out a way to resolve it (maybe a rebase rule to not update those fields; see the sketch below).
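(A sketch of what such a rebase rule could look like; the `deployment.kubernetes.io/revision` annotation is just an illustrative example of a field the cluster updates in the background:)

```yaml
rebaseRules:
# keep whatever value the cluster last wrote for this field,
# instead of diffing against it on every deploy
- path: [metadata, annotations, "deployment.kubernetes.io/revision"]
  type: copy
  sources: [existing]
  resourceMatchers:
  - apiVersionKindMatcher: {apiVersion: apps/v1, kind: Deployment}
```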