kubectl: kubectl rollout undo should warn about undefined behaviour with kubectl apply
What would you like to be added: Make `kubectl rollout undo` and `kubectl apply` work together. `kubectl rollout undo` doesn't revert the `kubectl.kubernetes.io/last-applied-configuration` annotation, which leads to problems when named objects in arrays are intended to be removed.
Why is this needed: I ran into the issue that I had an old container still present after applying a rename of a sidecar. The symptoms look like these issues: https://github.com/kubernetes/kubernetes/issues/63982 and https://github.com/kubernetes/kubernetes/issues/25390.
What happened to me is as follows:
- the deployment has containers `a` + `b`
- I wanted to rename `b` to `c` and change `c`'s implementation
- I ran `kubectl apply` with the changed yaml, with a version of `c` which contains an error
- `rollout status` doesn't finish, because the pod goes into a CrashLoopBackOff
- I see the failure occur and decide to revert to the last version: `kubectl rollout undo`
- the deployment is reverted and the new replica set is scaled down
- however, the `kubectl.kubernetes.io/last-applied-configuration` annotation on the deployment's metadata is not rolled back
- the next time I deploy the `a` + `c` yaml for the deployment, `b` is not in the last-applied configuration, so the Kubernetes config merge algorithm doesn't remove `b`
- the pods are running with containers `a` + `b` + `c`
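The failure mode in the list above can be sketched as a toy three-way merge over name-keyed container lists. This is a simplified illustration, not kubectl's actual strategic-merge code; `merge_containers` is a hypothetical helper:

```python
# Minimal sketch (not kubectl's real algorithm) of a three-way merge on a
# name-keyed container list, showing how a stale last-applied annotation
# leaves an old container behind.

def merge_containers(last_applied, desired, live):
    """Three-way merge keyed on container name:
    - containers in last_applied but absent from desired are deleted;
    - containers only present in live are assumed to belong to another
      actor and are kept."""
    deleted = {c["name"] for c in last_applied} - {c["name"] for c in desired}
    desired_names = {c["name"] for c in desired}
    result = [c for c in live
              if c["name"] not in deleted and c["name"] not in desired_names]
    result.extend(desired)
    return result

# After `kubectl rollout undo`, the live spec is back to a+b, but the
# last-applied annotation still reflects the failed a+c apply.
last_applied = [{"name": "a"}, {"name": "c"}]
live         = [{"name": "a"}, {"name": "b"}]
desired      = [{"name": "a"}, {"name": "c"}]  # re-applying the a+c yaml

merged = merge_containers(last_applied, desired, live)
# `b` is not in last_applied, so the merge never marks it for deletion:
# the result contains a, b and c.
print(sorted(c["name"] for c in merged))
```

Had the annotation been rolled back to a+b along with the spec, the same merge would have deleted `b`, because it would appear in last-applied but not in the desired config.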
The `rollout undo` command doesn't roll back `kubectl.kubernetes.io/last-applied-configuration`, which doesn't make sense imho.
I reported this to Google Support, with a reproduction; their answer was this:

> We do not typically recommend using "rollout" to be honest, in general we would recommend re-applying the old config.

There are a lot of cases where the old config is no longer available to the user, especially since `rollout undo` is only reached for when something is going wrong. At these times you want `kubectl` to be your friend who has your back…
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 2
- Comments: 42 (31 by maintainers)
`kubectl rollout undo` is an imperative command, similar to `kubectl edit`, and was not intended to be used with `kubectl apply` in its original design. For `kubectl apply` users, modifying configs (e.g. reverting changes) and then applying them through `kubectl apply` is the declarative way to do a rollback.

/retitle kubectl rollout undo should warn about undefined behaviour with kubectl apply

(I think it is misleading to accept this as previously titled; alternatively, we can revert the title and the triage and create a separate issue for the warning)
We discussed this at last week’s SIG meeting and will follow up again next week after taking some time to reflect. You can listen to the conversation starting at https://youtu.be/Wh6eU1AfHBY?t=294. Notes follow.
This is happening because the implementation of `kubectl rollout undo` essentially copies the spec of the targeted ReplicaSet into the Deployment, without touching the last-applied annotation. That means last-applied continues to reflect the spec from the apply operation that created the newer ReplicaSet we moved off of, not the older one whose spec is now back in the Deployment. Because of this, the strategic merge calculation performed by the next `kubectl apply` will be making an illogical and potentially invalid comparison. In the case where the set of fields under management differs among the three points in time in question, `apply` may draw conclusions about field ownership that are perfectly correct given the inputs, but incorrect given the real history / actual user intent.

In @johngmyers' example, when the rollback happens, the Deployment ends up in a state where the spec has container "foo", and the last-applied annotation has container "bar". When `apply` subsequently runs with container "bar" added again, it sees only "bar" in the annotation. It therefore assumes that a different actor added "foo" and that it should be left alone.

This is impossible to fix with code alone: we'd need to start keeping more historical data than we do today. Specifically, we'd need to preserve the last-applied information by copying it to the ReplicaSet. It is deliberately not copied over today, because doing so would cause `apply` to think it manages the ReplicaSet directly (it doesn't). However, we could copy the data over into a different annotation, and then perform the reverse translation when copying back during a rollback. Something similar would need to be done for the newer `metadata.managedFields` used by server-side apply as well. Overall, it would be a significant increase in the size of all ReplicaSet objects created from Deployments managed by `apply`, for all users. We need to carefully consider whether this bug fix is worth that non-trivial cost.

SIG-CLI often faces challenges related to the interaction between the imperative and declarative commands kubectl provides. For example, making changes with `kubectl edit` between `apply` operations is capable of causing a very similar problem. My personal opinion is that `rollout undo` falls in a grey area between the two types of command. The user's intent in this case isn't to make a manual change; it's to roll back the entire Deployment object to its previous state. Where the `apply` data is stored is an implementation detail they may have no awareness of, and there's nothing that signals to them that the rollback will be incomplete in an important way. Also, unlike the CRUD imperative commands, `rollout` does not strike me as something we can call intended for experimental rather than production use. On the contrary, it sounds like something you should be able to reach for in an emergency.

Points to reflect on and come back to at next week's meeting:
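The stash-and-restore idea discussed above can be sketched in a few lines. This is only an illustration of the shape of the fix, not real kubectl code; the `STASHED` annotation key is hypothetical, and the objects are plain dicts standing in for Deployment/ReplicaSet resources:

```python
# Sketch of the proposal: stash the Deployment's last-applied annotation
# on the ReplicaSet under a *different* key (so `apply` does not think it
# manages the ReplicaSet), and restore it during `rollout undo`.

LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"
STASHED = "deployment.kubernetes.io/stashed-last-applied"  # hypothetical key

def create_replicaset(deployment):
    """On rollout: copy the spec, and stash last-applied under another key."""
    rs = {"spec": dict(deployment["spec"]), "annotations": {}}
    if LAST_APPLIED in deployment["annotations"]:
        rs["annotations"][STASHED] = deployment["annotations"][LAST_APPLIED]
    return rs

def rollout_undo(deployment, target_rs):
    """On undo: restore both the spec and the stashed last-applied."""
    deployment["spec"] = dict(target_rs["spec"])
    stashed = target_rs["annotations"].get(STASHED)
    if stashed is not None:
        deployment["annotations"][LAST_APPLIED] = stashed
    return deployment
```

The trade-off named above is visible here: every ReplicaSet created from an `apply`-managed Deployment would carry a full copy of the last-applied blob (and, for server-side apply, the equivalent `managedFields` data), which is the size cost the comment is weighing.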
cc @seans3 @eddiezane @ant31
Reproduction procedure:

Expected result: Deployment containing one container named `bar`. Actual result: Deployment containing two containers: `foo` and `bar`.
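Under the same simplified three-way-merge picture (an illustration, not kubectl's real code), the expected and actual results differ only in what last-applied contains at merge time:

```python
# Toy three-way merge over container names (illustration only):
# items in last_applied but not in desired are deleted; items only in
# live are assumed to belong to another actor and are kept.

def merge(last_applied, desired, live):
    deleted = set(last_applied) - set(desired)
    kept = [n for n in live if n not in deleted and n not in desired]
    return kept + list(desired)

# Actual (buggy): after undo, last-applied still says "bar", so the
# merge assumes someone else added "foo" and keeps it.
print(merge(last_applied=["bar"], desired=["bar"], live=["foo"]))
# -> ['foo', 'bar']

# Expected: had last-applied been rolled back to "foo" with the spec,
# the merge would delete "foo", leaving only "bar".
print(merge(last_applied=["foo"], desired=["bar"], live=["foo"]))
# -> ['bar']
```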