helm: UPGRADE FAILED: No resource with the name "" found
Repro
Create a simple Chart.yaml:
name: upgrade-repro
version: 0.1.0
With a single K8S resource in the templates/ dir:
kind: ConfigMap
apiVersion: v1
metadata:
name: cm1
data:
example.property.1: hello
Install the chart:
helm install .
exasperated-op
Last Deployed: Tue Sep 13 12:43:23 2016
Namespace: default
Status: DEPLOYED
Resources:
==> v1/ConfigMap
NAME DATA AGE
cm1 1 0s
Verify the release exists:
helm status exasperated-op
Last Deployed: Tue Sep 13 12:43:23 2016
Namespace: default
Status: DEPLOYED
Resources:
==> v1/ConfigMap
NAME DATA AGE
cm1 1 1m
Now add a 2nd K8S resource in templates/ dir:
kind: ConfigMap
apiVersion: v1
metadata:
name: cm2
data:
example.property.2: hello
Upgrade the chart:
helm upgrade exasperated-op .
Error: UPGRADE FAILED: Looks like there are no changes for cm1
That’s weird. Bump the version in Chart.yaml:
name: upgrade-repro
version: 0.2.0
Try upgrade again:
helm upgrade exasperated-op .
Error: UPGRADE FAILED: No resource with the name cm2 found.
Expected
helm upgrade should create the cm2 resource instead of erroring that it doesn’t exist.
Edit: to be clear: helm is creating the cm2 ConfigMap, but helm fails regardless.
Current state after performing steps
helm status exasperated-op
Last Deployed: Tue Sep 13 12:43:23 2016
Namespace: default
Status: DEPLOYED
Resources:
==> v1/ConfigMap
NAME DATA AGE
cm1 1 6m
kubectl get configmap --namespace default
NAME DATA AGE
cm1 1 6m
cm2 1 4m
About this issue
- State: closed
- Created 8 years ago
- Reactions: 124
- Comments: 110 (37 by maintainers)
Commits related to this issue
- feat: Add go-getter support to load base helmfiles (#1998) Resolves #1193 — committed to Nordix/helm by jonasrutishauser 3 years ago
This is a process I use to recover from this problem (so far it has worked every time without any incident… but be careful anyway):
1. Run helm list and find the latest revision for the affected chart.
2. Go from there and find the latest revision with DEPLOYED state.
3. Once you find the last DEPLOYED revision, change its state from DEPLOYED to SUPERSEDED and save the file (see the sketch below for one way to do this).
4. Try to do helm upgrade again; if it's successful then you are done!
5. If you encounter an upgrade error again, then edit the status for the very last revision from FAILED to DEPLOYED.
6. Try to do helm upgrade again; if it fails again, just flip the table…
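For concreteness, a minimal sketch of steps 1–3 and 5, assuming Helm 2 with Tiller's default ConfigMap storage backend in kube-system (an assumption not stated in the comment; release name and revision number are taken from the repro at the top and would differ in practice, and whether relabeling alone is enough can depend on the Helm version):
helm list
helm history exasperated-op
# Tiller stores release revisions as ConfigMaps named <release>.v<revision> in kube-system,
# labelled with OWNER=TILLER and a STATUS label.
kubectl -n kube-system get configmaps -l OWNER=TILLER,NAME=exasperated-op
# Flip a revision's STATUS label, e.g. the last DEPLOYED one to SUPERSEDED, or a FAILED one to DEPLOYED:
kubectl -n kube-system label configmap exasperated-op.v1 STATUS=SUPERSEDED --overwrite
helm upgrade exasperated-op .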
This happens frequently with our usage of helm and requires a full --purge. That is not a solution. Is there someone assigned to fix this? Is there a PR for this already? Can I help with anything?
I’ve been bitten by this issue more than once, since it’s an easy situation to get yourself into but apparently there’s no easy way to get out of it. I suppose the “good” part in my case is that resources are updated even with the error on the release (not sure if that makes me happy or worried).
I think helm should either forbid the user from getting into this wrong state or correctly handle it. Are there any real fixes to this outside of deleting everything (that is only viable for non-production uses)?
Atomic will not resolve the issue
Example chart: https://github.com/distorhead/ex-helm-upgrade-failure
Chart contains 2 deployments – myserver1 and myserver2. Install the chart, then delete deployment myserver1 from the chart and modify deployment myserver2 with a user error (delete the image field, for example). Say hello to our friend again:
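A sketch of that sequence (assumptions not spelled out in the comment: the chart sits at the repository root, the release is called ex, and a Helm build with --atomic is in use):
git clone https://github.com/distorhead/ex-helm-upgrade-failure
helm install --name ex ./ex-helm-upgrade-failure
# remove the myserver1 deployment template and break myserver2 (e.g. delete its image field), then:
helm upgrade --atomic ex ./ex-helm-upgrade-failure
# per the comment above, the upgrade still fails with the "no ... with the name ... found" error even with --atomic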
@bacongobbler @thomastaylor312 @jkroepke
We’re also affected by the issue. We use the latest helm 2.10 with GKE 10.6. When can I expect that this will be fixed? Do we have some reasonable workaround for the issue? Removing the whole deployment with the --purge option is so poor.

I believe that with the --cleanup-on-fail flag enabled, this error case should go away. Closing as resolved via #4871 and #5143.

I was able to work around this with helm rollback, specifying the most recent revision (the one that failed).
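That rollback workaround might look like the following (a sketch; the release name, chart path and revision number are placeholders, not taken from the comment):
helm history my-release          # note the number of the most recent (failed) revision, e.g. 3
helm rollback my-release 3       # roll back to that revision so the release record is consistent again
helm upgrade my-release ./chart  # then retry the upgrade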
First, I think Helm should be OK with namespaces and other resources already existing if it’s trying to (re-)install them. Kubernetes is all about “make the configuration right, and let kube figure out how to make the world match the config.” Second, I think Helm should be all-or-nothing. If a deploy fails, the cluster should be in the state it was before the deploy started. If there are two releases that both want to create namespace X, then there’s a reference counting problem. If there is a release that wants to create namespace X, but it already exists, then there’s a provenance problem. However, helm can record this using annotations on the objects, and do the right thing.
It is not applicable if you use CI/CD. What happens if an upgrade fails and you use a rolling update strategy? Must I delete my still-working release?
@bacongobbler I could be way off here. I am facing similar issues in upgrade. Judging by how difficult it is to solve this problem, I wonder if something more fundamental needs to be reconsidered. Part of the complexity appears to be due to the fact that Helm maintains its own version of the known configuration, separate from the actual source of truth, which is kubernetes. Would the system be more reliable if Helm only kept a copy of previously deployed helm charts for the purposes of history and rollback, but didn’t use it at all during upgrade? Instead, Helm would get the truth from kubectl itself, and then always have a 2-way diff to perform.
If a helm chart says it should have resource X, and kubectl sees an existing resource X, then:
If the helm chart says it should have resource X and there isn’t one according to kubectl, then Helm creates it.
If kubectl reports that it has a resource Y tagged as being controlled by this helm chart, and there is no resource Y in this helm chart, then helm deletes the resource.
Any resources not tagged as being controlled by this helm chart are always ignored by helm when performing the upgrade, except in the case mentioned above where the helm chart says it needs resource X and X exists but isn’t tagged.
If for some reason the roll-out of a helm chart happens and fails, and only half the resources were rolled out, then during a rollback helm would use the stored config files from the previous successful deployment and run the exact same algorithm, or things could be left in a broken state relative to some helm command line flag. If the user attempts to upgrade again, since kubernetes is used as the source of truth and not the last-known successful deployment, it should still be a simple 2-way diff between the new helm chart and the existing state of the system.
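For what it's worth, the proposed "treat the cluster as the source of truth" behaviour can be roughly approximated by hand today, which may help illustrate the idea (a sketch; it assumes kubectl 1.12+ for kubectl diff, Helm 2's helm template, and a chart that renders with default values):
helm template --name exasperated-op . > rendered.yaml
kubectl diff -f rendered.yaml    # 2-way diff of the desired manifests against live cluster state
kubectl apply -f rendered.yaml   # create or patch whatever differs, regardless of Helm's release record
Of course this bypasses Helm's release records, hooks and rollback entirely, which is exactly the trade-off the proposal is wrestling with.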
@brendan-rius You’re welcome to contribute code to fix this issue, or think of ideas. See #3805 and #4146 for some pointers.
Getting a similar error while upgrading:

The ConfigMap is created. My ConfigMap:
@bacongobbler Please understand that this resource might be a production namespace’s Service or Deployment object, which might (and already did) heavily disrupt our service guarantees.
I found our issue was because of a failed deploy.
Helm doesn’t attempt to clean up after a failed deploy, which means things like the new ConfigMap I added above get created but without a reference in the ‘prior’ deploy. That means when the next deploy occurs, helm finds the resource in k8s and expects it to be referenced in the latest deployed revision (or something; I’m not sure what exact logic it uses to find the ‘prior’ release) to check what changes there are. It’s not in that release, so it cannot find the resource, and fails.
This is mainly an issue when developing a chart as a failed deploy puts k8s in a state helm does not properly track. When I figured out this is what was happening I knew I just needed to delete the ConfigMap from k8s and try the deploy again.
@bacongobbler @michelleN Is there anything that makes it hard to improve the error message for this issue?
I believe the error message should state that “there is a conflict because the resource wasn’t created by helm and manual intervention is required” rather than “not found”. This small change alone would improve the user experience by a good margin.
In our case we were adding a ConfigMap to a chart and the chart failed to upgrade with:
Note: We’re using 2.7.2; on later versions this message has changed to include the type of the resource that can’t be found.
I believe this happens because when helm is determining what has changed it looks for the new configmap resource in the old release, and fails to find it. See https://github.com/kubernetes/helm/blob/master/pkg/kube/client.go#L276-L280 for the code where this error comes from.
Tiller logs for the failing upgrade:
I think @distorhead might want to take a look at that one and see if it also resolves the concerns he raised in https://github.com/helm/helm/pull/4871. Other than that, it looks like --atomic should address the concern, assuming you always use the --atomic flag.

I don’t believe there have been any proposed solutions to address the issue once you are already in this particular state, but I could be wrong. If the mitigation strategy for this issue is helm upgrade --atomic going forward, then I think this is safe to close.
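For readers landing here, that mitigation looks roughly like this (a sketch using the release from the original repro; --atomic requires Helm 2.13 or later):
helm upgrade exasperated-op . --atomic
# --atomic waits for the upgrade and automatically rolls the release back if it fails,
# so the release record is much less likely to be left in a FAILED state with orphaned resources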
Reposting my comment here since it’s more related to this issue than 3-way merge strategies:
At this point, we’re stumped on how to proceed. We’ve discussed the bug for weeks and none of the proposed solutions will work in all cases, either by introducing new bugs or significantly changing tiller’s upgrade behaviour.
For example, @michelleN and I brainstormed earlier this week and thought of two possible solutions, neither of which are particularly fantastic:
This is very risky as the cluster may be in an unknown state after a failed upgrade, so Helm may be unable to proceed in a clean fashion, potentially causing application downtime.
This is extremely risky as Helm may delete objects that were installed via other packages or through kubectl create, neither of which users may want.

The safest option so far has been to ask users to manually intervene in the case of this conflict, which I’ll demonstrate below.
If anyone has suggestions/feedback/alternative proposals, we’d love to hear your thoughts.
To re-iterate the issue as well as the workaround:
When an upgrade that installs new resources fails, the release goes into a FAILED state and stops the upgrade process. The next time you call helm upgrade, Helm does a diff against the last DEPLOYED release. In the last DEPLOYED release, this object did not exist, so it tries to create the new resource, but fails because it already exists. The error message is completely misleading, as @arturictus points out.

This can easily be worked around by manually intervening and deleting the resource that the error reports as “not found”. Following the example I demonstrated in https://github.com/helm/helm/pull/4223#issuecomment-397413568:
In other words, deleting resources created during the FAILED release works around the issue.
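The linked demonstration is not reproduced here; using the ConfigMap repro from the top of this issue as a stand-in, the manual intervention amounts to something like:
# delete the resource that the FAILED upgrade created but that the last
# DEPLOYED release does not reference, then retry the upgrade
kubectl delete configmap cm2 --namespace default
helm upgrade exasperated-op .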
Thanks for putting this workaround together @bacongobbler - it’s essentially what we came to as a process as well. One painful issue here is during complex upgrades many new resources - at times a few dependencies levels deep - may find themselves in this state. I haven’t yet found a way to fully enumerate these states in an automatic way leading to situations where one needs to repeatedly fail an upgrade to “search” for all relevant resources. For example, recently a newly added dependency itself had a dependency on a postgresql chart. In order to resolve this issue it was necessary to delete a secret, configmap, service, deployment and pvc - each found the long way 'round.
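One way to enumerate candidates in fewer iterations, assuming the charts involved apply the conventional release label to everything they create (many charts do, but it is chart-dependent, so treat this as a sketch with placeholder names):
# everything in the cluster that claims to belong to the release
kubectl get all,cm,secret,pvc --all-namespaces -l release=my-release
# everything the last DEPLOYED revision actually contains (find its number with helm history my-release)
helm get manifest my-release --revision 2
# resources that carry the release label but are absent from that manifest are the
# orphans created by the FAILED upgrade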
We are seeing this problem, too. Our reproduce steps:
1. helm install a chart that successfully installs a deployment.
2. helm install the new chart (which newly introduces a custom resource). This will cause a rolling update of the deployment, which we’ve intentionally set up to fail.
3. Fix the failure (using kubectl).
4. helm install again. We expect this to work, but it doesn’t: it reports “No resource with name ___”, where the name is that of the custom resource.
5. Delete the custom resource with kubectl. Now helm install will work.
Note that the first attempt at helm install with a newly introduced custom resource in the chart must fail to get into this state.

Hi,
we have the same issue as @dilumr described… with version 2.11.0:
Error: UPGRADE FAILED: no ConfigMap with the name "xxx" found

Still seeing this in v2.9.1 (currently released stable version).
I disagree that it’s “very dangerous” to back out of an upgrade. I think doing so is the correct solution.
Kubernetes is declarative. Snapshot what the cluster state was before attempting to upgrade. If there’s an error partway through, then roll back to the snapshot. If someone has script hooks that would leave the cluster in a bad state when doing this, then that’s their own fault. (Maybe that could be solved with rollback hooks, too)
Of course, it would be great if an upgrade was pre-flighted and didn’t fail in the first place as much as possible. Errors in dependency charts generated by values or --set arguments should be possible to check before trying to change anything, for example. Things like forgetting to bump the version number could also be pre-flighted to avoid making changes when it won’t work.
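A rough approximation of such a pre-flight that is possible today (a sketch; the release name, chart path and --set value are placeholders, and a dry run will not catch every class of failure):
helm upgrade my-release ./chart --dry-run --debug --set image.tag=v2
# renders the chart server-side via Tiller without applying anything, so many
# template and values errors surface before the cluster is touched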
This (ugly) fix works for me:
0. I’m getting the error:
1. Find the last DEPLOYED revision. In this output v17 is DEPLOYED.
2. Delete all revisions including v17:
3. Update release v16 to DEPLOYED.
4. (Important) Find all new resources that were added since the last DEPLOYED revision (v16) and delete them, for example:
kubectl -n monitor delete cm az-test-2-prom-prometheus-grafana-config
kubectl -n monitor delete svc ...
5. Run helm upgrade ... and see Happy Helming.

Yes I know. I’m just explaining and observing the bug’s behaviour so others know what is involved. 😃
My situation was that I had a new resource, and I deployed the new version of the helm chart with the new resource. That deployment failed because I fat-fingered some yaml. Well, the new objects were created in kubernetes. I fixed the yaml and ran the upgrade on my chart again, and voilà, the error message that the resource is not found appears. I had to go into kubernetes and remove the new resources (in my case a role and rolebinding) that were created by the failed deployment. After that, the helm check to see if the current object exists (https://github.com/kubernetes/helm/blob/7432bdd716c4bc34ad95a85a761c7cee50a74ca3/pkg/kube/client.go#L257) will not succeed, and the resources are created again. Seems like a bug, where maybe new resources for a failed chart should be accounted for?
@michelleN thanks! Sorry I haven’t had time this week to attempt a repro on master. Looking forward to upgrading soon!
@michelleN I’ve prepared a PR to change the error text: #5460.
@jkroepke just pointed out to me that PR #5143 provides a good workaround for this. When the --atomic flag is released in the next minor version, you should be able to use it to automatically purge or roll back when there is an error.

@bacongobbler given you have been involved with most of the back and forth on this one, is there something else that can be done to fully fix this, or would the --atomic flag be sufficient?

We observe the same problem. It happens if you have a template which contains either:
- {{ if $condition -}} statement
- {{ range $index, $value := $array -}}
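For illustration, a template of that shape (hypothetical names and values flag, modelled on the ConfigMaps from the original repro): when the condition flips between releases, the resource is newly created, which is the same “new resource” situation described throughout this thread.
{{- if .Values.extraConfig.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: cm3
data:
  example.property.3: hello
{{- end }}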
I am hitting this issue as well with the latest helm 2.12.0 and kubernetes 1.10.11. Even rolling back to the latest good release as @aguilarm suggested did not work, and deleting the resources that helm complains about does not help either; after the upgrade command fails, it leaves those same resources partially recreated. Very annoying for a prod env…
I have 2 clusters with a very similar environment, the main difference between the 2 being the total number of nodes. In one case a helm delete --purge followed by a fresh helm install worked, but in the other it did not, and I am yet to figure out a way to bring that cluster up to the latest template changes.

Going back to @bacongobbler’s earlier comment:
I wonder if we can mitigate this risk by making the new behaviour opt-in? Within a given namespace I generally use helm exclusively, and I suspect this is the case for many. If I could give Helm install/upgrade a flag to tell it that anything in the given namespace that isn’t part of an existing release is fine to delete/overwrite, would that help?
Since you also said “via other packages”, I presume you don’t want Helm to have to examine other releases as part of performing a release, so my suggestion wouldn’t work except in the single-release-per-namespace model. To reply to that objection, I would say: if you want to manage multiple packages in a namespace and still get this behaviour, create an umbrella chart whose sole purpose is to specify the chart dependencies you want. Then use the new flag (“--exclusive”?) when deploying that umbrella chart.
Obviously this doesn’t solve the problem for all use cases, but perhaps it’s enough of a workaround.
I’m running into a similar issue where I have a chart with bundled dependencies. If I add a new dependency and run a helm upgrade, the result is the same as described: the resources are properly created, however helm returns an error.

So, if this is installed:
helm install -n my-release
And then a new chart is added as a dependency:
When the release is upgraded with:
helm upgrade my-release my-thing
helm produces the following error:
Error: UPGRADE FAILED: No resource with the name new-dependency found.
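For context, “added as a dependency” in Helm 2 means an entry in the parent chart’s requirements.yaml along these lines (a hypothetical sketch; the actual dependency name, version and repository are not shown in the comment above), followed by helm dependency update to vendor it:
dependencies:
  - name: new-dependency
    version: 0.1.0
    repository: https://example.com/charts
On the next helm upgrade, the subchart’s resources are brand new to the release, which is what triggers the “No resource with the name … found” path described throughout this thread.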