argo-cd: Failing to deploy services due to "helm dependency build" failure
Checklist:
- I’ve searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
- I’ve included steps to reproduce the bug.
- I’ve pasted the output of argocd version.
Describe the bug
Some Applications based on Helm fail to deploy due to some kind of internal filesystem issue.
For example, here is one of the apps that is in the Unknown state:
Name: redis-rule-hit-count-buffer-test-2
Project: default
Server: https://kubernetes.default.svc
Namespace: test-2
URL: https://35.232.222.64/applications/redis-rule-hit-count-buffer-test-2
Repo: git@github.com:gc-org/gc-saas-prod.git
Target: HEAD
Path: prod/common/infra/redis
SyncWindow: Sync Allowed
Sync Policy: Automated (Prune)
Sync Status: Unknown
Health Status: Healthy
CONDITION MESSAGE LAST TRANSITION
ComparisonError rpc error: code = Unknown desc = Manifest generation error (cached): `helm dependency build` failed exit status 1: Error: unable to move current charts to tmp dir: link error: cannot rename charts to tmpcharts: rename charts tmpcharts: file exists 2020-12-22 10:14:56 +0200 IST
This doesn’t eventually resolve itself; it stays this way…
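For reference, because the failing result is cached by the repo-server (“Manifest generation error (cached)”), the usual way to make Argo CD re-evaluate is a hard refresh; a minimal sketch, assuming the stock argocd CLI and the app name above:
argocd app get redis-rule-hit-count-buffer-test-2 --hard-refresh   # bypasses the cached manifest-generation result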
To Reproduce
I’m not sure how to reproduce it; this happens from time to time and causes a complete deadlock.
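A hedged local sketch of the kind of collision that seems consistent with the error (not a confirmed reproduction): two helm dependency build runs racing on the same chart directory can leave behind, or trip over, a stale tmpcharts directory, depending on the Helm version. Using the chart path from this repo:
# add the dependency repo first (older Helm 3 releases require it), then race two builds
helm repo add bitnami https://charts.bitnami.com/bitnami
( helm dependency build prod/common/infra/redis & \
  helm dependency build prod/common/infra/redis & \
  wait )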
My Chart.yaml
name: redis
version: 0.1.0
apiVersion: v2
dependencies:
- name: redis
  version: 11.2.2
  repository: https://charts.bitnami.com/bitnami
My app-of-apps
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: redis-servers-test-2
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: git@github.com:gc-org/gc-saas-prod.git
    targetRevision: HEAD
    path: prod/cluster_1/customers/customer_2/redis
    helm:
      releaseName: redis-servers-test-2
  destination:
    server: https://kubernetes.default.svc
    namespace: test-2
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: true
My template
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: redis-dashboard-top-widgets-cache-test-2
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
  annotations:
    argocd.argoproj.io/sync-wave: "3"
spec:
  project: default
  source:
    repoURL: git@github.com:gc-org/gc-saas-prod.git
    targetRevision: HEAD
    path: prod/common/infra/redis
    helm:
      values: |
        redis:
          fullnameOverride: redis-dashboard-top-widgets-cache
          redisPort: 6413
          master:
            nodeSelector:
              cus: test-2
            service:   # redis-rule-hit-count-buffer-master
              port: 6413
          image:
            tag: 6.0.6
          cluster:
            enabled: false
          existingSecret: "redis"
          existingSecretPasswordKey: redis-password
      releaseName: redis-dashboard-top-widgets-cache
  destination:
    server: https://kubernetes.default.svc
    namespace: test-2
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: -1 # unlimited
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 5m
    automated:
      prune: true
      selfHeal: true
      allowEmpty: true
Expected behavior
The Application should be deployed successfully.
Screenshots
Version
argocd: v1.7.10+bcb05b0.dirty
BuildDate: 2020-11-21T00:34:29Z
GitCommit: bcb05b0c2e0f8006aa2d2abaf780e73c9e73c945
GitTreeState: dirty
GoVersion: go1.15.5
Compiler: gc
Platform: darwin/amd64
argocd-server: v1.8.1+c2547dc
BuildDate: 2020-12-10T02:59:21Z
GitCommit: c2547dca95437fdbb4d1e984b0592e6b9110d37f
GitTreeState: clean
GoVersion: go1.14.12
Compiler: gc
Platform: linux/amd64
Ksonnet Version: v0.13.1
Kustomize Version: v3.8.1 2020-07-16T00:58:46Z
Helm Version: v3.4.1+gc4e7485
Kubectl Version: v1.17.8
Logs
Logs from the argocd-application-controller
time="2020-12-22T08:14:58Z" level=info msg="Normalized app spec: {\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2020-12-22T08:14:56Z\",\"message\":\"rpc error: code = Unknown desc = Manifest generation error (cached): `helm dependency build` failed exit status 1: Error: unable to move current charts to tmp dir: link error: cannot rename charts to tmpcharts: rename charts tmpcharts: file exists\",\"type\":\"ComparisonError\"}]}}" application=redis-rule-hit-count-buffer-test-2
Logs from argocd-repo-server
time="2020-12-22T08:40:44Z" level=error msg="finished unary call with code Unknown" error="Manifest generation error (cached): `helm dependency build` failed exit status 1: Error: unable to move current charts to tmp dir: link error: cannot rename charts to tmpcharts: rename charts tmpcharts: file exists" grpc.code=Unknown grpc.method=GenerateManifest grpc.request.deadline="2020-12-22T08:41:43Z" grpc.service=repository.RepoServerService grpc.start_time="2020-12-22T08:40:44Z" grpc.time_ms=474.433 span.kind=server system=grpc
time="2020-12-22T08:40:44Z" level=info msg="manifest cache hit: &ApplicationSource{RepoURL:git@github.com:gc-org/gc-saas-prod.git,Path:prod/common/infra/redis,TargetRevision:HEAD,Helm:&ApplicationSourceHelm{ValueFiles:[],Parameters:[]HelmParameter{},ReleaseName:,Values:redis:\n redisPort: 6380\n master:\n service:\n port: 6380\n image:\n tag: 6.0.6\n nodeSelector:\n cus: test-1\n cluster:\n enabled: false\n existingSecret: \"redis\"\n existingSecretPasswordKey: redis-password\n,FileParameters:[]HelmFileParameter{},Version:,},Kustomize:nil,Ksonnet:nil,Directory:nil,Plugin:nil,Chart:,}/cead2aa7818699b6c3ef04fc7e35390ee0fcbee0"
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 26
- Comments: 73 (5 by maintainers)
I believe another fix for this may be to update your stable repo in the argocd-cm ConfigMap as follows:
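(The ConfigMap snippet from the original comment is not preserved here. As an illustration only - the URL is assumed from the November 2020 relocation of the stable Helm repo - such an entry in argocd-cm could look like the following.)
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  repositories: |
    - type: helm
      name: stable
      url: https://charts.helm.sh/stable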
After this, refresh your Argo applications.
We too are experiencing this quite often. Our workaround is to just run redis-cli flushall in the Redis pod.

We finally resolved this by increasing --repo-server-timeout-seconds on the application-controller and ARGOCD_EXEC_TIMEOUT on the repo-server. Doesn’t make me happy, but until helm dep update is thread-safe (https://github.com/helm/helm/pull/8846#issuecomment-768479847), it’ll have to do.
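For context, a rough sketch of where those two knobs live; the values and manifest shapes are assumptions to adjust for your install, and the controller may be a Deployment or StatefulSet depending on the Argo CD version:
# argocd-repo-server container: raise the exec timeout (the default is 90s)
env:
  - name: ARGOCD_EXEC_TIMEOUT
    value: "5m"
# argocd-application-controller container args: allow longer manifest-generation calls
#   - --repo-server-timeout-seconds
#   - "300"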
Quick note on this one: that’s a generic error message from the golang context package, and it just means “a timeout happened somewhere.” We’ve made efforts recently to always wrap all error messages to provide more context. Hopefully in future versions the reason for the timeout will be much more clear.
These are two terrible problems after working with Argo CD for two years, from versions 1.7.x ~ 2.3.x, especially in an urgent deploy case: helm dependency build failed, timeout after 1m30s. Most of the time I know it is not a performance issue, since I followed everything from the official doc [high availability]; it is just a cache problem. During these 2 years there is still only a workaround to fix it: recreate argocd-repo-server and flush all data in the Redis cluster. Thank God!
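For anyone reaching for the same workaround, a sketch of those two steps with kubectl (default argocd namespace and non-HA resource names assumed; with redis-ha the Redis pods are named differently):
kubectl -n argocd rollout restart deployment argocd-repo-server
# flush the cached manifest-generation results held in Argo CD's Redis
kubectl -n argocd exec deploy/argocd-redis -- redis-cli flushall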
repo-server and redis pods …

After this week’s security upgrade of ArgoCD to v2.2.5 (from v2.2.2 in my case), the repo-server started to give this “helm dependency build” failure, even though v2.2.2 already includes Helm 3.7. I’ve increased the timeouts and added a 2nd, 3rd and 4th replica, but the repo-server starts to eat all the CPU. When rolling back to v2.2.2 this issue disappears. It works for a while, even when forcing a manual sync of 220 apps, but after some hours it starts failing with the “helm dependency build” failure again.
I was able to upgrade to 1.7.11 - but it had no effect.
I ended up applying the following configuration through a custom values.yaml file:

After this, I had to helm template [...] | kubectl apply [...] for it to kick in, since ArgoCD is non-functional.

Just try out the 3.29.5 chart and specify image tag 2.2.5, it will work fine. We were trying to identify the problem with the chart causing that and we have some clues, but no confirmation so far.
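For the chart-based installs mentioned here, pinning the image back can be sketched roughly like this (hypothetical values.yaml for the community argo-cd chart 3.29.5; key names should be checked against that chart version):
global:
  image:
    tag: v2.2.5   # pin the Argo CD image independently of the chart's default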
The below example, AFAIK, is helm v3, correct? apiVersion: v2 is for helm 3 while apiVersion: v1 is for helm 2 - correct?

Helm 3.7.1 is merged in argocd master and should be part of the next release (2.1.7): https://github.com/argoproj/argo-cd/commit/2770c690a5597fcbab344cd2ad494c918472bdd1
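That reading of apiVersion is right: v2 charts are Helm 3 and v1 charts are Helm 2, where dependencies lived in a separate requirements.yaml. A minimal, hypothetical Chart.yaml for each:
# Helm 3 chart - apiVersion: v2, dependencies declared inline
apiVersion: v2
name: example-wrapper
version: 0.1.0
dependencies:
  - name: redis
    version: 11.2.2
    repository: https://charts.bitnami.com/bitnami

# Helm 2-era chart - apiVersion: v1, dependencies go in a separate requirements.yaml
# apiVersion: v1
# name: example-wrapper
# version: 0.1.0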
Seeing as this issue is “resolved” temporarily - by killing repo-server pods, so they get recreated - it’s clearly a caching problem. My guess is that the repo-server runs Helm in parallel on the SAME pod - which Helm does not actually support - and hence once you hit 2 simultaneous Helm runs on the same pod, Helm breaks and never recovers from that (due to caching). A simple “lock” around Helm - ensuring it is not run simultaneously - should confirm this as the issue (and resolve it 😃)
Just hit this in v2.0.1 also… when doing a recovery test… so syncing many (10+) applications (all helm applications) reproduces it.
I’m running v1.7.6 as is - I’m going to give updating to 1.7.11 the old college try and report back.
Thanks @JasP19, I will give it a shot as soon as I can.