argo-cd: App sync fails with ComparisonError rpc error: code = DeadlineExceeded desc = context deadline exceeded
Checklist:

- I've searched in the docs and FAQ for my answer: http://bit.ly/argocd-faq.
- I've included steps to reproduce the bug.
- I've pasted the output of `argocd version`.
**Describe the bug**

I've added several different apps to my Argo CD instance, but all of them fail to sync with the same error:

    ComparisonError rpc error: code = DeadlineExceeded desc = context deadline exceeded
**To Reproduce**

Here is the manifest from one of the demo projects that fails:

    project: default
    source:
      repoURL: 'https://github.com/argoproj/argocd-example-apps.git'
      path: kustomize-guestbook
      targetRevision: HEAD
    destination:
      server: 'https://kubernetes.default.svc'
      namespace: guestbook
**Remarks**

Not sure if this matters, however:

- The cluster is set up with `rke` and contains 3 nodes.
- Ingress is an argo-tunnel.
- Access to the cluster seems OK.

Output of `argocd cluster list`:

    SERVER                          NAME  VERSION  STATUS      MESSAGE
    https://kubernetes.default.svc        1.18     Successful
**Expected behavior**

Synchronization completes without errors.
**Version**

    argocd: v1.6.1+159674e
      BuildDate: 2020-06-19T00:39:46Z
      GitCommit: 159674ee844a378fb98fe297006bf7b83a6e32d2
      GitTreeState: clean
      GoVersion: go1.14.1
      Compiler: gc
      Platform: linux/amd64
    argocd-server: v1.6.1+159674e
      BuildDate: 2020-06-19T00:41:05Z
      GitCommit: 159674ee844a378fb98fe297006bf7b83a6e32d2
      GitTreeState: clean
      GoVersion: go1.14.1
      Compiler: gc
      Platform: linux/amd64
      Ksonnet Version: v0.13.1
      Kustomize Version: {Version:kustomize/v3.6.1 GitCommit:c97fa946d576eb6ed559f17f2ac43b3b5a8d5dbd BuildDate:2020-05-27T20:47:35Z GoOs:linux GoArch:amd64}
      Helm Version: version.BuildInfo{Version:"v3.2.0", GitCommit:"e11b7ce3b12db2941e90399e874513fbd24bcb71", GitTreeState:"clean", GoVersion:"go1.13.10"}
      Kubectl Version: v1.14.0
**Logs**

Output of `argocd app get demo`:

    Name:           demo
    Project:        default
    Server:         https://kubernetes.default.svc
    Namespace:      guestbook
    URL:            https://argocd.mycompany.com/applications/demo
    Repo:           https://github.com/argoproj/argocd-example-apps.git
    Target:         HEAD
    Path:           kustomize-guestbook
    SyncWindow:     Sync Allowed
    Sync Policy:    <none>
    Sync Status:    Unknown
    Health Status:  Healthy

    CONDITION        MESSAGE                                                              LAST TRANSITION
    ComparisonError  rpc error: code = DeadlineExceeded desc = context deadline exceeded  2020-06-27 19:06:51 +0200 CEST
Logs from `argocd-server`:

    time="2020-07-01T09:18:45Z" level=info msg="Requested app 'demo' refresh"
    time="2020-07-01T09:18:45Z" level=warning msg="finished unary call with code DeadlineExceeded" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" grpc.code=DeadlineExceeded grpc.method=RevisionMetadata grpc.service=application.ApplicationService grpc.start_time="2020-07-01T09:15:41Z" grpc.time_ms=183821.61 span.kind=server system=grpc
Logs from `argocd-application-controller` (I suspect the issue is caused by the i/o timeout):
    2020/07/01 09:18:05 cache: Set key="app|resources-tree|demo|1.0.0" failed: read tcp 10.42.67.206:56566->10.43.73.252:6379: i/o timeout
    time="2020-07-01T09:18:05Z" level=error msg="Failed to cache app resources: read tcp 10.42.67.206:56566->10.43.73.252:6379: i/o timeout" application=demo dedup_ms=0 diff_ms=1 git_ms=460 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0
    time="2020-07-01T09:18:05Z" level=info msg="Update successful" application=demo
    time="2020-07-01T09:18:05Z" level=info msg="Reconciliation completed" application=demo dedup_ms=0 dest-namespace=guestbook dest-server="https://kubernetes.default.svc" diff_ms=1 fields.level=2 git_ms=460 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0 time_ms=12570
    time="2020-07-01T09:18:07Z" level=info msg="Refreshing app status (normal refresh requested), level (2)" application=demo
    time="2020-07-01T09:18:07Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: guestbook)" application=demo
    time="2020-07-01T09:18:08Z" level=info msg="getRepoObjs stats" application=demo build_options_ms=0 helm_ms=0 plugins_ms=0 repo_ms=0 time_ms=542 unmarshal_ms=542 version_ms=0
    2020/07/01 09:18:20 cache: Set key="app|resources-tree|demo|1.0.0" failed: read tcp 10.42.67.206:56652->10.43.73.252:6379: i/o timeout
    time="2020-07-01T09:18:20Z" level=error msg="Failed to cache app resources: read tcp 10.42.67.206:56652->10.43.73.252:6379: i/o timeout" application=demo dedup_ms=0 diff_ms=2 git_ms=543 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0
    time="2020-07-01T09:18:20Z" level=info msg="Update successful" application=demo
    time="2020-07-01T09:18:20Z" level=info msg="Reconciliation completed" application=demo dedup_ms=0 dest-namespace=guestbook dest-server="https://kubernetes.default.svc" diff_ms=2 fields.level=2 git_ms=543 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0 time_ms=12636
    time="2020-07-01T09:18:22Z" level=info msg="Refreshing app status (normal refresh requested), level (2)" application=demo
    time="2020-07-01T09:18:22Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: guestbook)" application=demo
    time="2020-07-01T09:18:22Z" level=info msg="getRepoObjs stats" application=demo build_options_ms=0 helm_ms=0 plugins_ms=0 repo_ms=0 time_ms=633 unmarshal_ms=633 version_ms=0
    2020/07/01 09:18:34 cache: Set key="app|resources-tree|demo|1.0.0" failed: read tcp 10.42.67.206:56756->10.43.73.252:6379: i/o timeout
    time="2020-07-01T09:18:34Z" level=error msg="Failed to cache app resources: read tcp 10.42.67.206:56756->10.43.73.252:6379: i/o timeout" application=demo dedup_ms=0 diff_ms=2 git_ms=633 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0
    time="2020-07-01T09:18:35Z" level=info msg="Update successful" application=demo
    time="2020-07-01T09:18:35Z" level=info msg="Reconciliation completed" application=demo dedup_ms=0 dest-namespace=guestbook dest-server="https://kubernetes.default.svc" diff_ms=2 fields.level=2 git_ms=633 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0 time_ms=12756
    time="2020-07-01T09:18:38Z" level=info msg="Refreshing app status (normal refresh requested), level (2)" application=demo
    time="2020-07-01T09:18:38Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: guestbook)" application=demo
    time="2020-07-01T09:18:38Z" level=info msg="getRepoObjs stats" application=demo build_options_ms=0 helm_ms=0 plugins_ms=0 repo_ms=0 time_ms=384 unmarshal_ms=384 version_ms=0
    2020/07/01 09:18:50 cache: Set key="app|resources-tree|demo|1.0.0" failed: read tcp 10.42.67.206:56842->10.43.73.252:6379: i/o timeout
    time="2020-07-01T09:18:50Z" level=error msg="Failed to cache app resources: read tcp 10.42.67.206:56842->10.43.73.252:6379: i/o timeout" application=demo dedup_ms=0 diff_ms=2 git_ms=385 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0
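The repeated `i/o timeout` reads to `10.43.73.252:6379` in these logs point at the Redis cache (the `argocd-redis` service) rather than at git. As a quick sanity check, a plain TCP probe run from inside the cluster can confirm whether the controller can reach Redis at all; this is a generic sketch, not Argo CD code, and the service name used in the comment assumes a stock install:

```python
import socket

def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, DNS failures, and timeouts
        return False

# Run from a pod in the cluster; "argocd-redis" is the default service name:
# can_reach("argocd-redis.argocd.svc.cluster.local", 6379)
```

A `False` result against the exact IP and port from the logs would confirm this is a Redis connectivity problem rather than an application-level one.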
For convenience, I've attached the logs from `argocd-server` and `argocd-application-controller`.
**About this issue**

- Original URL
- State: open
- Created 4 years ago
- Reactions: 22
- Comments: 36 (3 by maintainers)

**Commits related to this issue**

- fix: bump up values again https://github.com/argoproj/argo-cd/issues/3864 (committed to holtje/hive-config by docwhat 3 years ago)
I have the same issue while adding a new HTTPS repository from Bitbucket and GitLab, with the following error message:

    Unable to connect HTTPS repository: Get "https://bitbucket.org/xxx/yyy/info/refs?service=git-upload-pack": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

- I enabled the `--insecure` flag for `argocd-server`.
- I applied custom settings to the `argocd-application-controller`.
- I also tried scaling `argocd-repo-server` up to 5 replicas; it did not help.
- `git clone` inside the `argocd-repo-server` pod works fine.
- I installed several Argo CD versions and had the same issue with all of them.
- My cluster has 3 nodes and 1 master.
- This happens for both manual and auto sync policies when creating a new app.
What if the git timeout is actually caused by slowness not (directly) related to Argo CD?

To be more precise, what if the slowness is caused by rate limiting? Argo CD will then start retrying every call that takes longer than 15 seconds, hammering the git server with more and more requests and thereby indirectly making the rate limiting worse.

It would certainly be very useful to be able to change that timeout setting in such a situation.

(PS: Sorry if this comes across emotionally; I just had a bad day trying to figure out why our production applications suddenly weren't updating any more, and why azuregit was suddenly refusing to talk to argocd.)
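The retry-storm scenario described above is the textbook case for exponential backoff with jitter instead of a fixed retry interval. A minimal illustrative sketch (this is not Argo CD's actual retry code; `call` stands in for the rate-limited git request):

```python
import random
import time

def with_backoff(call, max_attempts=5, base=1.0, cap=60.0):
    """Retry `call` on timeout, doubling the wait each attempt and adding
    full jitter, so a rate-limited server sees fewer, more spread-out
    requests instead of fixed-interval hammering."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            delay = min(cap, base * (2 ** attempt))
            time.sleep(random.uniform(0, delay))  # full jitter
```

With a fixed 15-second retry, every rate-limited client retries in lockstep; with jittered backoff the load spreads out and decays, which usually lets the rate limit recover.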
Maybe this helps somebody as well. We run Argo CD on an AKS cluster and faced a similar issue. Following https://argo-cd.readthedocs.io/en/release-1.8/operator-manual/high_availability/ we tried the `--repo-server-timeout-seconds` workaround on `argocd-application-controller`, and the issue has disappeared for now.

In my case I found that all commands stopped at exactly 15s; browsing the code, I found the part linked below that would explain why synchronization stops after 15 seconds:
The git client does not seem to honor the timeout specified in `--repo-server-timeout-seconds`, so even if the gRPC request deadline is extended, the git fetch is not. Should this option also be applied to the git client?

Regards,

References: https://github.com/argoproj/argo-cd/blob/e92e0fa4090a0c324b7a7dbf3d7a2e16ab281d85/util/git/client.go#L112
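For reference, a hedged sketch of how the `--repo-server-timeout-seconds` workaround mentioned above is typically wired in, as an extra argument on the `argocd-application-controller` container. The workload kind varies by Argo CD version (StatefulSet in 1.x, Deployment later) and the 180s value is purely illustrative; verify both against your install's manifests:

```yaml
# Fragment of the argocd-application-controller workload spec (illustrative).
spec:
  template:
    spec:
      containers:
        - name: argocd-application-controller
          command:
            - argocd-application-controller
            - --repo-server-timeout-seconds
            - "180"   # illustrative value; tune for your environment
```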
We're experiencing the same timeout issues after deploying an ApplicationSet which creates 9 applications.

Argo CD version deployed: v2.2.2+03b17e0
Argo CD ApplicationSet version: v0.3.0

Controller settings applied as above.

GitHub API rate limits are 5,000 API calls per hour, or roughly 83 calls per minute. I'm not knowledgeable enough about the Argo CD engine to say that we're hitting those limits, but I think it is very plausible, since I am able to create the applications one by one!
UPDATE: I've also added `appResyncPeriod` together with the above configs and re-deployed. Being patient was key now! 🔑 After about 10 minutes, none of my 9 applications were in an `Unknown` state anymore.

UPDATE 2: At the second refresh, the `repo-server` started throwing timeouts again!
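For completeness, the `appResyncPeriod` setting mentioned above looks like the value exposed by the community argo-cd Helm chart; the key name, placement, and value below are assumptions from that chart, so verify them against your chart version:

```yaml
# values.yaml fragment for the community argo-cd Helm chart (assumed layout).
controller:
  args:
    appResyncPeriod: "600"   # seconds between automatic app refreshes; illustrative value
```

A longer resync period reduces how often every application hits the git provider at once, which is why it helped here until the next refresh wave.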