argo-cd: App sync fails with ComparisonError rpc error: code = DeadlineExceeded desc = context deadline exceeded

Checklist:

  • I’ve searched in the docs and FAQ for my answer: http://bit.ly/argocd-faq.
  • I’ve included steps to reproduce the bug.
  • I’ve pasted the output of argocd version.

Describe the bug

I’ve added several apps to my argo-cd, but all of them fail to sync with the same error:

ComparisonError  rpc error: code = DeadlineExceeded desc = context deadline exceeded

To Reproduce

Here is the manifest of one of the demo projects that fails:

project: default
source:
  repoURL: 'https://github.com/argoproj/argocd-example-apps.git'
  path: kustomize-guestbook
  targetRevision: HEAD
destination:
  server: 'https://kubernetes.default.svc'
  namespace: guestbook

Remarks

Not sure if this matters, but:

  • Cluster is set up with RKE and contains 3 nodes.
  • Ingress is an argo-tunnel.
  • Access to the cluster seems OK:
     argocd cluster list
     SERVER                          NAME  VERSION  STATUS      MESSAGE
     https://kubernetes.default.svc        1.18     Successful 
    

Expected behavior

Synchronization works without errors.

Version

argocd: v1.6.1+159674e
  BuildDate: 2020-06-19T00:39:46Z
  GitCommit: 159674ee844a378fb98fe297006bf7b83a6e32d2
  GitTreeState: clean
  GoVersion: go1.14.1
  Compiler: gc
  Platform: linux/amd64
argocd-server: v1.6.1+159674e
  BuildDate: 2020-06-19T00:41:05Z
  GitCommit: 159674ee844a378fb98fe297006bf7b83a6e32d2
  GitTreeState: clean
  GoVersion: go1.14.1
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: {Version:kustomize/v3.6.1 GitCommit:c97fa946d576eb6ed559f17f2ac43b3b5a8d5dbd BuildDate:2020-05-27T20:47:35Z GoOs:linux GoArch:amd64}
  Helm Version: version.BuildInfo{Version:"v3.2.0", GitCommit:"e11b7ce3b12db2941e90399e874513fbd24bcb71", GitTreeState:"clean", GoVersion:"go1.13.10"}
  Kubectl Version: v1.14.0

Logs

argocd app get demo
Name:               demo
Project:            default
Server:             https://kubernetes.default.svc
Namespace:          guestbook
URL:                https://argocd.mycompany.com/applications/demo
Repo:               https://github.com/argoproj/argocd-example-apps.git
Target:             HEAD
Path:               kustomize-guestbook
SyncWindow:         Sync Allowed
Sync Policy:        <none>
Sync Status:        Unknown
Health Status:      Healthy

CONDITION        MESSAGE                                                              LAST TRANSITION
ComparisonError  rpc error: code = DeadlineExceeded desc = context deadline exceeded  2020-06-27 19:06:51 +0200 CEST

argocd-server

time="2020-07-01T09:18:45Z" level=info msg="Requested app 'demo' refresh"
time="2020-07-01T09:18:45Z" level=warning msg="finished unary call with code DeadlineExceeded" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" grpc.code=DeadlineExceeded grpc.method=RevisionMetadata grpc.service=application.ApplicationService grpc.start_time="2020-07-01T09:15:41Z" grpc.time_ms=183821.61 span.kind=server system=grpc

argocd-application-controller

I suspect the issue is caused by the i/o timeout:

2020/07/01 09:18:05 cache: Set key="app|resources-tree|demo|1.0.0" failed: read tcp 10.42.67.206:56566->10.43.73.252:6379: i/o timeout
time="2020-07-01T09:18:05Z" level=error msg="Failed to cache app resources: read tcp 10.42.67.206:56566->10.43.73.252:6379: i/o timeout" application=demo dedup_ms=0 diff_ms=1 git_ms=460 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0
time="2020-07-01T09:18:05Z" level=info msg="Update successful" application=demo
time="2020-07-01T09:18:05Z" level=info msg="Reconciliation completed" application=demo dedup_ms=0 dest-namespace=guestbook dest-server="https://kubernetes.default.svc" diff_ms=1 fields.level=2 git_ms=460 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0 time_ms=12570
time="2020-07-01T09:18:07Z" level=info msg="Refreshing app status (normal refresh requested), level (2)" application=demo
time="2020-07-01T09:18:07Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: guestbook)" application=demo
time="2020-07-01T09:18:08Z" level=info msg="getRepoObjs stats" application=demo build_options_ms=0 helm_ms=0 plugins_ms=0 repo_ms=0 time_ms=542 unmarshal_ms=542 version_ms=0
2020/07/01 09:18:20 cache: Set key="app|resources-tree|demo|1.0.0" failed: read tcp 10.42.67.206:56652->10.43.73.252:6379: i/o timeout
time="2020-07-01T09:18:20Z" level=error msg="Failed to cache app resources: read tcp 10.42.67.206:56652->10.43.73.252:6379: i/o timeout" application=demo dedup_ms=0 diff_ms=2 git_ms=543 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0
time="2020-07-01T09:18:20Z" level=info msg="Update successful" application=demo
time="2020-07-01T09:18:20Z" level=info msg="Reconciliation completed" application=demo dedup_ms=0 dest-namespace=guestbook dest-server="https://kubernetes.default.svc" diff_ms=2 fields.level=2 git_ms=543 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0 time_ms=12636
time="2020-07-01T09:18:22Z" level=info msg="Refreshing app status (normal refresh requested), level (2)" application=demo
time="2020-07-01T09:18:22Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: guestbook)" application=demo
time="2020-07-01T09:18:22Z" level=info msg="getRepoObjs stats" application=demo build_options_ms=0 helm_ms=0 plugins_ms=0 repo_ms=0 time_ms=633 unmarshal_ms=633 version_ms=0
2020/07/01 09:18:34 cache: Set key="app|resources-tree|demo|1.0.0" failed: read tcp 10.42.67.206:56756->10.43.73.252:6379: i/o timeout
time="2020-07-01T09:18:34Z" level=error msg="Failed to cache app resources: read tcp 10.42.67.206:56756->10.43.73.252:6379: i/o timeout" application=demo dedup_ms=0 diff_ms=2 git_ms=633 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0
time="2020-07-01T09:18:35Z" level=info msg="Update successful" application=demo
time="2020-07-01T09:18:35Z" level=info msg="Reconciliation completed" application=demo dedup_ms=0 dest-namespace=guestbook dest-server="https://kubernetes.default.svc" diff_ms=2 fields.level=2 git_ms=633 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0 time_ms=12756
time="2020-07-01T09:18:38Z" level=info msg="Refreshing app status (normal refresh requested), level (2)" application=demo
time="2020-07-01T09:18:38Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: guestbook)" application=demo
time="2020-07-01T09:18:38Z" level=info msg="getRepoObjs stats" application=demo build_options_ms=0 helm_ms=0 plugins_ms=0 repo_ms=0 time_ms=384 unmarshal_ms=384 version_ms=0
2020/07/01 09:18:50 cache: Set key="app|resources-tree|demo|1.0.0" failed: read tcp 10.42.67.206:56842->10.43.73.252:6379: i/o timeout
time="2020-07-01T09:18:50Z" level=error msg="Failed to cache app resources: read tcp 10.42.67.206:56842->10.43.73.252:6379: i/o timeout" application=demo dedup_ms=0 diff_ms=2 git_ms=385 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0

For convenience, I’ve attached the logs from argocd-server and argocd-application-controller

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 22
  • Comments: 36 (3 by maintainers)

Most upvoted comments

I have the same issue when adding a new HTTPS repository from Bitbucket and GitLab. The error message is:

Unable to connect HTTPS repository: Get "https://bitbucket.org/xxx/yyy/info/refs?service=git-upload-pack": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

I enabled the --insecure flag for argocd-server.

These are the settings applied to the argocd-application-controller:

- command:
        - argocd-application-controller
        - --status-processors
        - "20"
        - --operation-processors
        - "50"
        - --repo-server-timeout-seconds
        - "420"
        - --app-resync
        - "600"

I also tried to scale up argocd-repo-server (5 replicas) and it did not work.

git clone inside argocd-repo-server is working fine.

I installed the following ArgoCD versions and had the same issue with all of them:

  • 1.5.0
  • 1.6.1
  • 1.7.1

My cluster has 3 nodes and 1 master.

This happens with both manual and automatic sync policies when creating a new app.

What if the git timeout is definitely caused by slowness that is not (directly) related to ArgoCD?

To be more precise, what if the slowness is caused by rate limiting? ArgoCD will then retry every call that is rate-limited for more than 15 seconds, hammering the git server with more and more requests and indirectly making the rate limiting worse.

In that situation, it would certainly be very useful to be able to change that timeout setting.

(PS: Sorry if this comes across emotionally, I just had a bad day trying to figure out why our production applications suddenly weren’t updating any more, and why azuregit was suddenly refusing to talk to argocd.)

Maybe this will help somebody else as well. We run ArgoCD on an AKS cluster and faced a similar issue. Following https://argo-cd.readthedocs.io/en/release-1.8/operator-manual/high_availability/, we tried a workaround: setting --repo-server-timeout-seconds for argocd-application-controller. The issue has disappeared for now.

In my case, all commands were stopped at exactly 15 seconds. Browsing the code, I found this part, which would explain why synchronization stops after 15 seconds:

// Returns a HTTP client object suitable for go-git to use using the following
// pattern:
// - If insecure is true, always returns a client with certificate verification
//   turned off.
// - If one or more custom certificates are stored for the repository, returns
//   a client with those certificates in the list of root CAs used to verify
//   the server's certificate.
// - Otherwise (and on non-fatal errors), a default HTTP client is returned.
func GetRepoHTTPClient(repoURL string, insecure bool, creds Creds) *http.Client {
    // Default HTTP client
    var customHTTPClient = &http.Client{
        // 15 second timeout
        Timeout: 15 * time.Second,
        // don't follow redirect
        CheckRedirect: func(req *http.Request, via []*http.Request) error {
            return http.ErrUseLastResponse
        },
    }

The git client does not seem to honor the timeout specified in --repo-server-timeout-seconds, so even if the gRPC request deadline is extended, the git fetch timeout is not. Should this option also be applied to the git client?

Regards,

Reference: https://github.com/argoproj/argo-cd/blob/e92e0fa4090a0c324b7a7dbf3d7a2e16ab281d85/util/git/client.go#L112

We’re experiencing the same timeout issues after deploying an ApplicationSet which creates 9 applications.

level=info msg="Normalized app spec: {\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2022-01-24T19:20:45Z\",\"message\":\"rpc error: code = Unknown desc = `helm dependency build` failed timeout after 1m30s\",\"type\":\"ComparisonError\"}]}}" application=app

ArgoCD version deployed: v2.2.2+03b17e0
ArgoCD ApplicationSet version: v0.3.0

Controller settings:

controller:
    args:
      # -- define the application controller `--status-processors`
      statusProcessors: "60"
      # -- define the application controller `--operation-processors`
      operationProcessors: "40"
      # -- define the application controller `--repo-server-timeout-seconds`
      repoServerTimeoutSeconds: "360"

GitHub API rate limits are 5000 API calls per hour, or ~83 calls per minute. I’m not knowledgeable enough about the ArgoCD engine to say that we’re hitting such limits, but I think it is quite plausible that they are being reached, as I’m able to create the Applications one by one!

UPDATE: I’ve also added appResyncPeriod together with the above configs and re-deployed. Being patient was key now! 🔑 After about 10 minutes, none of my 9 applications were in an Unknown state anymore.

      # -- define the application controller `--app-resync`
      appResyncPeriod: "300"

UPDATE 2: On the second refresh, the repo-server started throwing timeouts again!

level=error msg="finished unary call with code Unknown" error="Manifest generation error (cached): `helm dependency build` failed timeout after 1m30s" grpc.code=Unknown grpc.method=GenerateManifest grpc.request.deadline="2022-01-25T14:18:48Z" grpc.service=repository.RepoServerService grpc.start_time="2022-01-25T14:12:48Z" grpc.time_ms=330.102 span.kind=server system=grpc