argo-cd: Error log on missing cache key

Hello

I sometimes get this error log:

[argo-cd-argocd-server-bf74d748b-2xmgb] time="2020-12-16T03:45:46Z" level=error msg="finished streaming call with code Unknown" error="cache: key is missing" grpc.code=Unknown grpc.method=WatchResourceTree grpc.service=application.ApplicationService grpc.start_time="2020-12-16T03:45:22Z" grpc.time_ms=24038.105 span.kind=server system=grpc

I can see that it is logged at error level. In my opinion, a missing cache key could be logged at warning level instead.

What do you think?

About this issue

State: open
Created 4 years ago
Reactions: 4
Comments: 27 (9 by maintainers)

Most upvoted comments

This is what happens when a user clicks on an object that should take them to a detail page.

+13

lzm0 on Jul 12, 2021

We’re continuing to run into this as well periodically, but restarting the argocd-application-controller pod does not appear to resolve it for us.

rayterrill on Jan 6, 2022

I tried 2.5.0 and I still keep getting this error. Any recommendation on how to fix it permanently?

ThomasVitale on Nov 1, 2022

Note: for us, this meant we couldn’t see the resources for the application. This wasn’t some trivial thing. Eventually the problem went away, but it took a long time.

jsoref on Apr 9, 2021

I’m actually seeing this error getting logged as fatal using argocd 1.8.2: level=fatal msg="rpc error: code = Unknown desc = cache: key is missing"

aweis89 on Feb 2, 2021

@james-callahan that is correct. The controller tries to minimize the number of writes to Redis and doesn’t write the same message twice. The logic to skip the second write is implemented here: https://github.com/argoproj/argo-cd/blob/a08282bf6bcd7b44ea2dd3ef7fa0e2a77498063e/util/cache/twolevelclient.go#L24

alexmt on May 12, 2023
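
To illustrate the skip-second-write behaviour described in the comment above, here is a minimal Python sketch. It is not Argo CD's actual Go code: `TwoLevelCache`, `KeyMissingError`, and the plain dictionaries standing in for the in-memory cache and for Redis are all hypothetical. Under those assumptions it shows how a writer that remembers what it last wrote may leave an emptied external cache unpopulated until the value changes.

```python
class KeyMissingError(Exception):
    """Raised when a key is absent from the external cache (hypothetical name)."""


class TwoLevelCache:
    """Simplified stand-in for a two-level cache client (assumption, not Argo CD code)."""

    def __init__(self):
        self.in_memory = {}       # what this process believes it has already written
        self.external = {}        # stands in for Redis
        self.external_writes = 0  # counts writes that actually reach the external cache

    def set(self, key, value):
        # Skip the external write when the same value was already written once.
        if key in self.in_memory and self.in_memory[key] == value:
            return
        self.in_memory[key] = value
        self.external[key] = value
        self.external_writes += 1

    def get(self, key):
        # Readers (for example another process) only see the external cache.
        if key not in self.external:
            raise KeyMissingError("cache: key is missing")
        return self.external[key]


cache = TwoLevelCache()
cache.set("app|resources-tree", "v1")
cache.set("app|resources-tree", "v1")  # skipped: identical to the last write

cache.external.clear()                 # simulate the external cache being restarted
cache.set("app|resources-tree", "v1")  # still skipped, so the key stays missing
```

In this sketch the key only reappears once `set` is called with a different value or the in-memory copy is discarded, which is consistent with reports in this thread that restarting the application controller clears the error.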

If “empty secret” is something affecting this (in newer versions of ArgoCD), here’s an example of how I’ve found such secrets. As you can see, there are cases where we don’t create them and might not have an easy way of just “removing them to get ArgoCD to work as it should”:

$ kubectl get secrets -A -o json | jq '.items[] | select(has("data") == false)'
{
  "apiVersion": "v1",
  "kind": "Secret",
  "metadata": {
    "creationTimestamp": "2021-06-30T15:06:11Z",
    "labels": {
      "managed-by": "prometheus-operator"
    },
    "name": "alertmanager-monitoring-prometheus-oper-alertmanager-tls-assets",
    "namespace": "monitoring",
    "ownerReferences": [
      {
        "apiVersion": "monitoring.coreos.com/v1",
        "blockOwnerDeletion": true,
        "controller": true,
        "kind": "Alertmanager",
        "name": "monitoring-prometheus-oper-alertmanager",
        "uid": "bd94541f-20a1-4625-bd4e-53262ea30f3a"
      }
    ],
    "resourceVersion": "34611341",
    "uid": "4727ce81-ed44-454e-9272-16a02acc868a"
  },
  "type": "Opaque"
}

MPV on May 16, 2022
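
The `jq` filter in the comment above selects Secret objects that have no `data` key at all. A rough Python equivalent, assuming the input is the parsed JSON produced by `kubectl get secrets -A -o json`, might look like the sketch below. The function name `secrets_without_data` is hypothetical, and the sample dictionary is trimmed from the output shown above with one made-up second entry for contrast.

```python
def secrets_without_data(secret_list):
    """Return (namespace, name) pairs for Secrets that lack a "data" key.

    `secret_list` is expected to be the parsed JSON of a Kubernetes List object.
    """
    return [
        (item["metadata"].get("namespace"), item["metadata"]["name"])
        for item in secret_list.get("items", [])
        if "data" not in item  # same test as jq's has("data") == false
    ]


sample = {
    "items": [
        {
            "kind": "Secret",
            "metadata": {
                "name": "alertmanager-monitoring-prometheus-oper-alertmanager-tls-assets",
                "namespace": "monitoring",
            },
            "type": "Opaque",
        },
        {
            # Made-up entry that does carry data, so it is filtered out.
            "kind": "Secret",
            "metadata": {"name": "example-with-data", "namespace": "default"},
            "data": {"token": "c2VjcmV0"},
            "type": "Opaque",
        },
    ]
}

empty = secrets_without_data(sample)
```

To run it against a cluster, the real output could be loaded with `json.load(sys.stdin)` and passed to the same function.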

Experiencing the same issue.

How to reproduce:

1. Navigate to the application detail page in the web UI.
2. Restart the redis pod (non-HA setup).
3. Click on any app resource.
4. The message mentioned in https://github.com/argoproj/argo-cd/issues/5068#issuecomment-878594783 is displayed.

Refreshing the page doesn’t help. The issue eventually disappears, though only after tens of minutes rather than minutes. It can be resolved immediately by restarting the argocd-application-controller pod.

@alexmt As you closed https://github.com/argoproj/argo-cd/issues/6009 as a duplicate, I guess this should be marked as a bug, not an enhancement. I think there is an issue reconnecting to redis from the controller when the connection is lost.

jizi on Dec 9, 2021

Ran into this today. For some reason, in the HA setup, HAProxy was only executing health checks against the svc for servers R0 and R1; R2 was skipped in all cases.

I had to cause a manual sentinel failover:

/usr/local/etc/haproxy $ nc argocd-redis-ha-announce-0 26379
SENTINEL failover argocd

I ran this from the HAProxy box because I was checking connectivity anyway. After sentinel selected and promoted a new redis master, the checks against the R2 server started to work again, and this issue went away entirely.

justin-watkinson-sp on Nov 23, 2021

I’m new to argocd. I started with 2.7.8, and it worked well for ten days. Then I upgraded to 2.8, and one day after the upgrade I encountered the same error. Not sure if it is related.

I downgraded to 2.7.8, and it is back to normal now.

yuezk on Aug 12, 2023

@alexmt: Are you opposed to just removing that line?

And in the interim, can the description be updated to suggest shooting the application controller?

jsoref on Jun 13, 2023

I got this today when I restarted redis. Doing a hard refresh of my application(s) didn’t help. Restarting the application controller fixed it.

I assume the issue I hit is that the application controller checked something in redis and is assuming that going forward it remains in redis.

james-callahan on May 12, 2023

Hi @alexmt

In the documentation I could see this:

Argo CD is largely stateless,
all data is persisted as Kubernetes objects, which in turn is stored in Kubernetes' etcd. 
Redis is only used as a throw-away cache and can be lost.
When lost, it will be rebuilt without loss of service.

The system keeps working fine when the cache is missing. IMO, error level should be used when the system is not functioning properly and behaves unexpectedly. Logging this at warning level sounds more suitable.

kfirfer on Dec 16, 2020
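
The proposal above, logging a cache miss as a warning while keeping other failures at error level, can be sketched in Python as follows. This is only an illustration of the idea: Argo CD is written in Go and its logging code is not shown in this thread, so `level_for_error` and `CACHE_MISS_MESSAGE` are hypothetical names, and matching on the message text is an assumption made to keep the example short.

```python
import logging

# Error text taken from the log line quoted at the top of this issue.
CACHE_MISS_MESSAGE = "cache: key is missing"


def level_for_error(message):
    """Pick a log level: WARNING for a cache miss, ERROR for anything else."""
    if CACHE_MISS_MESSAGE in message:
        return logging.WARNING
    return logging.ERROR


logger = logging.getLogger("watch-resource-tree")  # hypothetical logger name

for err in (
    "cache: key is missing",
    "rpc error: code = Unknown desc = cache: key is missing",
    "connection refused",
):
    # The first two are logged as warnings, the last one as an error.
    logger.log(level_for_error(err), "finished streaming call: %s", err)
```

Whether a warning is the right level is still a judgment call, since other commenters in this thread report that the error can hide an application's resources for a long time.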