calico: CNI plugin: error getting ClusterInformation: connection is unauthorized: Unauthorized
K8S & Calico information
HostOS: RHEL 8.2
K8S: on-premise cluster; version v1.21.1; "IPVS" mode; IPv4/IPv6 dual stack; installed using kubespray
Calico: version v3.18.4; non-BGP mode; "IP6" DNAT enabled
Our Docker image is built on top of "RHEL ubi:8". We did not set up an external etcd cluster.
“kubectl describe” output
[support@node-cont-1-qa conf]$ kubectl describe pod export-job-job-dp8hb
Name: export-job-job-dp8hb
Namespace: pio
Priority: 0
Node: node-df1-1/10.0.156.180
Start Time: Wed, 23 Feb 2022 05:57:18 -0800
Labels: app.kubernetes.io/instance=export-job-job
controller-uid=5d9f3e4b-e74c-4280-a3be-e31d37e92b84
job-name=export-job-job
Annotations: cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
Status: Pending
IP:
IPs: <none>
Controlled By: Job/export-job-job
Containers:
export-job-job:
Container ID:
Image: 10.0.156.250:5000/img-admf:9.3.0.0B038
Image ID:
Port: <none>
Host Port: <none>
Command:
csh
Args:
-c
source /TT9/configXcp.sh; lis_conf; python2 /etc/pio/APPL/XcdbBackup.py --exportdb --dir /var/tmp; sleep 300
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
cpu: 500m
memory: 512Mi
Requests:
cpu: 200m
memory: 256Mi
Environment: <none>
Mounts:
/TT9/PIO/9.0.0/RUN/config/APPL/DBConMgr.cnfg from db-conf (rw,path="DBConMgr.cnfg")
/TT9/PIO/9.0.0/RUN/config/feature_conf.json from feature-conf (rw,path="feature_conf.json")
/TT9/PIO/9.0.0/RUN/license/license.json from license-conf (rw,path="license.json")
/etc/pio/APPL/XcdbBackup.py from job-script (rw,path="XcdbBackup.py")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jh7lg (ro)
/var/tmp from external-pv (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
job-script:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: export-job-script
Optional: false
db-conf:
Type: Secret (a volume populated by a Secret)
SecretName: db-secret
Optional: false
feature-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: feature
Optional: false
license-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: license
Optional: false
external-pv:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: backup-pvc
ReadOnly: false
kube-api-access-jh7lg:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 52m default-scheduler Successfully assigned pio/export-job-job-dp8hb to node-df1-1
Warning FailedCreatePodSandBox 52m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "e46d8d9df11ef97e7e1d8b38ced7efef32e1cb4bfb0aa85809cb3198464b6167" network for pod "export-job-job-dp8hb": networkPlugin cni failed to set up pod "export-job-job-dp8hb_pio" network: connection is unauthorized: Unauthorized, failed to clean up sandbox container "e46d8d9df11ef97e7e1d8b38ced7efef32e1cb4bfb0aa85809cb3198464b6167" network for pod "export-job-job-dp8hb": networkPlugin cni failed to teardown pod "export-job-job-dp8hb_pio" network: error getting ClusterInformation: connection is unauthorized: Unauthorized]
Normal SandboxChanged 50m (x10 over 52m) kubelet Pod sandbox changed, it will be killed and re-created.
Expected Behavior
The pod should start successfully.
Steps to Reproduce
Sorry, the issue has happened twice, on different K8S clusters in our lab, and I did not keep any logs… I would like to know how to reproduce it too.
My initial thoughts (maybe wrong)
Since "kubectl describe" shows "connection is unauthorized", I searched the source code of K8S v1.21.1; the K8S code does NOT contain that string. I then searched Calico v3.22 (I am using v3.18.4, but there should not be a big difference) and found that "connection is unauthorized" exists in "libcalico-go/lib/errors/errors.go". So it looks like the error is raised by Calico. I then used "error getting ClusterInformation" as a keyword: it cannot be found in the K8S code, but it can be found in the Calico code. So I am confident that the issue is 100% related to Calico.
Because the "connection is unauthorized" message corresponds to "type ErrorConnectionUnauthorized struct", and "ErrorConnectionUnauthorized" is related to communication with etcd, it looks like the issue is a communication problem between Calico and etcd.
By the way, /var/log/calico/cni/ does NOT contain anything related to "etcd" during pod start/destroy while I was doing normal operations.
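For reference, this is roughly what I checked on the node. It is only a sketch: the paths assume the default Calico install written by install-cni, and the token-decode step assumes the CNI kubeconfig carries a service-account JWT; adjust to your layout.

```bash
# On the affected node: Calico CNI plugin logs
ls -l /var/log/calico/cni/
sudo tail -n 100 /var/log/calico/cni/cni.log

# CNI network config and the kubeconfig the plugin uses to reach its datastore
ls /etc/cni/net.d/
sudo cat /etc/cni/net.d/calico-kubeconfig

# If that kubeconfig carries a service-account bearer token (a JWT), its middle
# segment contains the "exp" (expiry) claim; a rough decode (missing-padding
# warnings from base64 are expected and harmless here):
TOKEN=$(sudo awk '/token:/ {gsub(/"/, ""); print $2}' /etc/cni/net.d/calico-kubeconfig)
echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null; echo
```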
What I expect:
If possible, can you please tell me: 1) which webpage describes the control/data flow between Calico and etcd; 2) the log files and locations that Calico as a whole uses; 3) whether I missed any debug information.
Thanks
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 29 (6 by maintainers)
I just had to `kubectl delete pod calico-node-xxxx` on the node where the issue was happening. A new Pod was created and the problem is solved.

This just happened to me in an older v1.22.3 cluster, and I've noticed that the calico-node pods had an age of 365d. The problem self-resolved after I deleted all calico-node pods and they were recreated. Is there a certificate / token that has a TTL of 1 year and doesn't get automatically renewed?

`kubectl delete pod calico-node-xxxx -n kube-system`: a new Pod was created and the problem is solved.
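An equivalent, slightly more convenient form of that workaround is to restart the whole DaemonSet; this is a sketch only, and the namespace is kube-system for manifest-based installs (operator-based installs use calico-system instead):

```bash
# Restart every calico-node pod so each node gets a fresh pod
# (and, with it, a freshly projected service-account token).
kubectl -n kube-system rollout restart daemonset/calico-node
kubectl -n kube-system rollout status daemonset/calico-node
```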
`kubectl delete pods --all --all-namespaces` fixed my issue.

I had a similar issue today and all the pods on my cluster were stuck in Unknown or Terminating status, including the calico-node-xxxx pods. I ran `kubectl delete pod calico-node-xxxx`, which fixed the calico-node pod, but the other pods were still not ok, so I ran `kubectl delete pods --all --all-namespaces` to delete ALL the pods, and a couple of minutes after the command everything was back up and running well! clusterVersion: v1.23
Not sure how useful my comment would be, but I encountered this error when I accidentally rebooted one of the nodes in the cluster. The full error is as follows:

error killing pod: failed to "KillPodSandbox" for "%some-guid%" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"%some-pod-id%\" network: error getting ClusterInformation: connection is unauthorized: Unauthorized"

The killing was triggered by a disk pressure event on the node, the reasons for which I'm not entirely sure of. I had lowered the imageGC thresholds a bit before, but from my understanding they shouldn't trigger disk pressure. Maybe I'm wrong.

PS: I also recall a similar situation with an API that constantly got evicted every couple of days (disk pressure) and its evicted pods were never cleaned up. I didn't really look into why the pods remained, but maybe they were also supposed to be cleaned up and never were because of this error.
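For reference, a quick way to check the disk-pressure side of this; the node name is a placeholder, and the config path assumes a kubelet driven by /var/lib/kubelet/config.yaml:

```bash
# Did the node actually report DiskPressure, and what thresholds is the
# kubelet using for image GC / eviction?
kubectl describe node <node-name> | grep -i diskpressure
sudo grep -iE 'imagegc|eviction' /var/lib/kubelet/config.yaml
df -h /var/lib/kubelet
```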
Ran into a similar issue and worked around it with NTP synchronization 😃
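For example, on a RHEL 8 / chrony host the check and fix look roughly like this; significant clock skew on a node can make API authentication fail, which would match the "Unauthorized" symptom:

```bash
# Check whether the node's clock is synchronized and how far it is off.
timedatectl status      # look for "System clock synchronized: yes"
chronyc tracking        # shows the current offset from the NTP source
# Enable NTP synchronization if it is off.
sudo timedatectl set-ntp true
```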
Encountering the same issue; how do I solve it? Examining `coredns-6d4b75cb6d-cxgj8` gives me this error. Please help.
I had a slightly different issue, but restarting the calico pod on the node with the failed pod, and then the failed pod itself, helped. The pod moved to another node after restart. MicroK8s v1.26.0 revision 4390, Calico v3.23.5.
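A generic way to locate the calico-node pod that serves a given stuck pod; namespace and pod name are placeholders, and the k8s-app=calico-node label matches the stock DaemonSet:

```bash
# Find the node the stuck pod is scheduled on, then the calico-node pod there.
NODE=$(kubectl -n <namespace> get pod <stuck-pod> -o jsonpath='{.spec.nodeName}')
kubectl -n kube-system get pod -l k8s-app=calico-node \
  --field-selector spec.nodeName="$NODE"
# Deleting that pod lets the DaemonSet recreate it with fresh credentials.
```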
Did you fix this?
Same issue in 1.22 with Calico
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m47s default-scheduler Successfully assigned default/pod-with-cm to worker-node01
Warning FailedCreatePodSandBox 3m46s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "3dcfdb21462e255a8f4059ca8540c8df05863bd6444cb22290133f894840845e" network for pod "pod-with-cm": networkPlugin cni failed to set up pod "pod-with-cm_default" network: error getting ClusterInformation: connection is unauthorized: Unauthorized, failed to clean up sandbox container "3dcfdb21462e255a8f4059ca8540c8df05863bd6444cb22290133f894840845e" network for pod "pod-with-cm": networkPlugin cni failed to teardown pod "pod-with-cm_default" network: error getting ClusterInformation: connection is unauthorized: Unauthorized]
Hit the same issue, and lbogdan's workaround fixed it for me.