calico: Failed to "KillPodSandbox" because the calico connection is unauthorized
After some period of time, Pods can no longer be created or deleted, with this message:
$ kubectl describe pod <name>
error killing pod: failed to "KillPodSandbox" for "9f91266a-70a9-428f-a1d6-a2ae8d5427d1" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"4657b77480472f4352e413d52e0c5d5545c675da862cc56c8e6f22d7b0577031\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: connection is unauthorized: Unauthorized"
It seems to be related to the service account policy change from Kubernetes v1.26.0:
https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#manual-secret-management-for-serviceaccounts
Here is a workaround: make calico-node re-read its token by restarting or deleting it.
$ kubectl rollout restart ds -n kube-system calico-node
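Deleting the calico-node Pods directly also works; assuming the k8s-app=calico-node label from the default manifests, for example:
$ kubectl delete pod -n kube-system -l k8s-app=calico-node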
Expected Behavior
kubectl create and kubectl delete work fine.
Current Behavior
They don't work properly:
[root@m-k8s ~]# kubectl get po
NAME                                      READY   STATUS              RESTARTS      AGE
dpy-nginx-6564b9dbcc-d7jj5                0/1     ContainerCreating   0             17m
dpy-nginx-6564b9dbcc-vgjmw                0/1     ContainerCreating   0             17m
dpy-nginx-6564b9dbcc-wbr59                0/1     ContainerCreating   0             17m
nfs-client-provisioner-7596fb9c9c-gmpmn   0/1     Terminating         0             47h
nfs-client-provisioner-7596fb9c9c-jvmnm   1/1     Running             1 (46m ago)   42h
nginx-76d9fbf4fb-7xjgb                    0/1     Terminating         0             42h
nginx-76d9fbf4fb-dv48n                    1/1     Running             0             42h
nginx-76d9fbf4fb-kqp5j                    1/1     Running             0             42h
nginx-76d9fbf4fb-qrl4p                    1/1     Running             0             42h
nginx-76d9fbf4fb-wlpwd                    1/1     Running             0             42h
Possible Solution
'Workaround': restart the DaemonSet or delete the Pods.
OR
'Possible solution': create a long-lived Secret-based token for the service account instead, and have calico-node use that Secret with its service account. (Related to #5712 and #6421.)
sh-4.4# cat /var/run/secrets/kubernetes.io/serviceaccount/token
eyJhbGciOiJSUzI1NiIsImtpZCI6IjlpTFk5RXlJR29yb01VZjlXOGg0UGhvLWhLRGhtZnNvekdyeU0xdVlFUTAifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNzA1OTc1ODA5LCJpYXQiOjE2NzQ0Mzk4MDksImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsInBvZCI6eyJuYW1lIjoiY2FsaWNvLW5vZGUtOWRnZzIiLCJ1aWQiOiIxY2UwODRlYS1kNzIzLTQ5MDAtYjI1ZC00YzRhNTVmMmI0OWYifSwic2VydmljZWFjY291bnQiOnsibmFtZSI6ImNhbGljby1ub2RlIiwidWlkIjoiM2RhYmI5MmYtN2UzYy00ZTkyLWI4OTUtZmM3NzczM2RlMTBmIn0sIndhcm5hZnRlciI6MTY3NDQ0MzQxNn0sIm5iZiI6MTY3NDQzOTgwOSwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmNhbGljby1ub2RlIn0.SC5WdggKDD-SE2ZnIfNYaMROXNvJVqqdKXdF6SCN_qrLBwmLwXbSHnQA_vkBBFHqi1qsQP2CuBx0beYUzm5VkcBt7LMZeDBHaOfDIfBvwMbzkAAMcSoqd6bnZi1mZa8Mf2ZTVEvhLOJSyb9npGAa0te6xfWAvEbTmGWTOvZaQ59y-RqJ9OfqAiYYWoEDCLpjjjG0F1-ke2_6eRx7m6Ri2Ne47WKGGURfMVvf2GAtV0xrYuI2tvA8UhivzhaPiJx56RfyVmVAnrl8qfBk0rG6J43TkPGA59R52vbvJkI_9k-kPw_OXJv35YDqgExn3i7CswGUZCX9TAGkET5mpm7u4w
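For reference, a long-lived Secret-based token can be created as described in the Kubernetes docs linked above. A minimal sketch (the Secret name is made up here, and the calico-node manifests would still have to be changed to actually consume this Secret instead of the projected token):
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: calico-node-long-lived-token   # illustrative name
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: calico-node
type: kubernetes.io/service-account-token
EOF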
Steps to Reproduce (for bugs)
- Deploy native Kubernetes with a Vagrant script (link)
- Wait for 1-2 days
- Deploy new deployment
[root@m-k8s ~]# k create deploy new-nginx --image=nginx --replicas=3
deployment.apps/new-nginx created
- Check deployment status
[root@m-k8s ~]# kubectl get po
NAME                          READY   STATUS              RESTARTS   AGE
new-nginx-6564b9dbcc-<hash>   0/1     ContainerCreating   0          15m
new-nginx-6564b9dbcc-<hash>   0/1     ContainerCreating   0          15m
new-nginx-6564b9dbcc-<hash>   0/1     ContainerCreating   0          15m
Context
The change from #6218 is already applied in the code:
node/pkg/cni/token_watch.go
// Defaults for the CNI token refresher.
const defaultCNITokenValiditySeconds = 24 * 60 * 60 // requested token lifetime in seconds (24h)
const minTokenRetryDuration = 5 * time.Second       // minimum wait before retrying a failed token request
const defaultRefreshFraction = 4                    // controls how early, as a fraction of the validity period, the token is refreshed

func NewTokenRefresher(clientset *kubernetes.Clientset, namespace string, serviceAccountName string) *TokenRefresher {
	return NewTokenRefresherWithCustomTiming(clientset, namespace, serviceAccountName, defaultCNITokenValiditySeconds, minTokenRetryDuration, defaultRefreshFraction)
}
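This refresher maintains the token that the calico CNI plugin uses via its kubeconfig. As a quick on-node check (a sketch, assuming the default CNI paths from the Calico manifests), you can see when that kubeconfig was last rewritten and that it carries a token:
[root@m-k8s ~]# ls -l --time-style=full-iso /etc/cni/net.d/calico-kubeconfig
[root@m-k8s ~]# grep token: /etc/cni/net.d/calico-kubeconfig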
So I decoded the JWT applied on calico-node.
It confirmed a validity of 1 year (365d), as expected.
JWT
sh-4.4# cat /var/run/secrets/kubernetes.io/serviceaccount/token
eyJhbGciOiJSUzI1NiIsImtpZCI6IjlpTFk5RXlJR29yb01VZjlXOGg0UGhvLWhLRGhtZnNvekdyeU0xdVlFUTAifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNzA1OTc1ODA5LCJpYXQiOjE2NzQ0Mzk4MDksImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsInBvZCI6eyJuYW1lIjoiY2FsaWNvLW5vZGUtOWRnZzIiLCJ1aWQiOiIxY2UwODRlYS1kNzIzLTQ5MDAtYjI1ZC00YzRhNTVmMmI0OWYifSwic2VydmljZWFjY291bnQiOnsibmFtZSI6ImNhbGljby1ub2RlIiwidWlkIjoiM2RhYmI5MmYtN2UzYy00ZTkyLWI4OTUtZmM3NzczM2RlMTBmIn0sIndhcm5hZnRlciI6MTY3NDQ0MzQxNn0sIm5iZiI6MTY3NDQzOTgwOSwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmNhbGljby1ub2RlIn0.SC5WdggKDD-SE2ZnIfNYaMROXNvJVqqdKXdF6SCN_qrLBwmLwXbSHnQA_vkBBFHqi1qsQP2CuBx0beYUzm5VkcBt7LMZeDBHaOfDIfBvwMbzkAAMcSoqd6bnZi1mZa8Mf2ZTVEvhLOJSyb9npGAa0te6xfWAvEbTmGWTOvZaQ59y-RqJ9OfqAiYYWoEDCLpjjjG0F1-ke2_6eRx7m6Ri2Ne47WKGGURfMVvf2GAtV0xrYuI2tvA8UhivzhaPiJx56RfyVmVAnrl8qfBk0rG6J43TkPGA59R52vbvJkI_9k-kPw_OXJv35YDqgExn3i7CswGUZCX9TAGkET5mpm7u4w
Decoded JWT's Payload
{
  "aud": [
    "https://kubernetes.default.svc.cluster.local"
  ],
  "exp": 1705975809, <<<< Tue Jan 23 2024 02:10:09 GMT+0000
  "iat": 1674439809,
  "iss": "https://kubernetes.default.svc.cluster.local",
  "kubernetes.io": {
    "namespace": "kube-system",
    "pod": {
      "name": "calico-node-9dgg2",
      "uid": "1ce084ea-d723-4900-b25d-4c4a55f2b49f"
    },
    "serviceaccount": {
      "name": "calico-node",
      "uid": "3dabb92f-7e3c-4e92-b895-fc77733de10f"
    },
    "warnafter": 1674443416
  },
  "nbf": 1674439809,
  "sub": "system:serviceaccount:kube-system:calico-node"
}
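For reference, the payload above can be reproduced in place with a rough sketch like this (it assumes coreutils are available in the calico-node container; the base64url padding has to be restored before base64 -d accepts the payload):
sh-4.4# PAYLOAD=$(cut -d. -f2 /var/run/secrets/kubernetes.io/serviceaccount/token | tr '_-' '/+')
sh-4.4# case $(( ${#PAYLOAD} % 4 )) in 2) PAYLOAD="${PAYLOAD}==" ;; 3) PAYLOAD="${PAYLOAD}=" ;; esac
sh-4.4# echo "${PAYLOAD}" | base64 -d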
Thus this issue seems to involve a slightly different authorization check on the Kubernetes side.
/var/log/messages on all nodes shows entries like the following when it happens.
[control-plane node]
Jan 23 09:10:35 m-k8s kubelet: E0123 09:10:35.298683 4180 server.go:299] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
Jan 23 09:10:50 m-k8s kubelet: E0123 09:10:50.303499 4180 server.go:299] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
Jan 23 09:11:05 m-k8s kubelet: E0123 09:11:05.308058 4180 server.go:299] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
Jan 23 09:11:20 m-k8s kubelet: E0123 09:11:20.300704 4180 server.go:299] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
Jan 23 09:11:35 m-k8s kubelet: E0123 09:11:35.290727 4180 server.go:299] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
<snipped>
[worker node]
Jan 21 16:44:12 w2-k8s kubelet: E0121 16:44:12.656423 3630 server.go:299] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
Jan 21 16:44:27 w2-k8s kubelet: E0121 16:44:27.650877 3630 server.go:299] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
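These entries can be pulled out on each node with a simple filter over the syslog file, e.g.:
[root@m-k8s ~]# grep "service account token has been invalidated" /var/log/messages | tail -n 5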
Your Environment
- Calico version: v3.24.5, v3.25.0
- Orchestrator version (e.g. kubernetes, mesos, rkt): native-kubernetes v1.26.0
[root@m-k8s ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
m-k8s Ready control-plane 2d19h v1.26.0 192.168.1.10 <none> CentOS Linux 7 (Core) 3.10.0-1127.19.1.el7.x86_64 containerd://1.6.10
w1-k8s Ready <none> 2d19h v1.26.0 192.168.1.101 <none> CentOS Linux 7 (Core) 3.10.0-1127.19.1.el7.x86_64 containerd://1.6.10
w2-k8s Ready <none> 2d19h v1.26.0 192.168.1.102 <none> CentOS Linux 7 (Core) 3.10.0-1127.19.1.el7.x86_64 containerd://1.6.10
w3-k8s Ready <none> 2d18h v1.26.0 192.168.1.103 <none> CentOS Linux 7 (Core) 3.10.0-1127.19.1.el7.x86_64 containerd://1.6.10
- Operating System and version: CentOS 7.9 (3.10.0-1127.19.1.el7.x86_64)
- Link to your project (optional):
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 7
- Comments: 26 (5 by maintainers)
Plus, the workaround is effective.
We were able to fix this problem now. Our master node had an incorrect version, which ruined everything. Updating the master node was fortunately the solution, without any hacky workarounds. But thanks for the help - I appreciate it!
Last check: k8s v1.27.2 + calico_v3.26.0 = Looking good after AGE 5D
FYI: k8s v1.27.2 + calico_v3.26.0 = Looking good after AGE 42H
I have the same problem. Even if you think it is cleared on the master, the problem may still need solving on another node or worker.
Facing the same issue.
We refrain from using the workaround, so are there any updates on how to get rid of this issue? How can we tackle the service account policy changes in Kubernetes v1.26 mentioned in the issue description?
I'm using k8s v1.26.1 + calico_v3.25.0 + containerd 1.6.18.
same behaviour
cluster info
@coutinhop Oh…? I am so sorry, I didn't mean to upload it without any comment. (Did my cat push the button? Something like that… anyhow, OMG.) I have now updated it with everything I know so far. The trigger and reproduction procedure are not clear yet, so I will clarify the reproduction steps as soon as possible.
Thank you for letting me know about the empty issue that I uploaded.