rancher: Failed to list *: context deadline exceeded seen in rancher/rancher container logs

What kind of request is this (question/bug/enhancement/feature request): bug

Steps to reproduce (least amount of steps as possible): Run rancher/rancher:v2.4.5 or v2.4-head, create a custom cluster and add a node with all roles. Watch Rancher container logs.

Result: The following lines are seen:

Failed to list *v1.X: Get X: context deadline exceeded
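
To gauge how often this message shows up, the container logs can be filtered for it. The following is only a minimal sketch: count_list_timeouts is a hypothetical helper name, it assumes the log text arrives on stdin, and the pattern is copied from the line above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rancher): count log lines that match
# the reported error. Reads log text on stdin, prints the match count.
count_list_timeouts() {
    # grep -c exits non-zero when nothing matches; "|| true" keeps the
    # function usable under "set -e" while still printing 0.
    grep -c 'Failed to list .*: context deadline exceeded' || true
}
```

Possible usage, where <rancher-container> stands for the name or ID of the rancher/rancher container: docker logs <rancher-container> 2>&1 | count_list_timeouts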

gzrancher/rancher#11544

gzrancher/rancher#11329

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 11
  • Comments: 20 (7 by maintainers)

Most upvoted comments

Same issue here after upgrading to v2.4.5

This seems to be related to resourceVersion changes: the watcher/informer uses an old or too-high resourceVersion, which causes the logging. When I restart the kube-apiserver using docker restart kube-apiserver on the control plane nodes of the cluster (or possibly just the one whose address is logged in the https:// URL), the messages seem to stop.

Now figuring out how to reset/fix this.

This was introduced by https://github.com/rancher/norman/pull/367/

Hi @superseb ,

Not sure whether it is related (and hopefully this helps), but we were getting these errors too. On the affected control plane node, we noticed containers still running the version of the rancher-agent we upgraded from (v2.3.6, with image ID beginning with 697) instead of the v2.4.5 image:

 $ docker images|grep agent
rancher/rancher-agent               v2.4.5              2e6c7ac4e072        4 weeks ago         294MB
rancher/rancher-agent               v2.3.6              697a883e05d0        3 months ago        282MB
 $ docker ps | grep 697
6006fc7b45b9        697a883e05d0                          "run.sh"                 2 hours ago         Up 2 hours                              k8s_cluster-register_cattle-cluster-agent-786cdbd4-xsgnq_cattle-system_83f41f93-c33e-43d6-8986-4a20bb1c71ce_0
e12fec09cb50        697a883e05d0                          "run.sh"                 17 hours ago        Up 17 hours                             k8s_agent_cattle-node-agent-hv4lz_cattle-system_455b6c01-2c8c-4479-8bca-c3c79f254590_0

Also, to clarify, share-mnt was already running on the v2.4.5 image by this stage.

After following your workaround of restarting the apiserver, we see the v2.4.5 image correctly being used:

 $ docker ps | grep 2e6
35c06385192a        2e6c7ac4e072                          "run.sh"                 18 minutes ago       Up 18 minutes                           k8s_agent_cattle-node-agent-6kjw8_cattle-system_f0c3e78f-7c20-49b6-950b-a29c5536548f_0
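
The check in the comment above (grepping docker ps for an image ID prefix) could be scripted roughly as follows. This is a sketch under assumptions: containers_on_image is a hypothetical helper name, and it expects one "image name" pair per line on stdin rather than the default docker ps table.

```shell
#!/bin/sh
# Hypothetical helper: print the names of containers whose image field
# starts with the given prefix (for example an image ID such as 697).
# Input: lines of the form "<image> <container-name>" on stdin.
containers_on_image() {
    awk -v prefix="$1" 'index($1, prefix) == 1 { print $2 }'
}
```

Possible usage: docker ps --format '{{.Image}} {{.Names}}' | containers_on_image 697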

Encountered with Rancher running on a 1.15.2 cluster.

No impact has been reported; if you experience any, please report back. It seems to be related to the 1.18 k8s Go client being used against older Kubernetes versions.