rancher: Failed to communicate with API server

rancher/rancher:master 08/29

Steps to reproduce:

Create a cluster in Rancher (custom, AWS t2.medium, Ubuntu, 1 node with all roles).
Deploy a workload with the image liuyan/producelog:v3 (the image prints a large volume of logs to the console).
View the pod's log and wait for a while (about 10s).
Close the log view popup.
View the log again and wait for 10s.

Result:

The cluster goes into the Unavailable state and never becomes active again.

Failed to communicate with API server: Get https://172.31.16.104:6443/api/v1/componentstatuses?timeout=30s: waiting for cluster agent to connect

Workaround:

SSH to the node and manually remove the container k8s_agent_cattle-node-agent-56ng5_cattle-system_8ad2b841-ac28-11e8-b49b-063c344537ce_0. The pod is recreated automatically and the cluster becomes active again.
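
The workaround could be scripted roughly as follows. This is only a sketch, not something from the Rancher project: it assumes the node runs Docker, that the agent container name always starts with `k8s_agent_cattle-node-agent` (as in the name above), and that the script is run on the node by a user allowed to call `docker`. The function names are made up for illustration.

```python
import subprocess

# Assumption: node agent container names start with this prefix,
# matching the container name quoted in the workaround.
AGENT_PREFIX = "k8s_agent_cattle-node-agent"


def find_agent_containers(ps_names_output: str) -> list[str]:
    """Return the container names in `docker ps` output that match the agent prefix."""
    names = (line.strip() for line in ps_names_output.splitlines())
    return [name for name in names if name.startswith(AGENT_PREFIX)]


def remove_agent_containers() -> list[str]:
    """Force-remove every matching agent container on this node and return their names."""
    # List only the names of running containers, one per line.
    listing = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        check=True, capture_output=True, text=True,
    )
    targets = find_agent_containers(listing.stdout)
    for name in targets:
        # The kubelet is expected to recreate the container afterwards.
        subprocess.run(["docker", "rm", "-f", name], check=True)
    return targets
```

Calling `remove_agent_containers()` on the affected node does the same thing as the manual step. The name matching is kept in a separate function so it can be checked against sample `docker ps` output before anything is removed.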

About this issue

State: closed
Created 6 years ago
Reactions: 7
Comments: 35 (2 by maintainers)

Most upvoted comments

I have this issue after upgrading to v2.5.2 from 2.4.6.

It is still there after 2.5.5

maxisam on Jan 18, 2021

2.1.5 exhibits this as well. Every time I remove 1 node (in a 3 or 4 node cluster) this happens for quite a while. SSH’ing to one of the active nodes tells me that the box is ok / healthy but the API response says otherwise.

I am getting the same issue with the connection between nodes in the cluster. I have Rancher 2.1.5 installed.

ppotaki on Feb 7, 2019

2.2.3 same error after upgrade

Panthro on May 21, 2019

2.1.5 exhibits this as well. Every time I remove 1 node (in a 3 or 4 node cluster) this happens for quite a while. SSH’ing to one of the active nodes tells me that the box is ok / healthy but the API response says otherwise.

TechCanuck on Feb 6, 2019

It has been happening since v2.0.7.

It looks good in v2.0.4, v2.0.5 and v2.0.6

loganhz on Aug 30, 2018