rancher: Failed to communicate with API server

rancher/rancher:master 08/29

Steps to reproduce:

  1. Create a cluster in Rancher (custom, AWS t2.medium, Ubuntu, 1 node with all roles).
  2. Deploy a workload with the image liuyan/producelog:v3 (the image prints a lot of logs to the console).
  3. View the pod's logs and wait for a while (about 10s).
  4. Close the view log popup.
  5. View the logs again and wait for 10s.

Result:

The cluster goes into the Unavailable state and never becomes active again.

Failed to communicate with API server: Get https://172.31.16.104:6443/api/v1/componentstatuses?timeout=30s: waiting for cluster agent to connect

Workaround:

SSH to the node and manually remove k8s_agent_cattle-node-agent-56ng5_cattle-system_8ad2b841-ac28-11e8-b49b-063c344537ce_0. The pod will be recreated automatically and the cluster will become active again.
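
The workaround could be scripted roughly as follows. This is only a sketch, not something from the report: it assumes the node runs Docker as its container runtime (suggested by the k8s_ container naming), that it is executed on the affected node itself, and that matching on the name prefix is enough, since the suffix (56ng5, the UID) differs per node. The helper names pick_agent_containers, list_container_names and remove_agent_containers are made up for illustration.

```python
import subprocess

# Prefix of the agent container name seen in the report. The pod-specific
# suffix differs per node, so only the prefix is matched (an assumption).
AGENT_PREFIX = "k8s_agent_cattle-node-agent"


def pick_agent_containers(names, prefix=AGENT_PREFIX):
    """Return the container names that start with the agent prefix."""
    return [n for n in names if n.startswith(prefix)]


def list_container_names():
    """List the names of running containers via `docker ps`."""
    out = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def remove_agent_containers(dry_run=True):
    """Force-remove matching containers; with dry_run, only report them."""
    targets = pick_agent_containers(list_container_names())
    for name in targets:
        if not dry_run:
            # Same effect as removing the container by hand on the node.
            subprocess.run(["docker", "rm", "-f", name], check=True)
    return targets


if __name__ == "__main__":
    # Dry run by default: prints what would be removed without touching it.
    print(remove_agent_containers(dry_run=True))
```

Running it with dry_run=True first shows which containers would be removed; switching to dry_run=False performs the removal, after which the pod should be recreated as described above.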

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 7
  • Comments: 35 (2 by maintainers)

Most upvoted comments

I have this issue after upgrading from 2.4.6 to v2.5.2.

It is still there after 2.5.5

2.1.5 exhibits this as well. Every time I remove 1 node (in a 3 or 4 node cluster), this happens for quite a while. SSH’ing to one of the active nodes tells me that the box is OK / healthy, but the API response says otherwise.

I am getting the same issue with the connection between nodes in the cluster. I have Rancher 2.1.5 installed.

Same error on 2.2.3 after upgrading.

It has been happening since v2.0.7.

It looks good in v2.0.4, v2.0.5 and v2.0.6