rancher: Failed to communicate with API server
rancher/rancher:master 08/29
Steps to reproduce:
- Create a cluster in Rancher. (custom, AWS t2.medium, ubuntu, 1 node with all roles)
- Deploy a workload with image liuyan/producelog:v3 (the image prints a lot of logs to the console).
- View the log of the pod and wait for a while (about 10s).
- Close the view log popup.
- View log again and wait for 10s.
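
For anyone reproducing this outside the Rancher UI, the workload from the steps above could be approximated with a plain Deployment. This is only a sketch, not what the reporter used: the name `producelog`, the labels, the single replica, and applying it with `kubectl` are assumptions. Only the image comes from the report.

```python
import json


def producelog_deployment(name="producelog", image="liuyan/producelog:v3"):
    """Build an apps/v1 Deployment manifest for the log-producing image.

    The name and labels are hypothetical; only the image is from the report.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so the output can be piped to it.
    print(json.dumps(producelog_deployment(), indent=2))
```

If saved as a hypothetical `producelog.py`, the output could be applied with `python producelog.py | kubectl apply -f -`. The log viewing steps still need to be done from the Rancher UI.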
Result:
The cluster goes into the Unavailable state and never becomes active again.
Failed to communicate with API server: Get https://172.31.16.104:6443/api/v1/componentstatuses?timeout=30s: waiting for cluster agent to connect
Workaround:
SSH to the node and manually remove the container k8s_agent_cattle-node-agent-56ng5_cattle-system_8ad2b841-ac28-11e8-b49b-063c344537ce_0.
The pod is recreated automatically and the cluster returns to the active state.
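
The container name suffix differs on every node, so the workaround might be scripted by matching on the name prefix instead. The following is a rough sketch, assuming the node runs Docker and that the agent container name always starts with `k8s_agent_cattle-node-agent` as in the name above. The function names are made up for illustration, and it defaults to a dry run.

```python
import subprocess

# Prefix taken from the container name in the report; assumed to be stable.
AGENT_PREFIX = "k8s_agent_cattle-node-agent"


def agent_containers(names, prefix=AGENT_PREFIX):
    """Return the container names that start with the agent prefix."""
    return [n for n in names if n.startswith(prefix)]


def list_container_names():
    """List the names of running containers using the docker CLI."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def remove_agent_containers(dry_run=True):
    """Force-remove the agent containers; with dry_run, only report them."""
    targets = agent_containers(list_container_names())
    for name in targets:
        if dry_run:
            print("would remove", name)
        else:
            subprocess.run(["docker", "rm", "-f", name], check=True)
    return targets
```

Calling `remove_agent_containers()` on the node lists what would be removed. Passing `dry_run=False` actually removes the container, after which the pod should be recreated as described above.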
About this issue
- State: closed
- Created 6 years ago
- Reactions: 7
- Comments: 35 (2 by maintainers)
I have this issue after upgrading from 2.4.6 to v2.5.2.
It is still there after upgrading to 2.5.5.
I am getting the same issue with the connection between nodes in the cluster. I have Rancher 2.1.5 installed.
Same error on 2.2.3 after upgrading.
2.1.5 exhibits this as well. Every time I remove one node (in a 3- or 4-node cluster), it does this for quite a while. SSHing to one of the active nodes tells me that the box is OK and healthy, but the API response says otherwise.
It has been happening since v2.0.7.
It looks good in v2.0.4, v2.0.5 and v2.0.6