rancher: Rancher server crash with "lost master" error

This is the same issue as https://github.com/rancher/rancher/issues/18505, seen in v2.2.2-rc10 and v2.2.2-rc11. Setup: single install, 4 vCPU, 8 GB memory. It happens after I run the automation in an RKE cluster, and the logs show "lost master":

I0412 00:40:36.159827       7 trace.go:76] Trace[1294926626]: "Get /api/v1/namespaces/kube-system/endpoints/kube-scheduler" (started: 2019-04-12 00:40:26.580904568 +0000 UTC m=+1496.029070405) (total time: 9.578891662s):
Trace[1294926626]: [9.578814204s] [9.57867164s] About to write a response
I0412 00:40:36.580516       7 leaderelection.go:231] failed to renew lease kube-system/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded
E0412 00:40:36.580567       7 server.go:207] lost master
lost lease
2019/04/12 00:40:51 [INFO] Listening on /tmp/log.sock
2019/04/12 00:40:51 [INFO] Rancher version v2.2.2-rc11 is starting
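
For anyone triaging similar logs, here is a minimal sketch of a script that pulls out the two signals visible above: slow API request traces and failed lease renewals. It is not part of Rancher or Kubernetes. The function name `summarize` and the 5-second `slow_seconds` threshold are hypothetical choices for illustration, and the regular expressions only assume the line shapes shown in this issue.

```python
import re

# Matches the first line of a trace.go entry as it appears in this issue, e.g.
#   Trace[123]: "Get /api/..." (started: ...) (total time: 9.578891662s):
# Only durations printed in plain seconds are handled (an assumption).
TRACE_RE = re.compile(
    r'Trace\[\d+\]: "(?P<request>[^"]+)".*\(total time: (?P<seconds>[\d.]+)s\)'
)
# Matches "failed to renew lease <namespace>/<name>:" from leaderelection.go.
LEASE_RE = re.compile(r"failed to renew lease (?P<lease>\S+):")


def summarize(lines, slow_seconds=5.0):
    """Return (slow_traces, failed_leases) found in an iterable of log lines.

    slow_seconds is a hypothetical threshold, not a Kubernetes setting.
    """
    slow_traces, failed_leases = [], []
    for line in lines:
        trace = TRACE_RE.search(line)
        if trace and float(trace.group("seconds")) >= slow_seconds:
            slow_traces.append((trace.group("request"), float(trace.group("seconds"))))
        lease = LEASE_RE.search(line)
        if lease:
            failed_leases.append(lease.group("lease"))
    return slow_traces, failed_leases


# Sample taken from the log lines quoted in this issue.
SAMPLE = """\
I0412 00:40:36.159827       7 trace.go:76] Trace[1294926626]: "Get /api/v1/namespaces/kube-system/endpoints/kube-scheduler" (started: 2019-04-12 00:40:26.580904568 +0000 UTC m=+1496.029070405) (total time: 9.578891662s):
Trace[1294926626]: [9.578814204s] [9.57867164s] About to write a response
I0412 00:40:36.580516       7 leaderelection.go:231] failed to renew lease kube-system/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded
E0412 00:40:36.580567       7 server.go:207] lost master
"""

if __name__ == "__main__":
    slow, lost = summarize(SAMPLE.splitlines())
    for request, seconds in slow:
        print(f"slow request ({seconds:.2f}s): {request}")
    for lease in lost:
        print(f"lease renewal failed: {lease}")
```

In the sample, the only slow trace is the read of the kube-scheduler endpoints object, and it is followed by the failed renewal of the kube-system/kube-scheduler lease. This suggests that the API server's storage path was slow, not that the scheduler itself failed.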

About this issue

State: open
Created 5 years ago
Reactions: 7
Comments: 15 (1 by maintainers)

Most upvoted comments

I restarted the whole cluster and Rancher came up again.

ribx on Jul 8, 2019

Seeing this as well on v2.2.2, single master install:

2019-04-29 02:13:43.443529 W | etcdserver: apply entries took too long [2.458767663s for 1 entries]
2019-04-29 02:13:43.443917 W | etcdserver: avoid queries with large range/delete range!
I0429 02:13:47.149286       6 trace.go:76] Trace[1158079405]: "List /apis/management.cattle.io/v3/namespaces/p-28tr9/projectloggings" (started: 2019-04-29 02:13:37.457225975 +0000 UTC m=+3631.790283549) (total time: 9.688945013s):
Trace[1158079405]: [2.612163046s] [2.612163046s] About to List from storage
Trace[1158079405]: [7.486704661s] [4.874541615s] Listing from storage done
Trace[1158079405]: [9.688940071s] [2.201644078s] Writing http response done (0 items)
2019/04/29 02:13:48 [INFO] 2019/04/29 02:13:48 http: response.Write on hijacked connection
2019/04/29 02:13:48 [INFO] 2019/04/29 02:13:48 http: response.Write on hijacked connection
2019/04/29 02:13:47 [INFO] Updating global catalog helm
I0429 02:13:52.161686       6 leaderelection.go:231] failed to renew lease kube-system/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded
E0429 02:13:52.163380       6 server.go:207] lost master
lost lease

chendo on Apr 29, 2019