rancher: Rancher server crashed with error - leaderelection lost for cattle-controllers
Rancher server version - Build from master
The Rancher server crashed when cluster provisioning scenarios were attempted:
2018/05/11 21:26:28 [INFO] Handling backend connection request [m-sxsdg]
E0511 21:26:31.813321 1 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0511 21:26:32.072977 1 reflector.go:315] github.com/rancher/rancher/vendor/github.com/rancher/norman/controller/generic_controller.go:129: Failed to watch *v1.Node: Get https://172.31.4.120:6443/api/v1/watch/nodes?resourceVersion=2547&timeoutSeconds=552: tunnel disconnect
E0511 21:26:32.081724 1 reflector.go:315] github.com/rancher/rancher/vendor/github.com/rancher/norman/controller/generic_controller.go:129: Failed to watch *v1.Secret: Get https://172.31.4.120:6443/api/v1/watch/secrets?resourceVersion=2101&timeoutSeconds=464: tunnel disconnect
E0511 21:26:32.414349 1 reflector.go:315] github.com/rancher/rancher/vendor/github.com/rancher/norman/controller/generic_controller.go:129: Failed to watch *v1.Secret: Get https://13.59.193.167:6443/api/v1/watch/secrets?resourceVersion=672&timeoutSeconds=525: waiting for cluster agent to connect
2018/05/11 21:26:34 [INFO] Handling backend connection request [m-ba845fb607be]
2018/05/11 21:26:34 [INFO] Handling backend connection request [m-d981f39d06e3]
2018/05/11 21:26:34 [INFO] Handling backend connection request [m-qn2rx]
2018/05/11 21:26:34 [INFO] Handling backend connection request [m-gn7lm]
2018/05/11 21:26:34 [INFO] Handling backend connection request [m-8d5a15be1dce]
2018/05/11 21:26:34 [INFO] Handling backend connection request [m-w7v62]
2018/05/11 21:26:26 [INFO] Handling backend connection request [m-f4717e1c7a42]
2018/05/11 21:26:35 [INFO] stdout: (test-17704) Waiting for IP address to be assigned to the Droplet...
2018/05/11 21:26:36 [INFO] Handling backend connection request [c-mg24m]
2018/05/11 21:26:36 [INFO] Handling backend connection request [c-9xpvw]
2018/05/11 21:26:36 [INFO] Handling backend connection request [c-xzspz]
2018/05/11 21:26:31 [INFO] Handling backend connection request [m-b9njt]
2018/05/11 21:26:39 [ERROR] netpolMgr: program: error updating network policy err=Put https://138.197.108.114:6443/apis/networking.k8s.io/v1/namespaces/default/networkpolicies/hn-nodes: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0511 21:26:32.443595 1 reflector.go:315] github.com/rancher/rancher/vendor/github.com/rancher/norman/controller/generic_controller.go:129: Failed to watch *v1.ClusterRole: Get https://13.59.193.167:6443/apis/rbac.authorization.k8s.io/v1/watch/clusterroles?resourceVersion=665&timeoutSeconds=406: waiting for cluster agent to connect
E0511 21:26:35.397525 1 writers.go:139] apiserver was unable to write a JSON response: http: Handler timeout
E0511 21:26:27.416041 1 event.go:260] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Data:map[string]string(nil)}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' 'LeaderElection' '402bb98070f9 stopped leading'
E0511 21:26:40.190817 1 writers.go:139] apiserver was unable to write a JSON response: http: Handler timeout
E0511 21:26:40.966149 1 runtime.go:66] Observed a panic: &errors.errorString{s:"kill connection/stream"} (kill connection/stream)
/go/src/github.com/rancher/rancher/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/rancher/rancher/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/rancher/rancher/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:509
/usr/local/go/src/runtime/panic.go:491
/go/src/github.com/rancher/rancher/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:230
/go/src/github.com/rancher/rancher/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:114
/go/src/github.com/rancher/rancher/vendor/k8s.io/apiserver/pkg/endpoints/filters/requestinfo.go:45
/usr/local/go/src/net/http/server.go:1918
/go/src/github.com/rancher/rancher/vendor/k8s.io/apiserver/pkg/endpoints/request/requestcontext.go:110
/usr/local/go/src/net/http/server.go:1918
/go/src/github.com/rancher/rancher/vendor/k8s.io/apiserver/pkg/server/filters/wrap.go:41
/usr/local/go/src/net/http/server.go:1918
/go/src/github.com/rancher/rancher/vendor/k8s.io/apiserver/pkg/server/handler.go:198
/usr/local/go/src/net/http/server.go:2619
/usr/local/go/src/net/http/server.go:1801
/usr/local/go/src/runtime/asm_amd64.s:2337
E0511 21:26:41.232679 1 cronjob_controller.go:113] can't list Jobs: the server was unable to return a response in the time allotted, but may still be processing the request (get jobs.batch)
2018/05/11 21:26:41 [ERROR] netpolMgr: handleHostNetwork: error programming hostNetwork network policy for ns=default err=Put https://138.197.108.114:6443/apis/networking.k8s.io/v1/namespaces/default/networkpolicies/hn-nodes: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/05/11 21:26:41 [FATAL] leaderelection lost for cattle-controllers
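For context on why that last line is fatal: the message comes from Kubernetes leader election. Below is a minimal sketch (assuming a recent client-go and in-cluster config; this is not Rancher's actual code) showing the mechanism. The lock name "cattle-controllers" is taken from the log above; the namespace, lock type and timings are illustrative assumptions. When the lease cannot be renewed in time, for example because every API call is timing out, OnStoppedLeading fires and the process exits.

```go
// Minimal sketch (not Rancher's actual code) of client-go leader election.
// The lock name "cattle-controllers" comes from the log above; the namespace,
// lock type and timings here are illustrative assumptions.
package main

import (
	"context"
	"log"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	id, _ := os.Hostname()
	lock, err := resourcelock.New(
		resourcelock.LeasesResourceLock,
		"kube-system", "cattle-controllers",
		client.CoreV1(), client.CoordinationV1(),
		resourcelock.ResourceLockConfig{Identity: id},
	)
	if err != nil {
		log.Fatal(err)
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 45 * time.Second,
		RenewDeadline: 30 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// The controllers run only while this process holds the lock.
			},
			OnStoppedLeading: func() {
				// When lease renewal misses RenewDeadline (e.g. because the
				// API server is timing out on a slow etcd), this callback
				// fires and the process exits, which is the behaviour behind
				// the "[FATAL] leaderelection lost" line above.
				log.Fatal("leaderelection lost for cattle-controllers")
			},
		},
	})
}
```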
About this issue
- State: closed
- Created 6 years ago
- Reactions: 1
- Comments: 21 (4 by maintainers)
I can verify the same behaviour with Rancher 2.1.1; it is really annoying. This issue should be reopened.
First, sorry for only (semi-)raging about this earlier. To expand on the details: I am seeing this behaviour on OVH Public Cloud, with the recommended Ubuntu 16.04.5 LTS, Docker 17.03.2-ce and kernel 4.15.0-39-generic.
The crash message seems to be different each time, but it is always of the type “Could not construct reference to …”.
In my case this happens (though I see similar log lines in others’ logs) after a series of high etcd update latencies and other related timeouts while communicating with some internal microservice. For example:
So the main problem seems to be very high etcd latency (26 s to update one record???). I saw similar messages from other users, but never that high. Is that normal? (A quick way to time an etcd write directly is sketched after this comment.) The other (possibly related) issue is that the service on port 6443 lags so badly that it causes timeouts in all the other components, so much that it crashes the main process? (This should never happen, c’mon Go…)
I hope these findings are useful to someone, and I am willing to help, but I am a Kubernetes newbie and I do not know much about etcd, expected behaviours and so on…
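One way to sanity-check the etcd write latency described in the comment above is to time a single Put directly against etcd. This is a minimal sketch under stated assumptions: the endpoint IP and key name are placeholders, and a real cluster also needs the etcd TLS certificates set in clientv3.Config.TLS.

```go
// Minimal latency probe against etcd, assuming direct access to the etcd
// client port (2379). Endpoint and key are placeholders; a real cluster also
// needs the etcd TLS certificates configured in clientv3.Config.TLS.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://172.31.4.120:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Time a single write; sustained latencies far above a few hundred
	// milliseconds (let alone the ~26 s reported above) point at disk or
	// network pressure on the etcd nodes.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	start := time.Now()
	_, err = cli.Put(ctx, "latency-probe", "x")
	cancel()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("etcd put took %v\n", time.Since(start))
}
```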
I am also experiencing the same issue from 2.0.6 to 2.0.8; it works for about 15 to 30 minutes and then restarts with this error.