longhorn: [BUG] Constant errors about io.rancher.longhorn-reg.sock
Describe the bug: The kubelet log on every node shows the same sequence of errors, repeating every few minutes:
I0527 14:47:10.544314 10814 reconciler.go:156] operationExecutor.RegisterPlugin started for plugin at "/var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock" (plugin details: &{/var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock true 2020-03-21 22:46:57.40373808 +0000 UTC m=+21.619442778})
I0527 14:47:10.544409 10814 operation_generator.go:193] parsed scheme: ""
I0527 14:47:10.544427 10814 operation_generator.go:193] scheme "" not registered, fallback to default scheme
I0527 14:47:10.544452 10814 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock 0 <nil>}] <nil>}
I0527 14:47:10.544478 10814 clientconn.go:577] ClientConn switching balancer to "pick_first"
I0527 14:47:10.544581 10814 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc001438790, CONNECTING
W0527 14:47:10.544781 10814 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock: connect: connection refused". Reconnecting...
I0527 14:47:10.544846 10814 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc001438790, TRANSIENT_FAILURE
I0527 14:47:11.545032 10814 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc001438790, CONNECTING
W0527 14:47:11.545080 10814 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock: connect: connection refused". Reconnecting...
I0527 14:47:11.545120 10814 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc001438790, TRANSIENT_FAILURE
I0527 14:47:13.214754 10814 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc001438790, CONNECTING
W0527 14:47:13.214799 10814 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock: connect: connection refused". Reconnecting...
I0527 14:47:13.214866 10814 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc001438790, TRANSIENT_FAILURE
I0527 14:47:15.865668 10814 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc001438790, CONNECTING
W0527 14:47:15.865765 10814 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock: connect: connection refused". Reconnecting...
I0527 14:47:15.865828 10814 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc001438790, TRANSIENT_FAILURE
W0527 14:47:17.528690 10814 kubelet_pods.go:849] Unable to retrieve pull secret longhorn-system/ for longhorn-system/longhorn-csi-plugin-9mrj8 due to secret "" not found. The image pull may not succeed.
E0527 14:47:20.544545 10814 goroutinemap.go:150] Operation for "/var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock" failed. No retries permitted until 2020-05-27 14:49:22.544513867 +0000 UTC m=+5760166.760218627 (durationBeforeRetry 2m2s). Error: "RegisterPlugin error -- dial failed at socket /var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock, err: failed to dial socket /var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock, err: context deadline exceeded"
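For context, the failing dial in these logs is a plain unix-socket connect against a registration socket whose owning process is gone: the file still exists under /var/lib/kubelet/plugins, so kubelet keeps trying to register the plugin, but nothing is listening. A minimal Go sketch of that check (the socket path is taken from the logs above; the probe is illustrative, not kubelet's actual code):

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

func main() {
	// Socket path taken verbatim from the kubelet logs above.
	const sock = "/var/lib/kubelet/plugins/io.rancher.longhorn-reg.sock"

	if _, err := os.Stat(sock); err != nil {
		// No socket file: kubelet's plugin watcher has nothing to dial.
		fmt.Println("socket file absent:", err)
		return
	}

	// The file exists; if no plugin process is listening on it, the dial
	// fails with "connect: connection refused" -- the error in the logs.
	conn, err := net.DialTimeout("unix", sock, 2*time.Second)
	if err != nil {
		fmt.Println("stale registration socket:", err)
		return
	}
	defer conn.Close()
	fmt.Println("a plugin is listening; registration should succeed")
}
```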
To Reproduce: I have two clusters, but the error happens on only one of them: an RKE cluster on 3 bare-metal servers. The error is logged on all nodes. I already tried removing Longhorn, including manually removing its CRDs, but the error persists even after Longhorn is gone. Reinstalling it (latest version) did not help; the error continues.
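Not part of the original report, but as a hedged sketch of how one might verify the manual CRD cleanup described above (assumes client-go with a default kubeconfig; filtering on the substring "longhorn" is an assumption, not a documented invariant):

```go
package main

import (
	"context"
	"fmt"
	"strings"

	apiextensions "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
)

// List any CustomResourceDefinitions left behind after uninstalling
// Longhorn, to confirm the manual CRD removal actually completed.
func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := apiextensions.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	crds, err := client.ApiextensionsV1().CustomResourceDefinitions().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, crd := range crds.Items {
		if strings.Contains(crd.Name, "longhorn") {
			fmt.Println("leftover CRD:", crd.Name)
		}
	}
}
```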
Expected behavior: No errors in the kubelet logs.
Environment:
- Longhorn version: 0.8.1 (same error with 0.7.0)
- Kubernetes version: 1.16.6
- Node OS type and version: Ubuntu 18.04
About this issue
- State: closed
- Created 4 years ago
- Comments: 19 (16 by maintainers)
Followed the instructions from https://longhorn.io/docs/1.0.0/deploy/upgrade/longhorn-manager/#cleanup-for-compatible-csi-plugin, and the error messages related to io.rancher.longhorn-reg disappear. To reproduce the issue, tried the steps from https://github.com/longhorn/longhorn/issues/1412#issuecomment-634927383.
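For illustration only, assuming the cleanup effectively amounts to deleting the orphaned io.rancher.longhorn-reg.sock registration socket on each node (the linked docs are the authoritative procedure), a Go sketch of that step:

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
	"strings"
	"time"
)

func main() {
	dir := "/var/lib/kubelet/plugins"
	entries, err := os.ReadDir(dir)
	if err != nil {
		fmt.Println("cannot read plugin dir:", err)
		return
	}
	for _, e := range entries {
		// Match the legacy Longhorn registration socket named in the logs.
		if !strings.Contains(e.Name(), "io.rancher.longhorn") {
			continue
		}
		path := filepath.Join(dir, e.Name())
		// Leave the socket alone if a live plugin still answers on it.
		if conn, err := net.DialTimeout("unix", path, 2*time.Second); err == nil {
			conn.Close()
			continue
		}
		if err := os.Remove(path); err != nil {
			fmt.Println("failed to remove", path, "-", err)
		} else {
			fmt.Println("removed stale socket:", path)
		}
	}
}
```

Once the stale socket file is gone, kubelet's plugin watcher has nothing left to dial, so the RegisterPlugin retries and TRANSIENT_FAILURE messages stop.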
@Leen15, thanks for the confirmation; good to hear the issue is gone for you.