rancher: error in remotedialer server

What kind of request is this (question/bug/enhancement/feature request): question

Steps to reproduce (least amount of steps as possible): I just added a new custom node from a VM that is identical in every way to the other nodes.

Result:

2019-01-25T15:44:42.316560829Z 2019/01/25 15:44:42 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->192.168.1.32:53986: i/o timeout
2019-01-25T15:44:43.737401840Z 2019-01-25 15:44:43.737223 W | etcdserver: apply entries took too long [133.466929ms for 1 entries]
2019-01-25T15:44:43.737445277Z 2019-01-25 15:44:43.737256 W | etcdserver: avoid queries with large range/delete range!
2019-01-25T15:45:04.213325099Z 2019-01-25 15:45:04.213148 W | etcdserver: apply entries took too long [170.190814ms for 1 entries]
2019-01-25T15:45:04.213379786Z 2019-01-25 15:45:04.213180 W | etcdserver: avoid queries with large range/delete range!
2019-01-25T15:45:06.037751178Z 2019/01/25 15:45:06 [INFO] Handling backend connection request [c-794nf:m-d431f20af024]
2019-01-25T15:45:14.638410132Z 2019-01-25 15:45:14.638253 W | etcdserver: apply entries took too long [175.241633ms for 1 entries]
2019-01-25T15:45:14.638452065Z 2019-01-25 15:45:14.638290 W | etcdserver: avoid queries with large range/delete range!
2019-01-25T15:45:16.038055602Z 2019/01/25 15:45:16 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->192.168.1.32:54018: i/o timeout
2019-01-25T15:45:19.099284457Z 2019-01-25 15:45:19.099092 W | etcdserver: apply entries took too long [228.07355ms for 1 entries]
2019-01-25T15:45:19.099341354Z 2019-01-25 15:45:19.099125 W | etcdserver: avoid queries with large range/delete range!
2019-01-25T15:45:41.417342397Z 2019/01/25 15:45:41 [INFO] Handling backend connection request [c-794nf:m-d431f20af024]
2019-01-25T15:45:51.417609294Z 2019/01/25 15:45:51 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->192.168.1.32:54052: i/o timeout

On and on endlessly.
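For reference (my reading of the log, not something confirmed in this issue): the "i/o timeout" wording is the standard Go net error returned when a read deadline expires before the peer sends anything. The remotedialer server holds a websocket tunnel open per node agent, so presumably the timeout appears whenever a tunnel goes quiet for too long and the server gives up on that read. A minimal Go sketch, not Rancher code, that reproduces the same error text under that assumption:

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Listener standing in for the server side of a tunnel.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	// Client that connects but never sends anything, i.e. a silent peer.
	go func() {
		conn, _ := net.Dial("tcp", ln.Addr().String())
		defer conn.Close()
		time.Sleep(2 * time.Second)
	}()

	conn, err := ln.Accept()
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Read with a deadline; when no data arrives in time, the read fails
	// with the same "read tcp ...: i/o timeout" wording seen in the logs.
	conn.SetReadDeadline(time.Now().Add(500 * time.Millisecond))
	buf := make([]byte, 1)
	if _, err := conn.Read(buf); err != nil {
		fmt.Println(err) // e.g. "read tcp 127.0.0.1:...->127.0.0.1:...: i/o timeout"
	}
}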

Other details that may be helpful: The node seems to work fine.

Environment information

  • Rancher version (rancher/stable): 2.1.5
  • Installation option (single install/HA): single

Cluster information

  • Cluster type: Custom
  • Machine type: VM
  • Docker Version: 17.3.2
  • Kubernetes version (use kubectl version): 1.11.6
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.6", GitCommit:"b1d75deca493a24a2f87eb1efde1a569e52fc8d9", GitTreeState:"clean", BuildDate:"2018-12-16T04:30:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

About this issue

  • Original URL
  • State: open
  • Created 5 years ago
  • Reactions: 14
  • Comments: 46

Most upvoted comments

still on v2.2.4

Still happening on v2.6.3

Still happening on v2.6.9:

time="2022-07-12T20:13:44Z" level=error msg="Remotedialer proxy error" error="read tcp 10.244.0.33:34886->81.x.x.x:443: read: connection reset by peer"
time="2022-07-12T20:13:46Z" level=error msg="Failed to dial steve aggregation server: read tcp 10.244.0.33:34870->81.x.x.x:443: read: connection reset by peer"
E0712 20:13:51.497369      39 leaderelection.go:330] error retrieving resource lock kube-system/cattle-controllers: Get "https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/cattle-controllers?timeout=15m0s": context deadline exceeded
I0712 20:13:51.501706      39 leaderelection.go:283] failed to renew lease kube-system/cattle-controllers: timed out waiting for the condition
time="2022-07-12T20:13:51Z" level=fatal msg="leaderelection lost for cattle-controllers"
E0712 20:13:51.501776      39 leaderelection.go:306] Failed to release lock: resource name may not be empty

still happening with 2.6.2

We are seeing the same problem. Has anyone managed to fix this issue? Is it even a real problem, or a bug?

Same in Rancher v2.6.9. Any idea what's causing the problem? When I exec into the container, I can use curl to fetch the content of the Rancher address without any problem. Could it have something to do with a self-signed certificate? Our Rancher instance is only accessible on the LAN.
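One way to narrow down whether certificate trust (rather than the network path) is at fault is a standalone TLS probe against the Rancher address. A minimal Go sketch; the hostname and CA file path are placeholders to adjust for your setup:

package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

func main() {
	// Hypothetical values: replace with your Rancher hostname and the CA
	// certificate used by the self-signed setup.
	const rancherAddr = "rancher.example.lan:443"
	caPEM, err := os.ReadFile("cacerts.pem")
	if err != nil {
		panic(err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		panic("could not parse CA certificate")
	}

	// An x509 verification error here points at certificate trust; a
	// "connection reset by peer" instead points at the network path.
	conn, err := tls.Dial("tcp", rancherAddr, &tls.Config{RootCAs: pool})
	if err != nil {
		fmt.Println("TLS handshake failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("TLS handshake OK, server certificate subject:",
		conn.ConnectionState().PeerCertificates[0].Subject)
}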

Same here with 2.2.1. The problem only occurs when adding a worker-only node. A full node with control plane, etcd, and worker roles doesn't have this problem.

EDIT: worker + etcd is also not affected

Same problem here with version 2.1.6. Adding a custom host is not possible.

Still happening on rancher/rancher:v2.4.5

I also see this issue with a fresh 2.1.7 RKE Rancher install. Lots of logs like this are generated; I see about 800 INFO entries per day with this timeout on the 3-node Rancher cluster itself. It would be great to eliminate these logs if they are due to a bug, since they can hamper troubleshooting down the road.