rancher: error in remotedialer server

What kind of request is this (question/bug/enhancement/feature request): question

Steps to reproduce (least amount of steps as possible): I just added a new custom node from a VM that is identical in every way to the other nodes.

Result:

2019-01-25T15:44:42.316560829Z 2019/01/25 15:44:42 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->192.168.1.32:53986: i/o timeout
2019-01-25T15:44:43.737401840Z 2019-01-25 15:44:43.737223 W | etcdserver: apply entries took too long [133.466929ms for 1 entries]
2019-01-25T15:44:43.737445277Z 2019-01-25 15:44:43.737256 W | etcdserver: avoid queries with large range/delete range!
2019-01-25T15:45:04.213325099Z 2019-01-25 15:45:04.213148 W | etcdserver: apply entries took too long [170.190814ms for 1 entries]
2019-01-25T15:45:04.213379786Z 2019-01-25 15:45:04.213180 W | etcdserver: avoid queries with large range/delete range!
2019-01-25T15:45:06.037751178Z 2019/01/25 15:45:06 [INFO] Handling backend connection request [c-794nf:m-d431f20af024]
2019-01-25T15:45:14.638410132Z 2019-01-25 15:45:14.638253 W | etcdserver: apply entries took too long [175.241633ms for 1 entries]
2019-01-25T15:45:14.638452065Z 2019-01-25 15:45:14.638290 W | etcdserver: avoid queries with large range/delete range!
2019-01-25T15:45:16.038055602Z 2019/01/25 15:45:16 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->192.168.1.32:54018: i/o timeout
2019-01-25T15:45:19.099284457Z 2019-01-25 15:45:19.099092 W | etcdserver: apply entries took too long [228.07355ms for 1 entries]
2019-01-25T15:45:19.099341354Z 2019-01-25 15:45:19.099125 W | etcdserver: avoid queries with large range/delete range!
2019-01-25T15:45:41.417342397Z 2019/01/25 15:45:41 [INFO] Handling backend connection request [c-794nf:m-d431f20af024]
2019-01-25T15:45:51.417609294Z 2019/01/25 15:45:51 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->192.168.1.32:54052: i/o timeout

On and on endlessly.
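For reference (my reading of the log, not something confirmed in this issue): the "i/o timeout" wording is the standard Go net error returned when a read deadline expires before the peer sends anything. The remotedialer server holds a websocket tunnel open per node agent, so presumably the timeout appears whenever a tunnel goes quiet for too long and the server gives up on that read. A minimal Go sketch, not Rancher code, that reproduces the same error text under that assumption:

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Listener standing in for the server side of a tunnel.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	// Client that connects but never sends anything, i.e. a silent peer.
	go func() {
		conn, _ := net.Dial("tcp", ln.Addr().String())
		defer conn.Close()
		time.Sleep(2 * time.Second)
	}()

	conn, err := ln.Accept()
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Read with a deadline; when no data arrives in time, the read fails
	// with the same "read tcp ...: i/o timeout" wording seen in the logs.
	conn.SetReadDeadline(time.Now().Add(500 * time.Millisecond))
	buf := make([]byte, 1)
	if _, err := conn.Read(buf); err != nil {
		fmt.Println(err) // e.g. "read tcp 127.0.0.1:...->127.0.0.1:...: i/o timeout"
	}
}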

Other details that may be helpful: The node seems to work fine.

Environment information

  • Rancher version (rancher/stable): 2.1.5
  • Installation option (single install/HA): single

Cluster information

  • Cluster type: Custom
  • Machine type: VM
  • Docker Version: 17.3.2
  • Kubernetes version (use kubectl version): 1.11.6
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.6", GitCommit:"b1d75deca493a24a2f87eb1efde1a569e52fc8d9", GitTreeState:"clean", BuildDate:"2018-12-16T04:30:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

About this issue

  • Original URL
  • State: open
  • Created 5 years ago
  • Reactions: 14
  • Comments: 46

Most upvoted comments

still on v2.2.4

Still happening on v2.6.3

Still happening on v2.6.9:

time="2022-07-12T20:13:44Z" level=error msg="Remotedialer proxy error" error="read tcp 10.244.0.33:34886->81.x.x.x:443: read: connection reset by peer"
time="2022-07-12T20:13:46Z" level=error msg="Failed to dial steve aggregation server: read tcp 10.244.0.33:34870->81.x.x.x:443: read: connection reset by peer"
E0712 20:13:51.497369      39 leaderelection.go:330] error retrieving resource lock kube-system/cattle-controllers: Get "https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/cattle-controllers?timeout=15m0s": context deadline exceeded
I0712 20:13:51.501706      39 leaderelection.go:283] failed to renew lease kube-system/cattle-controllers: timed out waiting for the condition
time="2022-07-12T20:13:51Z" level=fatal msg="leaderelection lost for cattle-controllers"
E0712 20:13:51.501776      39 leaderelection.go:306] Failed to release lock: resource name may not be empty

still happening with 2.6.2

We are seeing the same problem. Has anyone managed to fix this issue? Is it even a real problem, or a bug?

Same in Rancher v2.6.9. Any idea what's causing the problem? When I exec into the container, I can use curl to fetch the content of the Rancher address without any problem. Could it have something to do with a self-signed certificate? Our Rancher instance is only accessible on the LAN.
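One way to narrow down whether certificate trust (rather than the network path) is at fault is a standalone TLS probe against the Rancher address. A minimal Go sketch; the hostname and CA file path are placeholders to adjust for your setup:

package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

func main() {
	// Hypothetical values: replace with your Rancher hostname and the CA
	// certificate used by the self-signed setup.
	const rancherAddr = "rancher.example.lan:443"
	caPEM, err := os.ReadFile("cacerts.pem")
	if err != nil {
		panic(err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		panic("could not parse CA certificate")
	}

	// An x509 verification error here points at certificate trust; a
	// "connection reset by peer" instead points at the network path.
	conn, err := tls.Dial("tcp", rancherAddr, &tls.Config{RootCAs: pool})
	if err != nil {
		fmt.Println("TLS handshake failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("TLS handshake OK, server certificate subject:",
		conn.ConnectionState().PeerCertificates[0].Subject)
}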

Same here with 2.2.1. The problem only occurs when adding a worker-only node. A full node with control plane, etcd, and worker roles doesn't have this problem.

EDIT: worker + etcd is also not affected

Same problem here with version 2.1.6. Adding a custom host is not possible.

Still happening on rancher/rancher:v2.4.5

I also see this issue with a fresh 2.1.7 RKE Rancher install. Lots of logs like this are generated; I see about 800 INFO entries per day with this timeout on the 3-node Rancher cluster itself. It would be great to eliminate these logs if they are due to a bug, since they can hamper troubleshooting down the road.