shellhub: the agent Docker container was stuck on DNS lookup when there was a connectivity problem

It seems the agent Docker container got stuck on DNS lookup, and the container had to be restarted to resolve it.

2020/12/29 07:39:52 [DEBUG] GET https://sh.mydomain.net:443/info: retrying in 30s (2147483423 left)
2020/12/29 07:40:22 [ERR] GET https://sh.mydomain.net:443/info request failed: Get https://sh.mydomain.net:443/info: dial tcp: lookup sh.mydomain.net on [::1]:53: read udp [::1]:43645->[::1]:53: read: connection refused

About this issue

State: closed
Created 4 years ago
Comments: 18 (17 by maintainers)

Most upvoted comments

How about it being your first PR @jcarrus?

otavio on Jul 6, 2021

@otavio that’s a good point. You are totally right that it’s likely a Docker issue (unless the agent code or Go is somehow caching the network settings, but I’m not seeing any evidence of that).

That being said, it would be helpful to have a way to address this. I’d be totally fine with either a hardcoded retry period or a configurable one, something to the effect of NETWORK_RETRY_PERIOD=300, which would indicate that after 300 seconds of no connection, the container should restart.

It would be really great to have an exponential backoff, but I’m not sure that’s possible since the container itself would have to fully restart to fix this networking bug.

So anyway, maybe the simplest would be to instead add a configuration option:

=> SHELLHUB_NETWORK_TIMEOUT

If a network connection is not established to the server after SHELLHUB_NETWORK_TIMEOUT seconds, quit the shellhub process. If unset, the process will retry forever.

jcarrus on Jul 6, 2021