shellhub: the agent Docker container was stuck on DNS lookup when there was a connectivity problem
It seems the agent Docker container got stuck on DNS lookup and had to be restarted to resolve the problem.
2020/12/29 07:39:52 [DEBUG] GET https://sh.mydomain.net:443/info: retrying in 30s (2147483423 left)
2020/12/29 07:40:22 [ERR] GET https://sh.mydomain.net:443/info request failed: Get https://sh.mydomain.net:443/info: dial tcp: lookup sh.mydomain.net on [::1]:53: read udp [::1]:43645->[::1]:53: read: connection refused
About this issue
- State: closed
- Created 4 years ago
- Comments: 18 (17 by maintainers)
How about it being your first PR @jcarrus?
@otavio that’s a good point. You are totally right that it’s likely a docker issue (unless the agent code or Go is somehow caching the network settings, but I’m not seeing any evidence of that).
That being said, it would be helpful to have a way to address this. I’d be totally fine for a hardcoded retry period or a configurable one. Something to the effect of NETWORK_RETRY_PERIOD=300, which would indicate that after 300 seconds of no connection, the container should restart.
It would be really great to have an exponential backoff, but I’m not sure that’s possible since the container itself would have to fully restart to fix this networking bug.
So anyway, maybe the simplest would be to instead add a configuration option:
SHELLHUB_NETWORK_TIMEOUT
If a network connection is not established to the server after SHELLHUB_NETWORK_TIMEOUT seconds, quit the shellhub process. If unset, the process will retry forever.