nats-server: Improve TLS handshake error message
- Defect
- Feature Request or Change Proposal
I’m running NATS server 2.1.2. I am using TLS between my NATS brokers (port 6222) as well as TLS from clients to brokers (port 4222). Additionally, I have added an http: localhost:8222 config stanza to get metrics.
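For reference, a nats-server configuration along these lines would match the setup described above (this is a hedged sketch, not my exact config; the certificate paths are placeholders):

# Client port with TLS
port: 4222
http: localhost:8222

tls {
  cert_file: "/etc/nats/certs/server.pem"
  key_file:  "/etc/nats/certs/server-key.pem"
}

# Cluster/route port with TLS
cluster {
  port: 6222
  tls {
    cert_file: "/etc/nats/certs/server.pem"
    key_file:  "/etc/nats/certs/server-key.pem"
  }
}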
My NATS servers are outputting:
...
[1] 2019/11/26 14:52:29.877977 [ERR] X.X.X.X:48306 - cid:48260 - TLS handshake error: EOF
[1] 2019/11/26 14:52:30.171968 [ERR] X.X.X.X:39272 - cid:48261 - TLS handshake error: EOF
[1] 2019/11/26 14:52:30.284848 [ERR] X.X.X.X:19473 - cid:48262 - TLS handshake error: EOF
[1] 2019/11/26 14:52:30.443337 [ERR] X.X.X.X:30436 - cid:48263 - TLS handshake error: EOF
[1] 2019/11/26 14:52:30.716426 [ERR] X.X.X.X:6808 - cid:48264 - TLS handshake error: EOF
[1] 2019/11/26 14:52:30.872010 [ERR] X.X.X.X:34951 - cid:48265 - TLS handshake error: EOF
[1] 2019/11/26 14:52:31.293905 [ERR] X.X.X.X:14828 - cid:48266 - TLS handshake error: EOF
[1] 2019/11/26 14:52:31.576071 [ERR] X.X.X.X:48940 - cid:48267 - TLS handshake error: EOF
...
I’m pretty sure I have identified the culprit as a load balancer pinging my NATS instances (on port 4222). However, it would be very helpful if the error logs said which port the TLS handshake failed on, which would make debugging errors like this a lot easier. Consider this a feature request.
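To illustrate what I mean, here is a minimal Go sketch (not the actual nats-server code) of a TLS-terminating accept loop that includes the local address, and therefore the listener port (4222 vs 6222), in the handshake error message. loadServerTLS and the certificate file names are placeholders.

package main

import (
	"crypto/tls"
	"log"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", ":4222")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go func(c net.Conn) {
			defer c.Close()
			tlsConn := tls.Server(c, loadServerTLS())
			if err := tlsConn.Handshake(); err != nil {
				// c.LocalAddr() carries the port the client dialed,
				// which is the detail the current message lacks.
				log.Printf("[ERR] %s -> %s - TLS handshake error: %v",
					c.RemoteAddr(), c.LocalAddr(), err)
			}
		}(conn)
	}
}

// loadServerTLS is a placeholder; a real server loads cert_file/key_file
// from its configuration.
func loadServerTLS() *tls.Config {
	cert, err := tls.LoadX509KeyPair("server-cert.pem", "server-key.pem")
	if err != nil {
		log.Fatal(err)
	}
	return &tls.Config{Certificates: []tls.Certificate{cert}}
}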
About this issue
- Original URL
- State: open
- Created 5 years ago
- Reactions: 3
- Comments: 25 (10 by maintainers)
Thanks for sharing @JnMik, I don’t see an issue with this setup as long as the ExternalIP metadata is present in the Kubernetes cluster (that is, if both INTERNAL-IP and EXTERNAL-IP are displayed when executing kubectl get nodes -o wide). If the external IP metadata is on the node, then the servers will be able to advertise the other live public IPs that are part of the cluster and use those for reconnecting and failover right away, avoiding the extra DNS lookup. The NATS clients also get a list of the IPs when connecting and pick one randomly, so clients should be distributed evenly as well.

We do something similar with the connect.ngs.global service that Synadia offers; for example, the nodes available in the hostname uswest2.aws.ngs.global are right now for me:

And if I nc or telnet against the client port I get the rest of the cluster members (a sketch of that check follows below):

In order to enable these advertisements, we use the following initializer container, which has some extra Kubernetes policy so it can look up the public IP of the kubelet where it is running: https://github.com/nats-io/k8s/blob/master/nats-server/nats-server-with-auth-and-tls.yml#L132-L153 And we have the server load that file in the config via an emptyDir volume: https://github.com/nats-io/k8s/blob/master/nats-server/nats-server-with-auth-and-tls.yml#L54
I would not put any LB in between NATS clients and servers. I would just use DNS with multiple A records or a list of servers in the client. NATS handles all of that for you, and better than LBs do.
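For example, a minimal nats.go sketch of passing an explicit server list to the client instead of going through a load balancer (the hostnames below are placeholders):

package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// A comma-separated list of servers; the client picks one and fails
	// over to the others on its own.
	nc, err := nats.Connect(
		"nats://nats-1.example.com:4222,nats://nats-2.example.com:4222,nats://nats-3.example.com:4222",
		nats.MaxReconnects(-1),
		nats.ReconnectWait(2*time.Second),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// The client also learns any additional cluster members that the
	// servers advertise after connecting.
	log.Println("connected to:", nc.ConnectedUrl())
	log.Println("discovered servers:", nc.DiscoveredServers())
}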
The NATS protocol was designed ~10 years ago, way before k8s was on the scene 😉