nats-server: Improve TLS handshake error message

  • Defect
  • Feature Request or Change Proposal

Defects

I’m running NATS Server 2.1.2. I use TLS between my NATS servers (cluster port 6222) as well as TLS from clients to servers (port 4222). Additionally, I have added an `http: localhost:8222` config stanza to expose metrics.

My NATS servers are outputting

...
[1] 2019/11/26 14:52:29.877977 [ERR] X.X.X.X:48306 - cid:48260 - TLS handshake error: EOF
[1] 2019/11/26 14:52:30.171968 [ERR] X.X.X.X:39272 - cid:48261 - TLS handshake error: EOF
[1] 2019/11/26 14:52:30.284848 [ERR] X.X.X.X:19473 - cid:48262 - TLS handshake error: EOF
[1] 2019/11/26 14:52:30.443337 [ERR] X.X.X.X:30436 - cid:48263 - TLS handshake error: EOF
[1] 2019/11/26 14:52:30.716426 [ERR] X.X.X.X:6808 - cid:48264 - TLS handshake error: EOF
[1] 2019/11/26 14:52:30.872010 [ERR] X.X.X.X:34951 - cid:48265 - TLS handshake error: EOF
[1] 2019/11/26 14:52:31.293905 [ERR] X.X.X.X:14828 - cid:48266 - TLS handshake error: EOF
[1] 2019/11/26 14:52:31.576071 [ERR] X.X.X.X:48940 - cid:48267 - TLS handshake error: EOF
...

I’m pretty sure I have identified the culprit: a load balancer pinging my NATS instances on port 4222. However, it would be very helpful if the error logs said which port the TLS handshake failed on, which would make errors like this much easier to debug. Please treat this as a feature request.

About this issue

  • Original URL
  • State: open
  • Created 5 years ago
  • Reactions: 3
  • Comments: 25 (10 by maintainers)

Most upvoted comments

Do you think NATS will have any issues if we use one DNS record with 3 IPs balanced across it?

Thanks for sharing @JnMik, I don’t see an issue with this setup as long as the ExternalIP metadata is present in the Kubernetes cluster (that is, if both INTERNAL-IP and EXTERNAL-IP are displayed when executing kubectl get nodes -o wide). If the external IP metadata is on the node, then the servers will be able to advertise the public IPs of the other live members of the cluster, which clients can use for reconnecting and failover right away, avoiding the extra DNS lookup. The NATS clients also get a list of the IPs when connecting and pick one randomly, so clients should be distributed evenly as well.

We do something similar with the connect.ngs.global service that Synadia offers; for example, the nodes available behind the hostname uswest2.aws.ngs.global are, for me right now:

dig uswest2.aws.ngs.global
...
;; ANSWER SECTION:
uswest2.aws.ngs.global.	60	IN	A	54.202.186.240
uswest2.aws.ngs.global.	60	IN	A	35.166.100.73
uswest2.aws.ngs.global.	60	IN	A	44.228.141.181

And if I nc or telnet to the client port, I get the rest of the cluster members:

telnet uswest2.aws.ngs.global 4222
INFO {...,"cluster":"aws-uswest2","connect_urls":["35.166.100.73:4222","44.228.141.181:4222","54.202.186.240:4222"]} 

In order to enable these advertisements, we use the following init container, which has some extra Kubernetes policy attached so it can look up the public IP of the Kubelet where it is running: https://github.com/nats-io/k8s/blob/master/nats-server/nats-server-with-auth-and-tls.yml#L132-L153 We then have the server load that file in the config via an emptyDir volume: https://github.com/nats-io/k8s/blob/master/nats-server/nats-server-with-auth-and-tls.yml#L54

I would not put any LB between NATS clients and servers. Just use DNS with multiple A records or a list of servers in the client; NATS handles all of that for you, and better than LBs do.

NATS protocol was designed ~10yrs ago, way before k8s was on the scene 😉