nebula: 🐛 BUG: Issues with multiple fixed IP addresses for lighthouse
What version of nebula are you using?
1.7.2
What operating system are you using?
Linux
Describe the Bug
I’m having issues with hosts connecting when they have multiple IPs set for their lighthouse/relay. Most things seem to work fine, but I ran into problems with NAT-to-NAT connections across two networks when using multiple fixed IP addresses for one of my lighthouses.
Just for reference, here’s what the connection looks like:
rsyslog daemon -> NAT -> internet -> NAT -> rsyslog server
The relevant part of my config looks like this:
static_host_map:
  172.0.0.2:
    - xxx.example.com:4242
    - xxx.xxx.245.206:4242
    - xxx.xxx.181.204:4242
  172.0.0.3:
    - xxx.xxx.118.198:4242
The rsyslog daemon on 172.0.3.109 shows this:
xxx.xxxlevel=info msg="Attempt to relay through hosts" localIndex=2357375276 relays="[172.0.0.2 172.0.0.3 172.0.0.2 172.0.0.3]" remoteIndex=0 vpnIp=172.0.2.116
xxx.xxxlevel=info msg="Send handshake via relay" localIndex=2357375276 relay=172.0.0.2 remoteIndex=0 vpnIp=172.0.2.116
xxx.xxxlevel=info msg="Send handshake via relay" localIndex=2357375276 relay=172.0.0.3 remoteIndex=0 vpnIp=172.0.2.116
xxx.xxxlevel=info msg="Send handshake via relay" localIndex=2357375276 relay=172.0.0.2 remoteIndex=0 vpnIp=172.0.2.116
xxx.xxxlevel=info msg="Send handshake via relay" localIndex=2357375276 relay=172.0.0.3 remoteIndex=0 vpnIp=172.0.2.116
xxx.xxxlevel=info msg="Handshake timed out" durationNs=3037758038 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=2357375276 localIndex=2357375276 remoteIndex=0 udpAddrs="[xxx.xxx.142.18:53561 10.0.0.12:53561]" vpnIp=172.0.2.116
While the rsyslog server host 172.0.2.116 shows this:
xxx.xxxlevel=info msg="Attempt to relay through hosts" relayIps="[172.0.0.2 172.0.0.3 172.0.0.2 172.0.0.3]" vpnIp=172.0.3.109
xxx.xxxlevel=info msg="Re-send CreateRelay request" relay=172.0.0.2 vpnIp=172.0.3.109
xxx.xxxlevel=info msg="Re-send CreateRelay request" relay=172.0.0.3 vpnIp=172.0.3.109
xxx.xxxlevel=info msg="Re-send CreateRelay request" relay=172.0.0.2 vpnIp=172.0.3.109
xxx.xxxlevel=info msg="Re-send CreateRelay request" relay=172.0.0.3 vpnIp=172.0.3.109
xxx.xxxlevel=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=2890926437 udpAddrs="[xxx.xxx.151.80:52928 xxx.xxx.151.80:65320 192.168.1.214:55504]" vpnIp=172.0.3.109
When I remove the xxx.example.com:4242 and xxx.xxx.181.204:4242 lines from the static_host_map entry on both hosts, the traffic flows.
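For reference (addresses redacted as above), the trimmed map that lets traffic flow would look roughly like this, with a single fixed address left for 172.0.0.2:
static_host_map:
  172.0.0.2:
    - xxx.xxx.245.206:4242
  172.0.0.3:
    - xxx.xxx.118.198:4242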
Logs from affected hosts
see above
Config files from affected hosts
see above
Sorry for the confusion.
Yes, that’s what I’m seeing. After running for a few weeks, Nebula on the rsyslog host still seems to be able to connect out, but no hosts can connect to the rsyslog server.
It seems to be the second case, a successful handshake with a connection that dies later. But I think where the confusion is coming from is that I noticed both cases happening. Let’s wait for logs and then we can break this down further. I’ll make a long, detailed post once I have more info for you.
Thanks for your patience!
So a host 172.0.2.116 (rsyslog) on Network A was able to communicate with some Network B hosts, but not 172.0.3.109 (unknown name) until Nebula was restarted on each of these hosts, at which time communication could be re-established?
Nebula has 10 slots each for IPv4 and IPv6 addresses for a given host. A failure scenario can occur when a host has more than 10 IP addresses, most of which are not routable (e.g. a bunch of Docker networks running on the host), and reports them to the Lighthouse. Nodes will then query the Lighthouse and receive no routable IP addresses. You can use local_allow_list to restrict which addresses/adapters are considered; you should be able to see in the handshake logs that all of the udpAddrs are non-routable. Restarting the affected hosts can temporarily solve the problem because they will re-send their IP addresses to the Lighthouse, possibly in a different order.
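A minimal sketch of such a filter, assuming the unroutable addresses come from Docker-style interfaces (the interface name patterns below are examples, not taken from these hosts); in Nebula's config this lives under the lighthouse section:
lighthouse:
  # Restrict which local addresses are advertised to the lighthouses.
  local_allow_list:
    # Don't report addresses from these interfaces; patterns are regular
    # expressions and must match the entire interface name.
    interfaces:
      'docker.*': false
      'veth.*': false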
AFAICT, that issue would not be resolved or affected by having some hosts report to an extra Lighthouse.
This is all speculation without logs. Waiting with bated breath. 😃