nebula: Nodes can see the Lighthouse but they can't see each other

Hi

I set up a small network of 3+ nodes. Non-LH nodes can ping the LH, and the LH can ping the nodes, but the nodes can't ping each other.

This seems to work only for nodes that are on the same wifi network. Anything from an external node to another external node, or from an external node to an internal one, does not work unless another VPN, such as WireGuard, is already active between the external nodes.

The LH is behind a router, so I port forwarded the default port; this seems to work, given that any of the nodes can connect to the LH.
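
For reference, my understanding is that the UDP port forwarded on the router has to match the lighthouse's listen port, and every other node has to point at that same external address and port in its static_host_map. A minimal sketch of the two pieces that have to line up (the addresses and port below are placeholders, not my real values):

# on the lighthouse
listen:
  host: 0.0.0.0
  port: 4242            # forward this UDP port on the router to the LH

# on every other node
static_host_map:
  "10.x.0.1": ["ROUTER-EXTERNAL-IP:4242"]   # lighthouse nebula IP -> routable address:port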

Interestingly, when I try to ping from one of the external nodes to a node on the home wifi, there is activity on the receiving internal node, but the pings are all unsuccessful; the ping just stalls.


time="2019-12-11T14:46:38-06:00" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=688165208 remoteIndex=0 udpAddr="192.168.0.23:59683" vpnIp=10.x.0.12

time="2019-12-11T14:46:40-06:00" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=688165208 remoteIndex=0 udpAddr="EXTERNAL-IP:59683" vpnIp=10.x.0.12

time="2019-12-11T14:46:43-06:00" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=688165208 remoteIndex=0 udpAddr="10.3.0.2:59683" vpnIp=10.x.0.12


I have all the punch stuff enabled. Am I supposed to forward more ports or port ranges?
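
For completeness, here is what I mean by the punch settings; these are the keys from the example config (old top-level syntax), trimmed down:

# keep punching in/out at a regular interval so NAT mappings don't expire
punchy: true
# ask the remote node to connect back out to us if our hole punching fails
# (the example config notes this helps behind difficult NATs such as symmetric ones)
punch_back: true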

Please bear in mind that in this setup WireGuard works perfectly, and all the WG nodes can see each other without issues, including all the traffic routing. In case anyone wonders why I am trying to use both: I would like to set up Nebula as a fallback solution.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 6
  • Comments: 25

Most upvoted comments

Hi all, this is almost always a NAT or two that aren't playing nice. Sorry for the delay in replying!

A week ago the core devs met and spec’d out how we are going to do relays at the protocol level. While this wasn’t a design goal of Nebula, there is enough need in the community that it is worth doing.

Same issue here.

I’ve got one lighthouse node with a public IP address and firewall port 4242 open for UDP & TCP; one laptop at home behind NAT; and one laptop at the office (also behind NAT). The connection status is:

lighthouse <-both way connected-> home laptop
lighthouse <-both way connected-> office laptop
home laptop <-no connection-> office laptop

Sure

cat config.yml | sed '/#/d' | sed -r '/^\s*$/d'

pki:
  ca: ./ca.crt
  cert: ./ROOT.crt
  key: ./ROOT.key
static_host_map:
  "10.1.0.1": ["IP:4545"]
lighthouse:
  am_lighthouse: true
  interval: 60
  hosts:
listen:
  host: 0.0.0.0
  port: 4545
punchy: true
punch_back: true
tun:
  dev: nebula1
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 500
  mtu: 1300
  routes:
logging:
  level: info
  format: text
firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    - port: any
      proto: icmp
      host: any
    - port: 443
      proto: tcp
      groups:
        - laptop
        - home
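
A side note on the 443 rule above: per the comments in the full example config posted later in this thread, `groups` (plural) is AND'd, so a peer's certificate must carry both the laptop and home groups for the rule to match, while `group` (singular) only requires that one group. A small sketch of the difference, reusing the names from the config above:

  inbound:
    # matches only certs that carry BOTH groups
    - port: 443
      proto: tcp
      groups:
        - laptop
        - home
    # matches any cert that carries the laptop group
    - port: 443
      proto: tcp
      group: laptop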

So I’ve been slowly expanding my nebula network out to more machines and haven’t had many issues. I invited a friend into my network and generated certs for 4 of his machines.

So far he has 3 running, and the results are very odd. Two of the machines are at his house, and one is at his work. One machine at his house is Windows, the other is Ubuntu, and the machine at his work is also Windows.

The machine I’m currently on is a Windows machine at my house.

This Windows machine at my house is able to ping and connect to his Ubuntu machine at his house, but not to the Windows machines at his house or his work. From his work machine he is able to ping the house Ubuntu machine, but not the house Windows machine.

Here are the logs from the lighthouse:
128.0.3.2 is the Windows machine at his house
128.0.3.3 is the Ubuntu machine at his house
128.0.3.4 is the Windows machine at his work

terry@cloudlink:/var/log$ cat syslog | grep nebula | grep 128.0.3.2
Jan 14 02:50:12 cloudlink nebula[29594]: time="2020-01-14T02:50:12Z" level=info msg="Handshake message received" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=681713942 remoteIndex=0 responderIndex=0 udpAddr="104.XXX.XXX.84:55939" vpnIp=128.0.3.2
Jan 14 02:50:12 cloudlink nebula[29594]: time="2020-01-14T02:50:12Z" level=info msg="Handshake message sent" handshake="map[stage:2 style:ix_psk0]" initiatorIndex=681713942 remoteIndex=0 responderIndex=3524367104 udpAddr="104.XXX.XXX.84:55939" vpnIp=128.0.3.2
Jan 14 03:10:57 cloudlink nebula[29594]: time="2020-01-14T03:10:57Z" level=info msg="Close tunnel received, tearing down." udpAddr="104.XXX.XXX.84:55939" vpnIp=128.0.3.2
Jan 14 03:11:09 cloudlink nebula[29594]: time="2020-01-14T03:11:09Z" level=info msg="Handshake message received" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3980494719 remoteIndex=0 responderIndex=0 udpAddr="104.XXX.XXX.84:58401" vpnIp=128.0.3.2
Jan 14 03:11:09 cloudlink nebula[29594]: time="2020-01-14T03:11:09Z" level=info msg="Handshake message sent" handshake="map[stage:2 style:ix_psk0]" initiatorIndex=3980494719 remoteIndex=0 responderIndex=3416636504 udpAddr="104.XXX.XXX.84:58401" vpnIp=128.0.3.2
terry@cloudlink:/var/log$ cat syslog | grep nebula | grep 128.0.3.3
Jan 14 02:50:13 cloudlink nebula[29594]: time="2020-01-14T02:50:13Z" level=info msg="Handshake message received" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=725317506 remoteIndex=0 responderIndex=0 udpAddr="104.XXX.XXX.84:4242" vpnIp=128.0.3.3
Jan 14 02:50:13 cloudlink nebula[29594]: time="2020-01-14T02:50:13Z" level=info msg="Handshake message sent" handshake="map[stage:2 style:ix_psk0]" initiatorIndex=725317506 remoteIndex=0 responderIndex=1799649400 udpAddr="104.XXX.XXX.84:4242" vpnIp=128.0.3.3
terry@cloudlink:/var/log$ cat syslog | grep nebula | grep 128.0.3.4
Jan 14 02:50:37 cloudlink nebula[29594]: time="2020-01-14T02:50:37Z" level=info msg="Handshake message received" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=156029206 remoteIndex=0 responderIndex=0 udpAddr="198.XX.XXX.42:60679" vpnIp=128.0.3.4
Jan 14 02:50:37 cloudlink nebula[29594]: time="2020-01-14T02:50:37Z" level=info msg="Handshake message sent" handshake="map[stage:2 style:ix_psk0]" initiatorIndex=156029206 remoteIndex=0 responderIndex=1379347581 udpAddr="198.XX.XXX.42:60679" vpnIp=128.0.3.4

128.0.1.3 is the Windows machine I'm currently on:

terry@cloudlink:/var/log$ cat syslog | grep nebula | grep 128.0.1.3
Jan 14 03:28:39 cloudlink nebula[29594]: time="2020-01-14T03:28:39Z" level=info msg="Handshake message received" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3413707116 remoteIndex=0 responderIndex=0 udpAddr="184.61.217.140:59919" vpnIp=128.0.1.3
Jan 14 03:28:39 cloudlink nebula[29594]: time="2020-01-14T03:28:39Z" level=info msg="Handshake message sent" handshake="map[stage:2 style:ix_psk0]" initiatorIndex=3413707116 remoteIndex=0 responderIndex=195554109 udpAddr="184.61.217.140:59919" vpnIp=128.0.1.3
Jan 14 03:28:39 cloudlink nebula[29594]: time="2020-01-14T03:28:39Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="184.61.217.140:59919" vpnIp=128.0.1.3

The config file for his Linux machine is here:

# This is the nebula example configuration file. You must edit, at a minimum, the static_host_map, lighthouse, and firewall sections
# Some options in this file are HUPable, including the pki section. (A HUP will reload credentials from disk without affecting existing tunnels)
 
# PKI defines the location of credentials for this node. Each of these can also be inlined by using the yaml ": |" syntax.
pki:
  # The CAs that are accepted by this node. Must contain one or more certificates created by 'nebula-cert ca'
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/plex-linux-vm.crt
  key: /etc/nebula/plex-linux-vm.key
  #blacklist is a list of certificate fingerprints that we will refuse to talk to
  #blacklist:
  #  - c99d4e650533b92061b09918e838a5a0a6aaee21eed1d12fd937682865936c72
 
# The static host map defines a set of hosts with fixed IP addresses on the internet (or any network).
# A host can have multiple fixed IP addresses defined here, and nebula will try each when establishing a tunnel.
# The syntax is:
#   "{nebula ip}": ["{routable ip/dns name}:{routable port}"]
# Example, if your lighthouse has the nebula IP of 192.168.100.1 and has the real ip address of 100.64.22.11 and runs on port 4242:
static_host_map:
  "128.0.0.1": ["108.XXX.XXX.147:4242"]
  "128.0.0.2": ["35.XXX.XX.50:4242"]
 
lighthouse:
  # am_lighthouse is used to enable lighthouse functionality for a node. This should ONLY be true on nodes
  # you have configured to be lighthouses in your network
  am_lighthouse: false
  # serve_dns optionally starts a dns listener that responds to various queries and can even be
  # delegated to for resolution
  #serve_dns: false
  # interval is the number of seconds between updates from this node to a lighthouse.
  # during updates, a node sends information about its current IP addresses to each node.
  interval: 60
  # hosts is a list of lighthouse hosts this node should report to and query from
  # IMPORTANT: THIS SHOULD BE EMPTY ON LIGHTHOUSE NODES
  hosts:
    - "128.0.0.1"
    - "128.0.0.2"
 
# Port Nebula will be listening on. The default here is 4242. For a lighthouse node, the port should be defined,
# however using port 0 will dynamically assign a port and is recommended for roaming nodes.
listen:
  host: 0.0.0.0
  port: 0
  # Sets the max number of packets to pull from the kernel for each syscall (under systems that support recvmmsg)
  # default is 64, does not support reload
  #batch: 64
  # Configure socket buffers for the udp side (outside), leave unset to use the system defaults. Values will be doubled by the kernel
  # Default is net.core.rmem_default and net.core.wmem_default (/proc/sys/net/core/rmem_default and /proc/sys/net/core/rmem_default)
  # Maximum is limited by memory in the system, SO_RCVBUFFORCE and SO_SNDBUFFORCE is used to avoid having to raise the system wide
  # max, net.core.rmem_max and net.core.wmem_max
  #read_buffer: 10485760
  #write_buffer: 10485760
 
# Punchy continues to punch inbound/outbound at a regular interval to avoid expiration of firewall nat mappings
punchy: true
# punch_back means that a node you are trying to reach will connect back out to you if your hole punching fails
# this is extremely useful if one node is behind a difficult nat, such as symmetric
punch_back: true
 
# Cipher allows you to choose between the available ciphers for your network.
# IMPORTANT: this value must be identical on ALL NODES/LIGHTHOUSES. We do not/will not support use of different ciphers simultaneously!
#cipher: chachapoly
 
# Local range is used to define a hint about the local network range, which speeds up discovering the fastest
# path to a network adjacent nebula node.
#local_range: "172.16.0.0/24"
 
# sshd can expose informational and administrative functions via ssh this is a
#sshd:
  # Toggles the feature
#  enabled: true
  # Host and port to listen on, port 22 is not allowed for your safety
#  listen: 127.0.0.1:477
  # A file containing the ssh host private key to use
  # A decent way to generate one: ssh-keygen -t ed25519 -f ssh_host_ed25519_key -N "" < /dev/null
#  host_key: /home/terry/.ssh/ssh_host_ed25519_key
  # A file containing a list of authorized public keys
#  authorized_users:
#    - user: terry
      # keys can be an array of strings or single string
#      keys:
#        - 
 
# Configure the private interface. Note: addr is baked into the nebula certificate
tun:
  # Name of the device
  dev: xoverlay
  # Toggles forwarding of local broadcast packets, the address of which depends on the ip/mask encoded in pki.cert
  drop_local_broadcast: false
  # Toggles forwarding of multicast packets
  drop_multicast: false
  # Sets the transmit queue length, if you notice lots of transmit drops on the tun it may help to raise this number. Default is 500
  tx_queue: 500
  # Default MTU for every packet, safe setting is (and the default) 1300 for internet based traffic
  mtu: 1300
  # Route based MTU overrides, you have known vpn ip paths that can support larger MTUs you can increase/decrease them here
  routes:
    #- mtu: 8800
    #  route: 10.0.0.0/16
 
# TODO
# Configure logging level
logging:
  # panic, fatal, error, warning, info, or debug. Default is info
  level: info
  # json or text formats currently available. Default is text
  format: text
 
#stats:
  #type: graphite
  #prefix: nebula
  #protocol: tcp
  #host: 127.0.0.1:9999
  #interval: 10s
 
  #type: prometheus
  #listen: 127.0.0.1:8080
  #path: /metrics
  #namespace: prometheusns
  #subsystem: nebula
  #interval: 10s
 
# Nebula security group configuration
firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000
 
  # The firewall is default deny. There is no way to write a deny rule.
  # Rules are comprised of a protocol, port, and one or more of host, group, or CIDR
  # Logical evaluation is roughly: port AND proto AND ca_sha AND ca_name AND (host OR group OR groups OR cidr)
  # - port: Takes `0` or `any` as any, a single number `80`, a range `200-901`, or `fragment` to match second and further fragments of fragmented packets (since there is no port available).
  #   code: same as port but makes more sense when talking about ICMP, TODO: this is not currently implemented in a way that works, use `any`
  #   proto: `any`, `tcp`, `udp`, or `icmp`
  #   host: `any` or a literal hostname, ie `test-host`
  #   group: `any` or a literal group name, ie `default-group`
  #   groups: Same as group but accepts a list of values. Multiple values are AND'd together and a certificate would have to contain all groups to pass
  #   cidr: a CIDR, `0.0.0.0/0` is any.
  #   ca_name: An issuing CA name
  #   ca_sha: An issuing CA shasum
 
  outbound:
    # Allow all outbound traffic from this node
    - port: any
      proto: any
      host: any
 
  inbound:
    # Allow icmp between any nebula hosts
    - port: any
      proto: icmp
      host: any
 
    # Allow tcp/443 from any host with chris group
    - port: 443
      proto: tcp
      group: chris

And here is his Windows config file

# This is the nebula example configuration file. You must edit, at a minimum, the static_host_map, lighthouse, and firewall sections
# Some options in this file are HUPable, including the pki section. (A HUP will reload credentials from disk without affecting existing tunnels)

# PKI defines the location of credentials for this node. Each of these can also be inlined by using the yaml ": |" syntax.
pki:
  # The CAs that are accepted by this node. Must contain one or more certificates created by 'nebula-cert ca'
  ca: C:\\Windows\\System32\\Nebula\\ca.crt
  cert: C:\\Windows\\System32\\Nebula\\hoyane-win-svr.crt
  key: C:\\Windows\\System32\\Nebula\\hoyane-win-svr.key
  #blacklist is a list of certificate fingerprints that we will refuse to talk to
  #blacklist:
  #  - c99d4e650533b92061b09918e838a5a0a6aaee21eed1d12fd937682865936c72

# The static host map defines a set of hosts with fixed IP addresses on the internet (or any network).
# A host can have multiple fixed IP addresses defined here, and nebula will try each when establishing a tunnel.
# The syntax is:
#   "{nebula ip}": ["{routable ip/dns name}:{routable port}"]
# Example, if your lighthouse has the nebula IP of 192.168.100.1 and has the real ip address of 100.64.22.11 and runs on port 4242:
static_host_map:
  "128.0.0.1": ["108.XXX.XXX.147:4242"]
  "128.0.0.2": ["35.XXX.XX.50:4242"]

lighthouse:
  # am_lighthouse is used to enable lighthouse functionality for a node. This should ONLY be true on nodes
  # you have configured to be lighthouses in your network
  am_lighthouse: false
  # serve_dns optionally starts a dns listener that responds to various queries and can even be
  # delegated to for resolution
  #serve_dns: false
  # interval is the number of seconds between updates from this node to a lighthouse.
  # during updates, a node sends information about its current IP addresses to each node.
  interval: 60
  # hosts is a list of lighthouse hosts this node should report to and query from
  # IMPORTANT: THIS SHOULD BE EMPTY ON LIGHTHOUSE NODES
  hosts:
    - "128.0.0.1"
    - "128.0.0.2"

# Port Nebula will be listening on. The default here is 4242. For a lighthouse node, the port should be defined,
# however using port 0 will dynamically assign a port and is recommended for roaming nodes.
listen:
  host: 0.0.0.0
  port: 0
  # Sets the max number of packets to pull from the kernel for each syscall (under systems that support recvmmsg)
  # default is 64, does not support reload
  #batch: 64
  # Configure socket buffers for the udp side (outside), leave unset to use the system defaults. Values will be doubled by the kernel
  # Default is net.core.rmem_default and net.core.wmem_default (/proc/sys/net/core/rmem_default and /proc/sys/net/core/rmem_default)
  # Maximum is limited by memory in the system, SO_RCVBUFFORCE and SO_SNDBUFFORCE is used to avoid having to raise the system wide
  # max, net.core.rmem_max and net.core.wmem_max
  #read_buffer: 10485760
  #write_buffer: 10485760

# Punchy continues to punch inbound/outbound at a regular interval to avoid expiration of firewall nat mappings
punchy: true
# punch_back means that a node you are trying to reach will connect back out to you if your hole punching fails
# this is extremely useful if one node is behind a difficult nat, such as symmetric
punch_back: true

# Cipher allows you to choose between the available ciphers for your network.
# IMPORTANT: this value must be identical on ALL NODES/LIGHTHOUSES. We do not/will not support use of different ciphers simultaneously!
#cipher: chachapoly

# Local range is used to define a hint about the local network range, which speeds up discovering the fastest
# path to a network adjacent nebula node.
#local_range: "172.16.0.0/24"

# sshd can expose informational and administrative functions via ssh this is a
#sshd:
  # Toggles the feature
#  enabled: true
  # Host and port to listen on, port 22 is not allowed for your safety
#  listen: 127.0.0.1:
  # A file containing the ssh host private key to use
  # A decent way to generate one: ssh-keygen -t ed25519 -f ssh_host_ed25519_key -N "" < /dev/null
#  host_key: /home/terry/.ssh/ssh_host_ed25519_key
  # A file containing a list of authorized public keys
#  authorized_users:
#    - user: terry
      # keys can be an array of strings or single string
#      keys:
#        - "

# Configure the private interface. Note: addr is baked into the nebula certificate
tun:
  # Name of the device
  dev: xoverlay
  # Toggles forwarding of local broadcast packets, the address of which depends on the ip/mask encoded in pki.cert
  drop_local_broadcast: false
  # Toggles forwarding of multicast packets
  drop_multicast: false
  # Sets the transmit queue length, if you notice lots of transmit drops on the tun it may help to raise this number. Default is 500
  tx_queue: 500
  # Default MTU for every packet, safe setting is (and the default) 1300 for internet based traffic
  mtu: 1300
  # Route based MTU overrides, you have known vpn ip paths that can support larger MTUs you can increase/decrease them here
  routes:
    #- mtu: 8800
    #  route: 10.0.0.0/16

# TODO
# Configure logging level
logging:
  # panic, fatal, error, warning, info, or debug. Default is info
  level: info
  # json or text formats currently available. Default is text
  format: text

#stats:
  #type: graphite
  #prefix: nebula
  #protocol: tcp
  #host: 127.0.0.1:9999
  #interval: 10s

  #type: prometheus
  #listen: 127.0.0.1:8080
  #path: /metrics
  #namespace: prometheusns
  #subsystem: nebula
  #interval: 10s

# Nebula security group configuration
firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  # The firewall is default deny. There is no way to write a deny rule.
  # Rules are comprised of a protocol, port, and one or more of host, group, or CIDR
  # Logical evaluation is roughly: port AND proto AND ca_sha AND ca_name AND (host OR group OR groups OR cidr)
  # - port: Takes `0` or `any` as any, a single number `80`, a range `200-901`, or `fragment` to match second and further fragments of fragmented packets (since there is no port available).
  #   code: same as port but makes more sense when talking about ICMP, TODO: this is not currently implemented in a way that works, use `any`
  #   proto: `any`, `tcp`, `udp`, or `icmp`
  #   host: `any` or a literal hostname, ie `test-host`
  #   group: `any` or a literal group name, ie `default-group`
  #   groups: Same as group but accepts a list of values. Multiple values are AND'd together and a certificate would have to contain all groups to pass
  #   cidr: a CIDR, `0.0.0.0/0` is any.
  #   ca_name: An issuing CA name
  #   ca_sha: An issuing CA shasum

  outbound:
    # Allow all outbound traffic from this node
    - port: any
      proto: any
      host: any

  inbound:
    # Allow icmp between any nebula hosts
    - port: any
      proto: icmp
      host: any

    - port: 3389
      proto: tcp
      group: chris

Let me know if you want more of anything.