core: radvd stops announcing IPv6 prefix after a while
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
- I have read the contributing guidelines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
- I have searched the existing issues and I’m convinced that mine is new.
Describe the bug
After upgrading from 20.1 to 20.7.2 I am losing IPv6 internet connectivity after ~50-60 hours. This happens because radvd stops announcing the prefix and no longer replies to solicit messages.
This has nothing to do with changing IPv6 prefixes in my case, as there is no PPPoE reconnect and no prefix change request from my ISP (my ISP only forces a prefix change every 180 days).
Restarting radvd from the web GUI fixes this.
To Reproduce
- Connect to PPPoE network with DHCPv6-PD
- LAN interface with IPv6 tracking on WAN
- IPv6 will be working in the LAN for a while (roughly two days)
- After a while IPv6 connectivity is lost because the prefix is no longer announced. It looks like radvd is hanging (see the logs below, which support this theory).
- Restart radvd from web GUI and have a working IPv6 network again for the next ~50-60 hours
Possibly related: #4282 (that issue mentions reconnects, which do not apply in my case)
Possibly related forum threads:
https://forum.opnsense.org/index.php?topic=19032.0
https://forum.opnsense.org/index.php?topic=18868.0
https://forum.opnsense.org/index.php?topic=18549.0
Expected behavior
radvd should always announce the IPv6 prefix without hanging after a while 😃
Relevant log files
- radvd does not crash. The process remains running and there are no error logs.
- There are no relevant log entries which show any issues with interfaces/networks/reconnects/…
- I have checked the truss output of a defective radvd and it looks very interesting:
Defective truss output on radvd process
truss -p 14675
ppoll(0x64c6da008a0,0x2,0x380a3796c28,0x64c6da00880) = 0 (0x0)
socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC,0) = 8 (0x8)
ioctl(8,SIOCGIFINDEX,0x6f78eb00ac00) = 0 (0x0)
close(8) = 0 (0x0)
ioctl(6,SIOCGIFFLAGS,0x64c6da007f8) = 0 (0x0)
ioctl(6,SIOCGIFMTU,0x64c6da007f8) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x0,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x64c2d333000,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x0,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x64c2d333000,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
setsockopt(6,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x64c6da00800,20) ERR#49 'Can't assign requested address'
setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x64c6da00800,20) ERR#12 'Cannot allocate memory'
__sysctl(0x6f78eb0083a0,0x2,0x6f78eb00a8f0,0x6f78eb008398,0x0,0x0) = 0 (0x0)
getpid() = 14675 (0x3953)
sendto(5,"<27>1 2020-09-09T12:24:53.557337"...,116,0,NULL,0) = 116 (0x74)
ppoll(0x64c6da008a0,0x2,0x380a3796c28,0x64c6da00880) = 0 (0x0)
socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC,0) = 8 (0x8)
ioctl(8,SIOCGIFINDEX,0x6f78eb00ac00) = 0 (0x0)
close(8) = 0 (0x0)
ioctl(6,SIOCGIFFLAGS,0x64c6da007f8) = 0 (0x0)
ioctl(6,SIOCGIFMTU,0x64c6da007f8) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x0,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x64c2d333000,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x0,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x64c2d333000,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
setsockopt(6,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x64c6da00800,20) ERR#49 'Can't assign requested address'
setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x64c6da00800,20) ERR#12 'Cannot allocate memory'
__sysctl(0x6f78eb0083a0,0x2,0x6f78eb00a8f0,0x6f78eb008398,0x0,0x0) = 0 (0x0)
getpid() = 14675 (0x3953)
sendto(5,"<27>1 2020-09-09T12:25:01.135191"...,110,0,NULL,0) = 110 (0x6e)
ppoll(0x64c6da008a0,0x2,0x380a3796c28,0x64c6da00880) = 0 (0x0)
socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC,0) = 8 (0x8)
ioctl(8,SIOCGIFINDEX,0x6f78eb00ac00) = 0 (0x0)
close(8) = 0 (0x0)
ioctl(6,SIOCGIFFLAGS,0x64c6da007f8) = 0 (0x0)
ioctl(6,SIOCGIFMTU,0x64c6da007f8) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x0,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x64c2d333000,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x0,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x64c2d333000,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
setsockopt(6,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x64c6da00800,20) ERR#49 'Can't assign requested address'
setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x64c6da00800,20) ERR#12 'Cannot allocate memory'
__sysctl(0x6f78eb0083a0,0x2,0x6f78eb00a8f0,0x6f78eb008398,0x0,0x0) = 0 (0x0)
getpid() = 14675 (0x3953)
sendto(5,"<27>1 2020-09-09T12:25:08.924928"...,117,0,NULL,0) = 117 (0x75)
truss output of working radvd (still advertising routes)
ioctl(6,SIOCGIFFLAGS,0x64c6da007f8) = 0 (0x0)
ioctl(6,SIOCGIFMTU,0x64c6da007f8) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x0,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x64c2d333000,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x0,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x64c2d333000,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
setsockopt(6,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x64c6da00800,20) = 0 (0x0)
setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x64c6da00800,20) = 0 (0x0)
sendmsg(6,{{ AF_INET6 [ff02::1]:58 },28,[{"\M^F\0\0\0@\0\0\M-4\0\0\0\0\0\0"...,120}],1,{{level=IPPROTO_IPV6,type=IPV6_PKTINFO,data={0xfe,0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x02,0x0d,0xb9,0xff,0xfe,0x4a,0x7c,0x02,0x13,0x00,0x00,0x00}}},40,0},0) = 120 (0x78)
ppoll(0x64c6da008a0,0x2,0x380a3796c28,0x64c6da00880) = 1 (0x1)
recvmsg(6,{{ AF_INET6 [fe80::20d:b9ff:fe4a:7c02]:0 },28,[{"\M^F\0\M^KI@\0\0\M-4\0\0\0\0\0\0"...,1500}],1,{{level=IPPROTO_IPV6,type=IPV6_PKTINFO,data={0xff,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x01,0x13,0x00,0x00,0x00}},{level=IPPROTO_IPV6,type=IPV6_HOPLIMIT,data={0xff,0x00,0x00,0x00}}},64,0},0) = 120 (0x78)
__sysctl(0x6f78eb00ac20,0x6,0x0,0x6f78eb00ac18,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ac20,0x6,0x64c2d333000,0x6f78eb00ac18,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abe0,0x6,0x0,0x6f78eb00abd8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abe0,0x6,0x64c2d333000,0x6f78eb00abd8,0x0,0x0) = 0 (0x0)
ppoll(0x64c6da008a0,0x2,0x380a3796c28,0x64c6da00880) = 0 (0x0)
socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC,0) = 8 (0x8)
ioctl(8,SIOCGIFINDEX,0x6f78eb00ac00) = 0 (0x0)
close(8) = 0 (0x0)
ioctl(6,SIOCGIFFLAGS,0x64c6da007f8) = 0 (0x0)
ioctl(6,SIOCGIFMTU,0x64c6da007f8) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x0,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x64c2d333000,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x0,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x64c2d333000,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
setsockopt(6,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x64c6da00800,20) ERR#49 'Can't assign requested address'
setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x64c6da00800,20) = 0 (0x0)
sendmsg(6,{{ AF_INET6 [ff02::1]:58 },28,[{"\M^F\0\0\0@\0\0\M-4\0\0\0\0\0\0"...,120}],1,{{level=IPPROTO_IPV6,type=IPV6_PKTINFO,data={0xfe,0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x02,0x0d,0xb9,0xff,0xfe,0x4a,0x7c,0x02,0x03,0x00,0x00,0x00}}},40,0},0) = 120 (0x78)
ppoll(0x64c6da008a0,0x2,0x380a3796c28,0x64c6da00880) = 1 (0x1)
recvmsg(6,{{ AF_INET6 [fe80::20d:b9ff:fe4a:7c02]:0 },28,[{"\M^F\0\M^K\M-i@\0\0\M-4\0\0\0\0"...,1500}],1,{{level=IPPROTO_IPV6,type=IPV6_PKTINFO,data={0xff,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x01,0x03,0x00,0x00,0x00}},{level=IPPROTO_IPV6,type=IPV6_HOPLIMIT,data={0xff,0x00,0x00,0x00}}},64,0},0) = 120 (0x78)
__sysctl(0x6f78eb00ac20,0x6,0x0,0x6f78eb00ac18,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ac20,0x6,0x64c2d333000,0x6f78eb00ac18,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abe0,0x6,0x0,0x6f78eb00abd8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abe0,0x6,0x64c2d333000,0x6f78eb00abd8,0x0,0x0) = 0 (0x0)
I am not a BSD guy, but the following lines in the output of the broken radvd instance look very suspicious:
setsockopt(6,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x64c6da00800,20) ERR#49 'Can't assign requested address'
setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x64c6da00800,20) ERR#12 'Cannot allocate memory'
The 'Can't assign requested address' error is also present in the working radvd truss output from time to time. The 'Cannot allocate memory' error sounds very fishy, though. Maybe it's an issue with setting up the multicast group?
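If the failed IPV6_JOIN_GROUP really kicked radvd out of the all-routers group, the kernel's multicast membership table should show it. On FreeBSD this can be checked with ifmcstat; the following is only a sketch, and "igb1" is a placeholder for whatever the LAN interface is called:

```shell
# Check whether the LAN interface is still a member of the all-routers
# group ff02::2, which radvd must have joined in order to receive
# Router Solicitations. "igb1" is a placeholder interface name.
ifmcstat -i igb1 -f inet6 | grep -c 'ff02::2'
# A count of 0 while radvd is running would support the theory that the
# failed join left the daemon deaf to solicitations.
```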
Environment Software version used and hardware type if relevant.
- OPNsense 20.7.2-amd64 (OpenSSL)
- APU2C4
- Network: Intel® I210-AT
- PPPoE-connected fiber modem (DHCPv6-PD)
I did not experience the issue in OPNsense 20.1.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 8
- Comments: 219 (109 by maintainers)
Commits related to this issue
- net/radvd: debug patch PR: https://github.com/opnsense/core/issues/4338 — committed to opnsense/ports by fichtner 4 years ago
A good candidate for this issue is https://github.com/opnsense/src/commit/93e9cefd053b and from the original bug report you can see this affects 12.0-RELEASE, see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233683
We’ve been using version 12.1 since OPNsense 20.7 which is when this bug started to happen… 😉
The actual trigger for this bug was adding the already-existing IPv6 address again via the ifconfig utility in order to see the bug, so I suppose some renew situation trashed radvd’s ability to act in the multicast group.
Cheers, Franco
I am about to issue a PR which will allow you to set a log level for radvd.
The data points are:
There seems to be an issue with the list management code in the kernel regarding multicast. The radvd reload behaviour didn’t change as far as I can see, and you guys are right that the interfaces do not change either. Initially we added the join/leave to cope with updates in radvd 2.x, which worked well on 11.x, but 12.x seems to be allergic to too many iterations. I don’t see any readily available commit to cherry-pick, so this will take a while to find the cause.
Cheers, Franco
Unfortunately this issue has cost us a lot of community support time and we do not see any easy way forward chasing a kernel bug we can’t reproduce any longer. The same issue also affects ISC dhcpd in IPv6 mode but the radvd code was vastly different and even though there was controversy over BSD support I don’t see a reason to blame radvd and its patching any longer. If anything we just made the bug harder to trigger. For more details see https://github.com/opnsense/core/issues/4691
We will target 13.0 which is currently being planned for 22.1 and along the way I hope that this issue simply disappears. If anyone is hoping to have this fixed sooner please find a reliable way to trigger and/or confirm there is a specific patch available in FreeBSD that addresses this.
We are happy to be of further assistance, but as I said not on community support time.
Cheers, Franco
Hi @robel,
Install 20.7 and update to 20.7.8, where this is fixed, or install 21.1-RC1, where this is already fixed (and which updates directly to 21.1 later this week).
Cheers, Franco
20.7.8 addressed this issue. If you have issues with IPv6 please open tickets with enough relevant info.
Thanks, Franco
radvd isn’t started when the patch is applied. Easiest way for a clean system is:
try d690f93 instead. Remember to reverse the old patch first.
Thank you for your quick reply and hard work. I know you do not do the release notes however at this time shouldn’t it be noted in the release notes that this patch may be needed for ipv6 connectivity.
I have issued a new commit, as for some odd reason my local OPNsense repo was screwed up, which left me unable to merge cleanly. So, with a nice new clean fork of core, I’ve done a new PR #4461 for anyone who wants to try it. Glad it’s working for those that have tried it, but the same rule applies: no guarantees!
Run:
opnsense-patch 124cdf6
Hi, I have OPNsense 20.7.3-amd64 running on an IPU672 (https://www.ipu-system.de/produkte/ipu672.html). My ISP is Deutsche Glasfaser, which provides IPv4 via CGN (DHCP) and IPv6 via DHCPv6 with a /56 prefix delegation.
I experience this issue since 20.7.
I can also confirm this issue with my current OPNsense install:
Data:
I’m currently running 21.1.1 and still seem to be experiencing this same issue where radvd stops announcing prefixes. I have router advertisements set to Assisted, and after several hours it stops working. Restarting radvd from the web UI fixes it, but the problem then comes back after a few hours.
Please don’t hijack this thread. As I said you have a different issue and you need to provide proper amount of details (see bug report template).
just an update: Radvd has been behaving since I swapped back via the patch. About a week of uptime.
By the way, the patch has been working flawlessly. IPv6 has been working for around 8-11 days now. With the old radvd, it would stop announcing and working after 3 days, like clockwork.
Thanks!
@ivwang the relevant parts are FreeBSD specific, see https://github.com/opnsense/ports/commit/a5ace74ef2273eeb7 and https://github.com/opnsense/ports/commit/54152320fa817
I don’t think any changes upstream are at play here. Note the absence of
setsockopt(sock, IPPROTO_IPV6, IPV6_LEAVE_GROUP, &mreq, sizeof(mreq));
in version 2.19, and how people say it now works on FreeBSD 12 even though the code worked fine with it on FreeBSD 11.
Isn’t that the same case for radvd? For anything that is not bug-bound beyond the documented purposes?
We are weighing whether or not a long running service works as expected (as per the previous question). Wikipedia says radvd is 25 years old and still maintained. That is roughly the same time frame, isn’t it?
In these regards I see no proof that rtadvd is better than radvd, other than rtadvd probably being a better fit for FreeBSD; but that is a BSD thing rather than a mark against radvd in general. FreeBSD needs to put functional code in the port to make it work, but that doesn’t mean radvd doesn’t work fine elsewhere?
Every other year we feel our expectations broken by non-functional states in FreeBSD-centric software/implementations. If we switch to rtadvd we have to know it is actually a lot better, but personally I have no data other than this non-representative thread.
Cheers, Franco
What about the assumption that
is wrong?
And, should there ever be a problem with rtadvd you can raise an issue with upstream, because it is - I am repeating myself - an official part of FreeBSD.
radvd is a third party project that explicitly comes with Linux support only.
Problem solved, nothing to do with Opnsense, Sky boxes were causing it.
Twenty hours on the live system… so far so good. However, the logs have flashed up a new issue: I’m seeing RAs from my IoT VLAN appearing on my primary VLAN. Checked all my rules; they look good. I’ll reload rtadvd and see if that is related.
@fichtner I reverted and installed the package you mentioned. All appears to be working for about an hour now. I’ll report back in a day or two. Thank you.
For those who wish to see some useful info, try this command from the shell:
rtadvctl -v show
It gives all the useful info, including the last time an RA was sent.
Yes, I don’t know why I missed the zero on those; I’d set it for the other set of flags.
OK everybody, updated with a new commit 8155f3a. So reverse the original commit 9a4a908 (if applied) before applying this new one.
This commit removes the killing of rtadvd, which sometimes causes a race with its controller. There is in fact no need to kill rtadvd at all: it launches but does nothing until its controller signals it with interface options and a config re-read. This should prevent the strangeness where just saving an interface was killing the daemon. I’ve deleted the commit details for the logging additions, as this commit includes them too.
I am seeing a core dump in rtadvd around 30 seconds after an update from its controller. I have replaced rtadvd with the version from FreeBSD 12.0 and the core dump no longer happens.
Set it to Managed or Unmanaged if you are not using DHCPv6; my preference would be Managed. Both work; I just tested them on a virgin 20.7 installation. Logging is very sparse even at level 2; rtadvd doesn’t tell you very much at all. Log entries appear in the system log.
I do not understand why it’s missing from your services list on the lobby page. If you install the patch and reboot, then it’s there; I’ve reversed and reapplied the patch and tested on both my live and test systems. After a reboot it may show as not running, but leave it for twenty seconds or so, then refresh the page and it will show running. When stopping and restarting rtadvd the page may appear to have frozen; it hasn’t, though. It is sending out signals to all clients that the route is down, and with a lot of clients this can take quite a while, twenty seconds or more on my primary VLAN, which has a lot of clients.
I need to add a comment here… The reason rtadvd probably appears to do nothing after patching and manually starting it is that this is a two-part process: there is also the daemon controller, which signals the daemon with the interfaces to use and an instruction to re-read the config file. Sorry, I forgot to mention that bit; it probably explains why it doesn’t always appear to work. So, best to apply the patch and REBOOT!!! If anything changes, e.g. the LAN address, the controller will signal the daemon to reload the config. However, the controller only does its thing at start-up or when something causes the rtadvd configure process to be called.
You need to do none of that; debug settings are available on Interfaces->Settings. I meant a reboot after installing the patch, just for cleanliness. I need more info than ‘it doesn’t work’; it does work or many of us would not be using it. Can you give more info? E.g., are you seeing GUA addresses on the WAN and LAN? Are you using Manual Configuration on the tracking interface(s)? Have you looked at the rtadvd.conf file to see what the configuration is? If you post said file here, it may help us work out why it appears not to be working for you.
I destroyed my HA config and it’s operating just fine now.
radvd is still broken, but the rtadvd patch still works.
That’s pretty odd. Just completed a clean install of 20.7 and bounced to 20.7.5, and you’re right, rtadvd only starts manually. I’ll look into it later. Comments about it not working with VLANs are false; I’ve got it running on my primary system with three VLANs and it’s working fine there. Again, it might be a 20.7.5 thing; I’ll check that later too.
Same here. Uptime is now 31 days since my first patch with no issues.
No problem and understood. One thing I also wanted to point out is that I saw OpenBSD moved away from rtadvd and built their own daemon called rad. Not sure if there was a reason, but I wanted to mention it in case there was a good one and nobody saw it. Thank you both for the hard work on this. If you need any more testing, I’d be happy to help.
It was pulled together pretty quickly; @fichtner has taken it to run with. Pretty sure there will be lots of things he’ll want to change, but it appears to work fine on my system. Just be warned that this is a WIP and by no means guaranteed!
radvd misbehaved again last night. It seems like my "can't join ipv6-allrouters" log message that I mentioned in this comment might just be a red herring: it didn’t happen this time, and router advertisements still stopped.
BTW, if anyone is looking for a simple Bash one-liner to monitor when router advertisements stop being received, you can try this:
This will run radvdump (which runs continuously) and keep reading its output, but the read will time out after 300 seconds. Feel free to replace the echo statement with something better, e.g. sending a ping to healthchecks.io or similar. 😄
Seems you take your hair style more seriously than OPNsense 😉
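The one-liner itself did not survive into this thread, but based on the description above (radvdump piped into a read with a 300-second timeout, plus an echo alert) it can be sketched roughly as follows. This is a reconstruction, not the original command:

```shell
#!/usr/bin/env bash
# Sketch of the monitor described above (a reconstruction, not the
# original one-liner). radvdump prints every router advertisement it
# receives; if no new output line arrives within 300 seconds, we assume
# RAs have stopped and raise an alert.
radvdump | while :; do
    read -r -t 300 line
    status=$?
    if [ "$status" -gt 128 ]; then
        # read timed out: no RA seen for 300 seconds
        echo "no router advertisement received for 300 seconds" >&2
        # replace the echo with e.g. a request to a healthchecks.io ping URL
    elif [ "$status" -ne 0 ]; then
        break   # radvdump exited (EOF on the pipe)
    fi
done
```

Note that bash’s read returns an exit status greater than 128 on timeout, which is how the sketch distinguishes "no RA for 300 seconds" from radvdump exiting.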
@zzyonn please kindly follow the comments here
On 20.1.9 (two instances, APU2, Config: Unmanaged or Stateless), I can’t confirm the bug. Even after a long uptime (> 30 days) everything is fine.
Asserting the same issue in 20.1 is speculation without the appropriate data points to support this.
While it’s annoying, please refrain from telling how annoying this is for the sake of keeping this technical and on point.
Cheers, Franco
I have a similar issue where radvd does not respond to solicitations directly, but it doesn’t seem to fail at sending unsolicited advertisements. So the situation is: a host solicits and gets nothing back; finally, once the unsolicited interval triggers, the host can establish its IPv6 address and connect. That creates the delay others reported in how soon a host can establish its connection.
The other symptom is that on a cold boot there is no IPv6 connectivity until re-saving the WAN interface settings followed by re-saving the LAN interface settings (this happened recently when I noticed Nagios failing an IPv6 ping for 90 minutes). This all seems to be related. What I haven’t tested is whether restarting radvd after a cold start also solves the issue.