esp-idf: IPv6 local-link address becomes invalid without a real duplicate node by DAD (IDFGH-3898)
Environment
- Development Kit: ESP32-DevKitC
- Kit version : v1
- Module or chip used: ESP32-WROOM-32 / ESP32D0WDQ6 (revision 1)
- IDF version : v3.3.2
- Build System: idf.py
- Compiler version : xtensa-esp32-elf-gcc (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a5) 5.2.0
- Operating System: Windows7
- (Windows only) environment type: Plain Command Prompt(configured with esp-idf/export.bat)
- Using an IDE?: No
- Power Supply: USB
- Router used: Huawei HG8145V (No IPv6 support by ISP, IPv6 default configuration enabled in router)
Problem Description
Using the tcp_client example application configured with the SSID of the above-mentioned router, and with an Android mobile connected to the same SSID, the application is unable to create a socket and keeps waiting for an IPv6 address. The IPv6 link-local address becomes invalid during Duplicate Address Detection (DAD) even though there is no real duplicate node in the network. This happens only when another IPv6-capable device is connected to the same network. The issue is not observed when the device is alone in the network.
Note: Running the TCP server for this example is not required, as the intention is only to check whether the IPv6 link-local address is valid.
Expected Behavior
The IPv6 link-local address should not become invalid when no real node with a duplicate address is present in the network.
The state machine that creates the socket connection should not stall if the IPv6 LLA becomes invalid. Instead, a recovery mode should handle this condition and allow the application to proceed with IPv4 only (when IPv4 is configured in menuconfig).
Actual Behavior
The application gets an IPv4 address and then waits for an IPv6 address. The IPv6 link-local address becomes invalid during DAD processing when an Android device or a laptop (Windows 10) is present in the same network (SSID).
Steps to reproduce
This issue is observed and reproduced in the setup environment described above.
- Connect an Android mobile to the Wi-Fi network
- Run menuconfig for tcp_client, set the SSID and password to those of the network the Android mobile is connected to, and keep the other settings at their defaults
- Build and run the tcp_client application
- Monitor the logs in the console
Code to reproduce this issue
The tcp_client application without any changes.
Debug Logs
Two cases are described here, Failure and Success. In both cases, the device is connected to the same network (SSID). The only difference is the presence of another (neighbor) node on the same SSID.
Failure Case: After a successful connection to the router, the device gets an IPv4 address and then waits for an IPv6 address. With the NETIF, IP and IP6 debug logs of LWIP enabled, the logs show that after a successful connection to the AP, it sets an IPv6 link-local address with the state tentative in order to perform Duplicate Address Detection (DAD).
D (2308) event: SYSTEM_EVENT_STA_CONNECTED, ssid:IPTest, ssid_len:6, bssid:a2:f4:79:1c:77:0c, channel:1, authmode:3
.
.
.
0x4012a1f8: tcpip_adapter_create_ip6_linklocal_api at C:/esp32/esp-idf/components/tcpip_adapter/tcpip_adapter_lwip.c:1126
netif_ip6_addr_set_state: netif address state being changed
netif: IPv6 address 0 of interface st set to FE80::260A:C4FF:FEAE:AA58/0x08
V (2568) tcpip_adapter: call api in lwip: ret=0x0, give sem
V (2578) tcpip_adapter: check: remote, if=0 fn=0x4012a1f8
0x4012a1f8: tcpip_adapter_create_ip6_linklocal_api at C:/esp32/esp-idf/components/tcpip_adapter/tcpip_adapter_lwip.c:1126
On the initial call of nd6_tmr() in nd6.c, the address moves from the initial tentative state (0x08) to the next tentative state (0x09) and an NS (Neighbor Solicitation) message is sent.
ip6_output_if: st1
IPv6 header:
+-------------------------------+
| 6 | 0 | 0 | (ver, class, flow)
+-------------------------------+
| 32 | 0 | 1 | (plen, nexth, hopl)
+-------------------------------+
| 0 | 0 | 0 | 0 | (src)
| 0 | 0 | 0 | 0 |
+-------------------------------+
| ff02 | 0 | 0 | 0 | (dest)
| 0 | 1 | ffae | aa58 |
+-------------------------------+
netif->output_ip6()
D (3538) phy_init: wifi mac time delta: 34633
ip6_output_if: st1
IPv6 header:
+-------------------------------+
| 6 | 0 | 0 | (ver, class, flow)
+-------------------------------+
| 24 | 58 | 255 | (plen, nexth, hopl)
+-------------------------------+
| 0 | 0 | 0 | 0 | (src)
| 0 | 0 | 0 | 0 |
+-------------------------------+
| ff02 | 0 | 0 | 0 | (dest)
| 0 | 1 | ffae | aa58 |
+-------------------------------+
netif->output_ip6()
netif_ip6_addr_set_state: netif address state being changed
netif: IPv6 address 0 of interface st set to FE80::260A:C4FF:FEAE:AA58/0x09
ip6_output_if: st1
IPv6 header:
+-------------------------------+
| 6 | 0 | 0 | (ver, class, flow)
+-------------------------------+
| 8 | 58 | 255 | (plen, nexth, hopl)
+-------------------------------+
| 0 | 0 | 0 | 0 | (src)
| 0 | 0 | 0 | 0 |
+-------------------------------+
| ff02 | 0 | 0 | 0 | (dest)
| 0 | 0 | 0 | 2 |
+-------------------------------+
netif->output_ip6()
In the logs, I can see that the device then receives IPv6 packets carrying an NS message that appears to be identical to the one it sent.
ip6_input: packet with Hop-by-Hop options header
ip6_input:
IPv6 header:
+-------------------------------+
| 6 | 0 | 0 | (ver, class, flow)
+-------------------------------+
| 32 | 0 | 1 | (plen, nexth, hopl)
+-------------------------------+
| 0 | 0 | 0 | 0 | (src)
| 0 | 0 | 0 | 0 |
+-------------------------------+
| ff02 | 0 | 0 | 0 | (dest)
| 0 | 1 | ffae | aa58 |
+-------------------------------+
ip6_input: p->len 72 p->tot_len 72
ip6_input:
IPv6 header:
+-------------------------------+
| 6 | 0 | 0 | (ver, class, flow)
+-------------------------------+
| 24 | 58 | 255 | (plen, nexth, hopl)
+-------------------------------+
| 0 | 0 | 0 | 0 | (src)
| 0 | 0 | 0 | 0 |
+-------------------------------+
| ff02 | 0 | 0 | 0 | (dest)
| 0 | 1 | ffae | aa58 |
+-------------------------------+
ip6_input: p->len 64 p->tot_len 64
D (3848) phy_init: wifi mac time delta: 17079
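The destination in the received headers above, ff02::1:ffae:aa58, is the solicited-node multicast address of the device's own tentative address (FE80::260A:C4FF:FEAE:AA58), and the source is the unspecified address, which is the shape of a DAD probe. The following is a minimal sketch of that derivation; the helper names are hypothetical and are not taken from lwIP.

```c
#include <stdint.h>
#include <string.h>

/* Build the solicited-node multicast address (RFC 4291, section 2.7.1)
 * for a unicast IPv6 address: ff02::1:ffXX:XXXX, where XX:XXXX are the
 * low 24 bits of the unicast address. Addresses are 16 bytes in
 * network byte order. Hypothetical helper, not an lwIP function. */
static void solicited_node_addr(const uint8_t unicast[16], uint8_t out[16])
{
    memset(out, 0, 16);
    out[0]  = 0xff;
    out[1]  = 0x02;
    out[11] = 0x01;
    out[12] = 0xff;
    out[13] = unicast[13];
    out[14] = unicast[14];
    out[15] = unicast[15];
}

/* A DAD probe is a Neighbor Solicitation whose source is the
 * unspecified address (::). Returns 1 if addr is all zeros. */
static int is_unspecified(const uint8_t addr[16])
{
    static const uint8_t zero[16] = {0};
    return memcmp(addr, zero, 16) == 0;
}
```

Because the header alone does not identify the sender, a looped-back copy of the device's own probe and a probe from a genuinely conflicting node look the same at this level.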
The device processes this NS message and moves the link-local IPv6 address to the invalid state.
ip6_output_if: st1
IPv6 header:
+-------------------------------+
| 6 | 0 | 0 | (ver, class, flow)
+-------------------------------+
| 32 | 58 | 255 | (plen, nexth, hopl)
+-------------------------------+
| fe80 | 0 | 0 | 0 | (src)
| 260a | c4ff | feae | aa58 |
+-------------------------------+
| ff02 | 0 | 0 | 0 | (dest)
| 0 | 0 | 0 | 1 |
+-------------------------------+
netif->output_ip6()
netif_ip6_addr_set_state: netif address state being changed
netif: IPv6 address 0 of interface st set to FE80::260A:C4FF:FEAE:AA58/0x00
ip6_input: packet with src ANY_ADDRESS dropped
ip6_input: packet not for us.
ip_input: iphdr->dest 0xffffffff netif->ip_addr 0x0 (0x0, 0x0, 0xffffffff)
ip_input: iphdr->dest 0xffffffff netif->ip_addr 0x100007f (0xff, 0x7f, 0xffffff00)
After some time, the device got the IPv4 address from the router through DHCP.
netif: netmask of interface st set to 255.255.255.0
netif: GW address of interface st set to 192.168.1.1
netif_set_ipaddr: netif address being changed
D (5568) phy_init: wifi mac time delta: 13232
netif: IP address of interface st set to 192.168.1.3
D (5578) tcpip_adapter: if0 dhcpc cb
D (5578) tcpip_adapter: if0 ip changed=1
D (5578) event: SYSTEM_EVENT_STA_GOT_IP, ip:192.168.1.3, mask:255.255.255.0, gw:192.168.1.1
V (5588) event: enter default callback
I (5598) event: sta ip: 192.168.1.3, mask: 255.255.255.0, gw: 192.168.1.1
V (5598) event: exit default callback
I (5608) example: SYSTEM_EVENT_STA_GOT_IP
Although the device now has an IPv4 address, it is still waiting for an IPv6 address before it proceeds to create a socket connection. Because the IPv6 link-local address became invalid, no callback arrives for it, and because the ISP does not provide IPv6 through the router, no callback arrives for a router-assigned IPv6 address either.
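To make the `netif: IPv6 address 0 of interface st set to .../0xNN` lines easier to follow, here is a small sketch that names the state byte printed after the slash. It covers only the values that appear in these logs (0x00, 0x08, 0x09, 0x30) and assumes they correspond to lwIP's invalid, tentative and preferred address states; the function name is hypothetical, and the mapping should be checked against ip6_addr.h in the lwIP tree in use.

```c
#include <stdint.h>

/* Name the IPv6 address state byte seen in the netif debug log.
 * Assumption: 0x00 is invalid, 0x08..0x0f are the tentative states
 * used while DAD probes are outstanding, and 0x30 is preferred.
 * Any other value is reported as "OTHER" rather than guessed. */
static const char *ip6_state_name(uint8_t state)
{
    if (state == 0x00) {
        return "INVALID";
    }
    if (state >= 0x08 && state <= 0x0f) {
        return "TENTATIVE";
    }
    if (state == 0x30) {
        return "PREFERRED";
    }
    return "OTHER";
}
```

Read this way, the failure case goes TENTATIVE (0x08), TENTATIVE (0x09), INVALID (0x00), so the address never reaches a state that would raise the GOT_IP6 event.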
Success Case: After a successful connection to the router, the device gets both IPv4 and IPv6 addresses and creates a socket to connect to the server. With the NETIF, IP and IP6 debug logs of LWIP enabled, the logs show that after a successful connection to the AP, it sets an IPv6 link-local address with the state tentative in order to perform Duplicate Address Detection (DAD).
D (2308) event: SYSTEM_EVENT_STA_CONNECTED, ssid:IPTest, ssid_len:6, bssid:a2:f4:79:1c:77:0c, channel:1, authmode:3
.
.
.
0x4012a1f8: tcpip_adapter_create_ip6_linklocal_api at C:/esp32/esp-idf/components/tcpip_adapter/tcpip_adapter_lwip.c:1126
netif_ip6_addr_set_state: netif address state being changed
netif: IPv6 address 0 of interface st set to FE80::260A:C4FF:FEAE:AA58/0x08
V (2568) tcpip_adapter: call api in lwip: ret=0x0, give sem
V (2578) tcpip_adapter: check: remote, if=0 fn=0x4012a1f8
0x4012a1f8: tcpip_adapter_create_ip6_linklocal_api at C:/esp32/esp-idf/components/tcpip_adapter/tcpip_adapter_lwip.c:1126
On the initial call of nd6_tmr() in nd6.c, the address moves from the initial tentative state (0x08) to the next tentative state (0x09) and an NS (Neighbor Solicitation) message is sent.
ip6_output_if: st1
IPv6 header:
+-------------------------------+
| 6 | 0 | 0 | (ver, class, flow)
+-------------------------------+
| 32 | 0 | 1 | (plen, nexth, hopl)
+-------------------------------+
| 0 | 0 | 0 | 0 | (src)
| 0 | 0 | 0 | 0 |
+-------------------------------+
| ff02 | 0 | 0 | 0 | (dest)
| 0 | 1 | ffae | aa58 |
+-------------------------------+
netif->output_ip6()
ip6_output_if: st1
IPv6 header:
+-------------------------------+
| 6 | 0 | 0 | (ver, class, flow)
+-------------------------------+
| 24 | 58 | 255 | (plen, nexth, hopl)
+-------------------------------+
| 0 | 0 | 0 | 0 | (src)
| 0 | 0 | 0 | 0 |
+-------------------------------+
| ff02 | 0 | 0 | 0 | (dest)
| 0 | 1 | ffae | aa58 |
+-------------------------------+
netif->output_ip6()
netif_ip6_addr_set_state: netif address state being changed
netif: IPv6 address 0 of interface st set to FE80::260A:C4FF:FEAE:AA58/0x09
ip6_output_if: st1
IPv6 header:
+-------------------------------+
| 6 | 0 | 0 | (ver, class, flow)
+-------------------------------+
| 8 | 58 | 255 | (plen, nexth, hopl)
+-------------------------------+
| 0 | 0 | 0 | 0 | (src)
| 0 | 0 | 0 | 0 |
+-------------------------------+
| ff02 | 0 | 0 | 0 | (dest)
| 0 | 0 | 0 | 2 |
+-------------------------------+
netif->output_ip6()
In this case, the incoming IPv6 NS packets seen in the failure case do not appear. On the next call of nd6_tmr(), the IPv6 link-local address state changes to preferred (0x30), which triggers the notification that a valid IPv6 address has been obtained.
netif_ip6_addr_set_state: netif address state being changed
netif: IPv6 address 0 of interface st set to FE80::260A:C4FF:FEAE:AA58/0x30
D (4758) event: SYSTEM_EVENT_AP_STA_GOT_IP6 address fe80:0000:0000:0000:260a:c4ff:feae:aa58
ip6_output_if: st1
I (4758) example: SYSTEM_EVENT_STA_GOT_IP6
IPv6 header:
+-------------------------------+
I (4768) example: IPv6: FE80::260A:C4FF:FEAE:AA58
D (4778) phy_init: wifi mac time delta: 39059
| 6 | 0 | 0 | (ver, class, flow)
+-------------------------------+
| 16 | 58 | 255 | (plen, nexth, hopl)
+-------------------------------+
I (4758) example: Connected to AP
| fe80 | 0 | 0 | 0 | (src)
| 260a | c4ff | feae | aa58 |
+-------------------------------+
| ff02 | 0 | 0 | 0 | (dest)
| 0 | 0 | 0 | 2 |
+-------------------------------+
netif->output_ip6()
In the meantime it has also obtained an IPv4 address, and it creates a client socket.
I (4758) example: Connected to AP
new *mbox ok mbox=0x3ffba498 os_mbox=0x3ffc61e8
set mbox=0x3ffba498 owner=0x3ffba134sem_get s=0x3ffaffc0
sys_mutex_new: m=0x3ffc62a8
alloc_socket: alloc 0 ok
I (4838) example: Socket created
Analysis:
- The DAD algorithm is complicated by a loopback problem that is inherent in multicast messages: determining whether a received multicast solicitation was looped back to the sender or actually came from another node. See Appendix A of RFC 4862 for details.
- There are implementation-specific ways to resolve the loopback problem for DAD. One example is in RFC 7527.
- It appears this Wi-Fi router handles the loopback problem differently depending on what is connected to its local network: when an Android mobile device is connected, it loops back the NS message, and when it is not, it suppresses the NS message.
- When the router loops back the NS message, the DAD algorithm mistakes it for a probe from a node with a duplicate address and marks the address as invalid.
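As an illustration of the RFC 7527 approach mentioned above: the sender attaches a random nonce to each DAD probe and remembers it, and a received probe carrying one of the remembered nonces is treated as the sender's own message looped back. The sketch below shows only that bookkeeping. All names and sizes are hypothetical, it is not tied to any lwIP API, and it makes no claim about how ESP-IDF addresses the issue.

```c
#include <stdint.h>
#include <string.h>

#define DAD_NONCE_LEN  6   /* assumed nonce length for this sketch */
#define DAD_MAX_PROBES 3   /* assumed upper bound on probes per address */

/* Nonces attached to the DAD probes sent so far for one address. */
struct dad_nonce_ctx {
    uint8_t sent[DAD_MAX_PROBES][DAD_NONCE_LEN];
    int     n_sent;
};

/* Remember the nonce that was attached to an outgoing probe. */
static void dad_record_sent(struct dad_nonce_ctx *c,
                            const uint8_t nonce[DAD_NONCE_LEN])
{
    if (c->n_sent < DAD_MAX_PROBES) {
        memcpy(c->sent[c->n_sent], nonce, DAD_NONCE_LEN);
        c->n_sent++;
    }
}

/* Returns 1 if a received probe carries a nonce we sent ourselves
 * (looped back, safe to ignore), 0 otherwise. A probe without a
 * nonce (nonce == NULL) cannot be recognised and returns 0. */
static int dad_is_looped_back(const struct dad_nonce_ctx *c,
                              const uint8_t *nonce)
{
    int i;
    if (nonce == NULL) {
        return 0;
    }
    for (i = 0; i < c->n_sent; i++) {
        if (memcmp(c->sent[i], nonce, DAD_NONCE_LEN) == 0) {
            return 1;
        }
    }
    return 0;
}
```

A probe with an unknown or missing nonce still has to go through normal DAD handling, so this only filters out the device's own traffic.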
Proposed resolutions:
- Loopback detection: The Espressif DAD algorithm could detect that the NS message it received is its own message looped back from the router. After it sends out the NS, and while it is waiting for an NA message, it should count the number of NS messages it receives. If that count is less than or equal to the number sent, it should ignore them.
- Continue with IPv4: If the Espressif DAD algorithm continues to detect duplicate addresses, it should allow the socket to be created using just an IPv4 connection. This would prevent the state machine from stalling.
- Retry using a different IPv6 LLA: If the EUI-64 method results in a detected duplicate address, the LLA should be regenerated using a random value.
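The first proposal can be sketched as a pair of counters per tentative address. This is only an illustration of the counting rule described above, with hypothetical names; it is not lwIP code. The heuristic has a known weakness: if a looped-back copy is lost and a genuine conflicting probe arrives in its place, the counts still match and the duplicate goes unnoticed.

```c
/* Counters for DAD probes for one tentative address. */
struct dad_counter {
    unsigned sent;      /* NS probes this node has transmitted */
    unsigned received;  /* NS probes received for the same target */
};

/* Call when a DAD probe is transmitted. */
static void dad_counter_on_ns_sent(struct dad_counter *c)
{
    c->sent++;
}

/* Call when a DAD probe for our tentative address is received.
 * Returns 1 if the address should be treated as a duplicate, that is,
 * more probes were received than sent. Returns 0 while every received
 * probe can still be explained as a looped-back copy of our own. */
static int dad_counter_on_ns_received(struct dad_counter *c)
{
    c->received++;
    return c->received > c->sent;
}
```

With one probe sent, the first received probe is ignored and a second one marks the address as a duplicate.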
Other items
- sdkconfig.txt, changed extension to .txt for upload compatibility
- elf file tcp_client.zip
Debug Logs(Verbose Level enabled):
Debug Logs (NETIF, IP, IP6, LWIP debug enabled):
About this issue
- State: closed
- Created 4 years ago
- Comments: 19 (3 by maintainers)
Thanks for confirming @espxiehang.
I have used the official v4.2 release tag, https://github.com/espressif/esp-idf/releases/tag/v4.2. Based on the commit reference, yes, the fix is already part of v4.2 and it is working as expected.
Thanks again @espxiehang