openthread: Router can hold on to an invalid eidcache entry for a node, resulting in packet drops
Describe the bug
I'm seeing a case where a BR is holding on to a stale eidcache entry:
> eidcache
fd2f:2067:77a8:1:1058:fc0e:1593:be16 a000 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:f22f:fba6:1516:7836
fd2f:2067:77a8:1:c26f:97e:4b45:38b8 6c00 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:2c41:66e6:2550:2852
fd2f:2067:77a8:1:6280:ce57:a63f:46a 2800 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:3bed:708c:6450:8cfd
fd2f:2067:77a8:1:3d32:5604:767d:aae3 0c00 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:5de9:f10e:2f2:e038
fd2f:2067:77a8:1:b856:fcfa:4589:2041 1400 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:1e12:7b6d:8fb7:8e23
fd2f:2067:77a8:1:afa6:c0dd:9b4:7de8 8405 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:d14e:bb38:b45c:3534
fd2f:2067:77a8:1:65da:8596:83ac:b473 b407 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:b24:8d26:b41e:8c7e
fd2f:2067:77a8:1:d95b:4002:5e97:5977 e400 cache canEvict=1
fd2f:2067:77a8:1:8ab3:9c5c:da27:d2c2 b800 cache canEvict=1
fd2f:2067:77a8:1:b551:fd41:be70:c402 8402 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:cc3c:21ad:e1d6:e50d
fd2f:2067:77a8:1:89ea:9674:c15a:104d 8404 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:35dc:3a10:9444:4735
fd2f:2067:77a8:1:3556:8c33:2bf8:e1f5 8403 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:332d:2213:bee4:891c
fd2f:2067:77a8:1:4cd2:9fca:97f:123 ac00 cache canEvict=1
fd2f:2067:77a8:1:aee3:de98:cefb:2e7e b400 cache canEvict=1
fd2f:2067:77a8:1:f9aa:9b28:50ab:79d3 e800 cache canEvict=1
fd2f:2067:77a8:1:428e:9739:356f:fce4 4c00 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:6cf2:8b1b:862:7158
fd2f:2067:77a8:1:b3f0:3f3c:add3:84ba 8400 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:a337:8500:af27:6248
fd2f:2067:77a8:1:2dcd:812b:cf3d:1094 8400 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:d14c:296d:dc47:1fe1
fd2f:2067:77a8:1:d50c:81d1:c8d5:169b a001 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:5ed9:5c3d:e094:8249
fdcf:8d62:8570:0:5b47:c1f6:474a:9a70 8401 cache canEvict=1
fd2f:2067:77a8:1:9c52:ea6f:df6d:1cb bc00 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:869c:644d:1efb:5728
fd2f:2067:77a8:1:4c72:3c36:3f42:3cc7 1800 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:bf89:3582:da9e:1381
fd2f:2067:77a8:1:f872:74a4:fa8d:4682 8401 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:5b47:c1f6:474a:9a70
fd2f:2067:77a8:1:2b03:5960:d39a:6583 fffe retry canEvict=1 timeout=0 retryDelay=30
Done
> router table
| ID | RLOC16 | Next Hop | Path Cost | LQ In | LQ Out | Age | Extended MAC | Link |
+----+--------+----------+-----------+-------+--------+-----+------------------+------+
| 3 | 0x0c00 | 33 | 1 | 1 | 1 | 7 | e62bf306b0f70c4f | 1 |
| 5 | 0x1400 | 33 | 2 | 2 | 3 | 7 | c66e1574d2735c85 | 1 |
| 6 | 0x1800 | 33 | 1 | 2 | 2 | 15 | 3ae75f8b8a91c239 | 1 |
| 10 | 0x2800 | 33 | 1 | 1 | 2 | 26 | dea9d8ec0225a0ff | 1 |
| 20 | 0x5000 | 63 | 0 | 0 | 0 | 0 | 0000000000000000 | 0 |
| 27 | 0x6c00 | 46 | 3 | 1 | 1 | 25 | 06d838f00155ca5d | 1 |
| 33 | 0x8400 | 51 | 1 | 3 | 3 | 3 | d2483ee530c6b41f | 1 |
| 36 | 0x9000 | 51 | 4 | 0 | 0 | 19 | 8af4955487e9f116 | 0 |
| 40 | 0xa000 | 51 | 1 | 1 | 1 | 24 | 727e27068c33dd2a | 1 |
| 43 | 0xac00 | 33 | 2 | 1 | 2 | 86 | 06eb0dc305d53eca | 1 |
| 45 | 0xb400 | 33 | 2 | 1 | 2 | 12 | 7299cf87474fffcc | 1 |
| 46 | 0xb800 | 33 | 2 | 2 | 2 | 15 | 16847a5afb1782f9 | 1 |
| 47 | 0xbc00 | 33 | 2 | 1 | 2 | 27 | 1af3a011a1df3784 | 1 |
| 51 | 0xcc00 | 33 | 1 | 3 | 3 | 11 | baf737a9e2c4ee08 | 1 |
| 57 | 0xe400 | 33 | 2 | 1 | 2 | 31 | 8ef6d4853cef4748 | 1 |
| 58 | 0xe800 | 33 | 2 | 1 | 2 | 23 | ae5014e64a8276b1 | 1 |
The address in question is fd2f:2067:77a8:1:428e:9739:356f:fce4. Any packets to it that come through this node are dropped with NoRoute, which I presume is because the RLOC16 in the cache, 0x4c00, is not in the router table. According to other BRs, the node has moved to 0x9000. It seems to me that the absence of 0x4c00 from the router table should remove the entry from the cache and force an address query on the next packet, but that does not seem to be happening:
> ping fd2f:2067:77a8:1:428e:9739:356f:fce4
1 packets transmitted, 0 packets received. Packet loss = 100.0%.
Done
...
Sat Sep 9 11:47:19 2023 user.notice otbr-agent[1468]: 20:48:42.931 [N] MeshForwarder-: Dropping IPv6 ICMP6 msg, len:56, chksum:4cd5, ecn:no, sec:yes, error:NoRoute, prio:normal
Sat Sep 9 11:47:19 2023 user.notice otbr-agent[1468]: 20:48:42.931 [N] MeshForwarder-: src:[fd2f:2067:77a8:1:18d3:f1f8:f5b4:b0d]
Sat Sep 9 11:47:19 2023 user.notice otbr-agent[1468]: 20:48:42.931 [N] MeshForwarder-: dst:[fd2f:2067:77a8:1:428e:9739:356f:fce4]
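The cross-check described above (is the router part of each cached RLOC16 still present in the router table?) can be scripted against the two CLI dumps. The following is only a minimal sketch for spotting suspect entries from the outside, not how OpenThread itself validates the cache. The function names are hypothetical, and the parsing assumes the `eidcache` and `router table` output layout shown in this report. It relies on the RLOC16 layout in which the upper 6 bits are the router ID, so a child's RLOC16 such as 0x8405 maps to router 0x8400.

```python
import re


def router_rloc16(rloc16: int) -> int:
    """Return the RLOC16 of the parent router (upper 6 bits = router ID)."""
    return rloc16 & 0xFC00


def parse_eidcache(text: str) -> list[tuple[str, int]]:
    """Extract (address, rloc16) pairs for entries in the 'cache' state.

    Entries in other states (for example 'retry' with RLOC16 fffe) are skipped.
    """
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[2] == "cache":
            entries.append((parts[0], int(parts[1], 16)))
    return entries


def parse_router_table(text: str) -> set[int]:
    """Collect the RLOC16 column from 'router table' rows (header rows do not match)."""
    rows = re.findall(r"\|\s*\d+\s*\|\s*(0x[0-9a-fA-F]{4})\s*\|", text)
    return {int(rloc, 16) for rloc in rows}


def stale_entries(eidcache_text: str, router_table_text: str) -> list[tuple[str, int]]:
    """Return cache entries whose router RLOC16 is absent from the router table."""
    routers = parse_router_table(router_table_text)
    return [
        (addr, rloc)
        for addr, rloc in parse_eidcache(eidcache_text)
        if router_rloc16(rloc) not in routers
    ]


if __name__ == "__main__":
    # Abbreviated copies of the dumps from this report.
    cache_dump = """\
fd2f:2067:77a8:1:afa6:c0dd:9b4:7de8 8405 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:d14e:bb38:b45c:3534
fd2f:2067:77a8:1:428e:9739:356f:fce4 4c00 cache canEvict=1 transTime=0 eid=fdcf:8d62:8570:0:6cf2:8b1b:862:7158
fd2f:2067:77a8:1:2b03:5960:d39a:6583 fffe retry canEvict=1 timeout=0 retryDelay=30
"""
    table_dump = """\
| ID | RLOC16 | Next Hop | Path Cost | LQ In | LQ Out | Age | Extended MAC | Link |
| 33 | 0x8400 | 51 | 1 | 3 | 3 | 3 | d2483ee530c6b41f | 1 |
| 36 | 0x9000 | 51 | 4 | 0 | 0 | 19 | 8af4955487e9f116 | 0 |
"""
    for addr, rloc in stale_entries(cache_dump, table_dump):
        print(f"{addr} -> 0x{rloc:04x} (router not in router table)")
```

Fed the full dumps from this report, the only entry this check should flag is the one for fd2f:2067:77a8:1:428e:9739:356f:fce4 at 0x4c00, which matches the address that fails to ping.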
To Reproduce
Information to reproduce the behavior, including:
- Git commit id: 34ecac8536f6a8e23391b7f25b7ec401bf1ae305
- IEEE 802.15.4 hardware platform: silabs mg21 rcp
- Build steps
- Network topology: ~20 devices.
About this issue
- Original URL
- State: closed
- Created 10 months ago
- Comments: 16 (16 by maintainers)
See SPEC-1164 regarding the current implementation.