ovn: Load balancer ARP responder broken since 21.06
The LXD team uses OVN load balancers for its network forward functionality (https://linuxcontainers.org/lxd/docs/master/howto/network_forwards/).
We have had a user report (since confirmed) from @lepokle that recent versions of OVN have broken the ARP responder functionality of OVN load balancers.
@lepokle has identified that it appears to have been broken since 21.06, specifically by the commit https://github.com/ovn-org/ovn/commit/ea6ee901ff9107a084bc830a8a38c4e0bd9f75f7
We no longer see the OVN logical router responding to ARP requests for the load balancer listen IP on the chassis’s gateway port. If packets are manually routed to the OVN logical router’s gateway port IP, the load balancer still works, so it’s just the ARP responder that has been broken.
On a pre-21.06 OVN version we see the following logical flows configured with a load balancer:
# ovn-nbctl --version
ovn-nbctl 21.03.0
Open vSwitch Library 2.15.90
DB Schema 5.31.0
# ovn-sbctl list logical_flow | grep bbb.76.20.84
match : "ct.est && ip && ip4.dst == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
actions : "eth.dst = eth.src; eth.src = xreg0[0..47]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[0..47]; arp.tpa = arp.spa; arp.spa = bbb.76.20.84; outport = inport; flags.loopback = 1; output;"
match : "inport == \"lxd-net11-lr-lrp-int\" && arp.op == 1 && arp.tpa == bbb.76.20.84"
match : "ip && ip4.dst == bbb.76.20.84"
actions : "reg1 = bbb.76.20.84; ct_lb(backends=10.161.64.2);"
match : "ct.new && ip4.dst == bbb.76.20.84"
match : "ct.new && ip && ip4.dst == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
actions : "eth.dst = eth.src; eth.src = xreg0[0..47]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[0..47]; arp.tpa = arp.spa; arp.spa = bbb.76.20.84; outport = inport; flags.loopback = 1; output;"
match : "inport == \"lxd-net11-lr-lrp-ext\" && arp.op == 1 && arp.tpa == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
Specifically, there is a mention of an /* ARP reply */ flow.
However, in later versions we see that the ARP reply flow is gone:
# ovn-nbctl --version
ovn-nbctl 21.06.0
Open vSwitch Library 2.15.90
DB Schema 5.32.0
# ovn-sbctl list logical_flow | grep bbb.76.20.84
match : "ct.est && ip && ip4.dst == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
match : "ip && ip4.dst == bbb.76.20.84"
match : "inport == \"lxd-net11-lr-lrp-int\" && arp.op == 1 && arp.tpa == { bbb.76.20.84 }"
actions : "reg1 = bbb.76.20.84; ct_lb(backends=10.161.64.2);"
match : "ct.new && ip4.dst == bbb.76.20.84"
match : "inport == \"lxd-net11-lr-lrp-ext\" && arp.op == 1 && arp.tpa == { bbb.76.20.84 } && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
match : "ct.new && ip && ip4.dst == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
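To make the comparison between the two outputs repeatable, the check can be scripted. The following is only a minimal sketch: it assumes the text comes from `ovn-sbctl list logical_flow | grep <VIP>` in the `key : "value"` layout shown above, and the function name `has_arp_reply` and the sample constants are hypothetical names introduced for illustration. The sample lines are excerpts copied from the two outputs in this report.

```python
# Sample lines excerpted from the outputs above (not the full dumps).
PRE_21_06 = r'''
match : "inport == \"lxd-net11-lr-lrp-int\" && arp.op == 1 && arp.tpa == bbb.76.20.84"
actions : "eth.dst = eth.src; eth.src = xreg0[0..47]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[0..47]; arp.tpa = arp.spa; arp.spa = bbb.76.20.84; outport = inport; flags.loopback = 1; output;"
actions : "reg1 = bbb.76.20.84; ct_lb(backends=10.161.64.2);"
'''

POST_21_06 = r'''
match : "inport == \"lxd-net11-lr-lrp-int\" && arp.op == 1 && arp.tpa == { bbb.76.20.84 }"
actions : "reg1 = bbb.76.20.84; ct_lb(backends=10.161.64.2);"
match : "ct.new && ip4.dst == bbb.76.20.84"
'''


def has_arp_reply(flow_dump: str, vip: str) -> bool:
    """Return True if any 'actions' line sends an ARP reply on behalf of vip."""
    for line in flow_dump.splitlines():
        # Split 'actions : "..."' into the column name and its value.
        key, _, value = line.partition(":")
        if key.strip() != "actions":
            continue
        # An ARP reply sets arp.op = 2 and uses the VIP as the sender IP.
        if "arp.op = 2;" in value and f"arp.spa = {vip};" in value:
            return True
    return False


if __name__ == "__main__":
    print("21.03:", has_arp_reply(PRE_21_06, "bbb.76.20.84"))
    print("21.06:", has_arp_reply(POST_21_06, "bbb.76.20.84"))
```

Because the grep only keeps lines that mention the VIP, the sketch looks at `actions` lines alone and does not try to pair them with their `match` lines.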
About this issue
- State: closed
- Created 2 years ago
- Comments: 24
Commits related to this issue
- nb: Add Load_Balancer.options:neighbor_responder knob. This allows CMS to tweak the way logical routers reply to ARP/ND packets targeting load balancer VIPs. By default a router only replies for VIP... — committed to dceara/ovn by dceara 2 years ago
- nb: Add Load_Balancer.options:neighbor_responder knob. This allows CMS to tweak the way logical routers reply to ARP/ND packets targeting load balancer VIPs. By default a router only replies for VIP... — committed to ovsrobot/ovn by dceara 2 years ago
- nb: Add Load_Balancer.options:neighbor_responder knob. This allows CMS to tweak the way logical routers reply to ARP/ND packets targeting load balancer VIPs. By default a router only replies for VIP... — committed to ovn-org/ovn by dceara 2 years ago
- tests/network-ovn: drop IPv4 part for OVN bug workaround https://github.com/ovn-org/ovn/issues/124 was marked as closed with OVN versions 22.06, 22.09 and 22.12. LXD has been shipping with OVN 22.09 ... — committed to simondeziel/lxd-ci by simondeziel 5 months ago
Hi @dceara, thank you very much for your patch! I’ve just tried it out in our environment and it works perfectly! @tomponline I’ve configured it as suggested in the patch via ovn-nbctl. Is there any chance that this option will be overwritten by LXD?
Thanks!
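For anyone else applying the same setting, here is a rough sketch of how the `ovn-nbctl` invocation could be assembled. It rests on several assumptions: the option is the `Load_Balancer.options:neighbor_responder` knob named in the commits above, the accepted values are `reachable`, `all` and `none` (as we understand the OVN northbound documentation, so please verify against your OVN version), and the load balancer name `lxd-net11-lb` is purely hypothetical. The helper only builds and prints the command; it does not run it.

```python
import shlex

# Assumed set of accepted values; check the OVN NB documentation for your version.
ASSUMED_MODES = {"reachable", "all", "none"}


def neighbor_responder_cmd(lb_name: str, mode: str) -> list[str]:
    """Build the ovn-nbctl argument list that sets neighbor_responder on a load balancer."""
    if mode not in ASSUMED_MODES:
        raise ValueError(f"unexpected neighbor_responder mode: {mode!r}")
    return [
        "ovn-nbctl",
        "set",
        "Load_Balancer",
        lb_name,  # name or UUID of the load balancer row
        f"options:neighbor_responder={mode}",
    ]


if __name__ == "__main__":
    # "lxd-net11-lb" is a hypothetical load balancer name used for illustration.
    print(shlex.join(neighbor_responder_cmd("lxd-net11-lb", "all")))
```

Printing the command first makes it easy to review before running it by hand or passing the list to `subprocess.run`.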
Ah I misunderstood your proposal of a single config option, I like it better!
ARP is an acronym so using NDP would make sense to me (NB.Load_Balancer.options:ndp_responder).
Makes sense. I know ovn-kubernetes shares load balancers across routers but I don’t see why they would need different behavior for the same load balancer.
Yes.
Good point, will do. I just need to figure out if it should be “neighbour” or “neighbor” 😃. We have both in the code base but we mostly use “neighbor” from what I can tell.