ovn: Load balancer ARP responder broken since 21.06

The LXD team uses OVN load balancers for its network forward functionality (https://linuxcontainers.org/lxd/docs/master/howto/network_forwards/).

We have had a user report (since confirmed) from @lepokle that recent versions of OVN have broken the ARP responder functionality of OVN load balancers.

@lepokle has identified that it appears to have been broken since 21.06, specifically by the commit https://github.com/ovn-org/ovn/commit/ea6ee901ff9107a084bc830a8a38c4e0bd9f75f7

We no longer see the OVN logical router responding to ARP requests for the load balancer listen IP on the chassis’s gateway port. If packets are manually routed to the OVN logical router’s gateway port IP, the load balancer still works, so it’s just the ARP responder that has been broken.

On a pre-21.06 OVN version we see the following logical flows configured with a load balancer:

# ovn-nbctl --version                                                                                               
ovn-nbctl 21.03.0
Open vSwitch Library 2.15.90
DB Schema 5.31.0

# ovn-sbctl list logical_flow | grep bbb.76.20.84                                                                   
match               : "ct.est && ip && ip4.dst == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
actions             : "eth.dst = eth.src; eth.src = xreg0[0..47]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[0..47]; arp.tpa = arp.spa; arp.spa = bbb.76.20.84; outport = inport; flags.loopback = 1; output;"
match               : "inport == \"lxd-net11-lr-lrp-int\" && arp.op == 1 && arp.tpa == bbb.76.20.84"
match               : "ip && ip4.dst == bbb.76.20.84"
actions             : "reg1 = bbb.76.20.84; ct_lb(backends=10.161.64.2);"
match               : "ct.new && ip4.dst == bbb.76.20.84"
match               : "ct.new && ip && ip4.dst == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
actions             : "eth.dst = eth.src; eth.src = xreg0[0..47]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[0..47]; arp.tpa = arp.spa; arp.spa = bbb.76.20.84; outport = inport; flags.loopback = 1; output;"
match               : "inport == \"lxd-net11-lr-lrp-ext\" && arp.op == 1 && arp.tpa == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"

Specifically, there are flows whose actions are marked /* ARP reply */.

However, in later versions the ARP reply flow is gone:

# ovn-nbctl --version
ovn-nbctl 21.06.0
Open vSwitch Library 2.15.90
DB Schema 5.32.0

# ovn-sbctl list logical_flow | grep bbb.76.20.84                                                                   
match               : "ct.est && ip && ip4.dst == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
match               : "ip && ip4.dst == bbb.76.20.84"
match               : "inport == \"lxd-net11-lr-lrp-int\" && arp.op == 1 && arp.tpa == { bbb.76.20.84 }"
actions             : "reg1 = bbb.76.20.84; ct_lb(backends=10.161.64.2);"
match               : "ct.new && ip4.dst == bbb.76.20.84"
match               : "inport == \"lxd-net11-lr-lrp-ext\" && arp.op == 1 && arp.tpa == { bbb.76.20.84 } && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
match               : "ct.new && ip && ip4.dst == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
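
The comparison between the two dumps above can be automated. The following is a minimal sketch, not part of the original report: it scans text in the format printed by `ovn-sbctl list logical_flow` and reports whether any `actions` line builds an ARP reply (`arp.op = 2`) whose sender IP is the load balancer VIP. The function name `has_arp_responder` and the sample addresses in the usage note are hypothetical.

```python
import re


def has_arp_responder(flow_dump: str, vip: str) -> bool:
    """Return True if any 'actions' line builds an ARP reply for vip.

    flow_dump is text in the format printed by
    `ovn-sbctl list logical_flow` (one "column : value" pair per line).
    """
    # Match "arp.spa = <vip>;" exactly, so 10.0.0.5 does not match 10.0.0.50.
    spa = re.compile(r"arp\.spa\s*=\s*" + re.escape(vip) + r"\s*;")
    for line in flow_dump.splitlines():
        column, _, value = line.partition(":")
        if column.strip() != "actions":
            continue  # only action columns can contain the ARP reply
        if "arp.op = 2" in value and spa.search(value):
            return True
    return False
```

Fed the 21.03 output shown earlier, this should return True for the listen IP. Fed the 21.06 output, it should return False, because the only remaining `actions` line is the `ct_lb` one.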

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 24

Most upvoted comments

Hi @dceara, thank you very much for your patch! I’ve just tried it out in our environment and it works perfectly! @tomponline I’ve configured the option via ovn-nbctl as suggested in the patch. Is there any chance that this option will be overwritten by LXD?

Thanks!

Ah I misunderstood your proposal of a single config option, I like it better!

ARP is an acronym so using NDP would make sense to me (NB.Load_Balancer.options:ndp_responder).

Thanks for looking at this.

I think I would prefer the NB.Load_Balancer.options:arp_responder option, as it then means it’s a property of the load balancer. We don’t tend to share load balancers between routers so the latter option wouldn’t provide us any benefit.

Makes sense. I know ovn-kubernetes shares load balancers across routers but I don’t see why they would need different behavior for the same load balancer.

Whilst we are on the subject, I haven’t checked, but did the change also prevent IPv6 NDP responders from working in the same way that IPv4 ARP responders were stopped?

Yes.

If not, then they probably should be for consistency, and then perhaps instead of NB.Load_Balancer.options:arp_responder the new option should be called NB.Load_Balancer.options:neighbour_responder to accommodate both IPv4 and IPv6 responders?

Good point, will do. I just need to figure out if it should be “neighbour” or “neighbor” 😃. We have both in the code base but we mostly use “neighbor” from what I can tell.
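
For anyone scripting the per-load-balancer option discussed above, here is a minimal sketch of building the `ovn-nbctl set` command for a key in the Load_Balancer `options` column. The thread had not settled on a final option name at this point, so the key `neighbor_responder`, the value `all`, and the load balancer name `lb-example` are assumptions for illustration. Check ovn-nb(5) for the OVN version in use before relying on them.

```python
import shlex


def lb_option_command(lb_name: str, key: str, value: str) -> list[str]:
    """Build the argv that sets one key in a load balancer's options column.

    Uses the generic database syntax of ovn-nbctl:
        ovn-nbctl set TABLE RECORD COLUMN:KEY=VALUE
    """
    if not lb_name or not key:
        raise ValueError("load balancer name and option key are required")
    return ["ovn-nbctl", "set", "load_balancer", lb_name,
            f"options:{key}={value}"]


# Assumed option name and value; neither is confirmed by this thread.
cmd = lb_option_command("lb-example", "neighbor_responder", "all")
print(shlex.join(cmd))
```

Returning an argv list rather than a single string lets the command go straight to `subprocess.run(cmd, check=True)` without shell quoting concerns.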