metallb: Accessing service IPs from nodes (uncontained) fails in some setups
There have been multiple reports of being unable to reach a service IP when dialing out from a node in the cluster (i.e. not from within a container). I was unable to reproduce this originally, but multiple independent reports suggest this may be a bug in a particular version of k8s, or an interaction with a specific network addon.
Setups where this has appeared:
- k8s=1.9.3, network_addon=canal, protocol=bgp
- k8s=1.9.5, network_addon=calico, protocol=bgp
A known workaround is to add the following rule to iptables, in addition to those generated by kube-proxy and the network addon:
iptables -A OUTPUT -t nat -d <your pool cidr> -m mark --mark 0x8000 -j MARK --set-xmark 0x0 -m comment --comment "Ensuring K8s nodes can reach MetalLB-announced service IPs. WORKAROUND"
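For example, with a hypothetical MetalLB pool of 192.168.144.0/28 (substitute your own pool CIDR), applying and then verifying the workaround looks roughly like this:
# Hypothetical pool CIDR; replace with your MetalLB address pool.
POOL_CIDR=192.168.144.0/28
# Clear the 0x8000 mark on locally originated traffic headed to the pool.
iptables -A OUTPUT -t nat -d "$POOL_CIDR" -m mark --mark 0x8000 -j MARK --set-xmark 0x0 -m comment --comment "Ensuring K8s nodes can reach MetalLB-announced service IPs. WORKAROUND"
# Confirm the rule was installed.
iptables -t nat -S OUTPUT | grep WORKAROUND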
This strongly suggests a bug in kube-proxy, since (iirc) it's the component that uses mark+action rules to implement a bunch of its policy decisions.
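As a quick sanity check of that theory, you can dump kube-proxy's nat rules and look for the 0x8000 mark; the chain names below (KUBE-MARK-DROP, KUBE-XLB) are what kube-proxy's iptables mode used around these versions, so treat this as a sketch rather than an exact match for every release:
# Look for the 0x8000 "drop" mark and the per-service chains that jump to it
# when a Local-policy service has no local endpoints on this node.
iptables-save -t nat | grep -E 'KUBE-MARK-DROP|KUBE-XLB|0x8000'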
Next steps: attempt to reproduce this, trace iptables, and file an upstream bug.
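A sketch of that tracing step, assuming a failing service IP of 10.0.0.1 on port 80 (both placeholders) and that the kernel's TRACE logging backend is available:
# Trace locally originated packets to the service IP through the iptables chains.
iptables -t raw -A OUTPUT -p tcp -d 10.0.0.1 --dport 80 -j TRACE
# Generate traffic from the node, then read the trace from the kernel log.
curl --connect-timeout 2 http://10.0.0.1/
dmesg | grep 'TRACE:'
# Remove the trace rule when done.
iptables -t raw -D OUTPUT -p tcp -d 10.0.0.1 --dport 80 -j TRACE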
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 3
- Comments: 16 (2 by maintainers)
I understand why @jpiper's fix does the job: it forces endpoint generation and makes the service act the same way as a ClusterIP on nodes not running the workloads. But it would work even better if no virtual server were created at all for services with externalTrafficPolicy: Local on exactly those nodes; that way, BGP routing would do its job properly…
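To see which nodes actually have local endpoints for such a service (and which would keep a virtual server under that proposal), something like the following works; my-service, its namespace, and the app=my-service label are placeholders:
# Where do the service's backing pods run? Only those nodes have local endpoints.
kubectl -n default get endpoints my-service -o wide
kubectl -n default get pods -l app=my-service -o wide
# Confirm the service really uses the Local external traffic policy.
kubectl -n default get svc my-service -o jsonpath='{.spec.externalTrafficPolicy}'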
Further data: the failing services all use externalTrafficPolicy: Local. I suspect this is kube-proxy confusing locally originated and externally originated traffic, and applying the “must blackhole if no local pods” semantics to locally originated traffic (incorrectly). If I’m right, this should be an easy upstream bug to fix. Still need to repro and investigate more.
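A minimal repro sketch for that theory, assuming a MetalLB-announced service IP of 10.0.0.1 (placeholder) and a node that runs none of the service's pods:
# From a node with no local pod for the service: if kube-proxy wrongly applies the
# "blackhole when no local endpoints" rule to locally originated traffic, this fails.
curl --connect-timeout 2 http://10.0.0.1/
# The same request from a node that does run one of the service's pods should succeed,
# as should both once externalTrafficPolicy is switched back to Cluster.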