cilium: kube-proxy replacement: LoadBalancer traffic fails from host back to same host

Bug report

General Information

  • Cilium version (run cilium version)
Client: 1.7.4 c7ee6d62b 2020-05-15T16:07:35+02:00 go version go1.13.10 linux/amd64
Daemon: 1.7.4 c7ee6d62b 2020-05-15T16:07:35+02:00 go version go1.13.10 linux/amd64
  • Kernel version (run uname -a)
Linux test03.lan 5.6.2-1.el7.elrepo.x86_64 #1 SMP Thu Apr 2 10:55:54 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Orchestration system version in use (e.g. kubectl version, Mesos, …)
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-20T12:52:00Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T20:55:23Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
  • Link to relevant artifacts (policies, deployments scripts, …)
# automatically restarts pods to ensure they are controlled by the new CNI driver
operator:
  enabled: true

global:
  k8sServiceHost: "127.0.0.1"
  k8sServicePort: "6443"
  enableXTSocketFallback: false

  prometheus:
    enabled: true

  bpf:
    preallocateMaps: true

  # disabling not ready for primetime yet
  # https://github.com/cilium/cilium/projects/93#column-7748410
  installIptablesRules: true

  # https://docs.cilium.io/en/latest/architecture/#arch-guide
  # https://cilium.io/blog/2019/02/12/cilium-14/#sockmap-bpf-based-sidecar-acceleration-alpha
  # https://www.youtube.com/watch?v=ER9eIXL2_14
  sockops:
    enabled: true

  k8s:
    # cilium pods will not start on node until pod CIDR has been assigned
    requireIPv4PodCIDR: true

  # eliminates the need for any kind of BGP stuff
  # automatically adds routes to each node
  autoDirectNodeRoutes: true

  tunnel: disabled
  kubeProxyReplacement: strict
  hostServices:
    enabled: true
  nodePort:
    enabled: true
    # dsr or snat
    #mode: dsr
    mode: snat
  externalIPs:
    enabled: true

  # dev purposes only
  cleanState: false
  cleanBpfState: true
  • Upload a system dump (run curl -sLO https://github.com/cilium/cilium-sysdump/releases/latest/download/cilium-sysdump.zip && python cilium-sysdump.zip and then attach the generated zip file)

How to reproduce the issue

I’m using Cilium with MetalLB and the kube-proxy replacement. I’ve got a pretty big matrix of scenarios I’m testing, and most of them work, but I’ve found a situation where certain traffic fails to be handled. I believe I can distill the issue down to: when traffic leaves N1 (Node 1) and comes back to N1P (a Pod running on Node 1) without an SNAT involved, it fails.

I’ve tried this in both dsr mode (my intended target) and snat mode (which I’m less interested in, but wanted to try out). Both fail under the above circumstances. Here’s a pretty crude representation of what I think the traffic flows are, and which ones work and which fail:

GW = gateway, R = router, NX = node X, NXP = pod running on node X

dsr mode

# service with Cluster externalTrafficPolicy
N1 -> GW -> R -> N2 -> N1P: fail
N1 -> GW -> R -> N1 -> N1P: fail
N2 -> GW -> R -> N1 -> N1P: success

# service with Local externalTrafficPolicy
N2 -> GW -> R -> N1 -> N1P: success
N1 -> GW -> R -> N1 -> N1P: fail

snat mode

# service with Cluster externalTrafficPolicy
N1 -> GW -> R -> N2 -> N1P: success
N1 -> GW -> R -> N1 -> N1P: fail

# service with Local externalTrafficPolicy
N2 -> GW -> R -> N1 -> N1P: success
N1 -> GW -> R -> N1 -> N1P: fail
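The distilled rule can be written as a small predicate that matches every row in the two matrices above. This is just my hypothesis about the observed behaviour, not anything from Cilium internals, and the node names are the ones from the legend:

```python
# Hypothesis: a flow fails when the reply can reach the original client
# without an SNAT hop in between, i.e. the client node is the backend
# pod's node and nothing rewrote the source address on the way in.
def flow_fails(mode, client, ingress, backend):
    """mode: 'dsr' or 'snat'; client/ingress/backend: node names.

    In dsr mode the backend replies directly to the client, so any
    hairpin back to the client's own node fails.  In snat mode the
    reply detours through the ingress node, so the flow only fails
    when the ingress node IS the client's node (no SNAT involved).
    """
    if client != backend:
        return False  # traffic from another node always works
    if mode == "snat" and ingress != client:
        return False  # remote ingress node SNATs, reply goes back via it
    return True

# dsr, Cluster externalTrafficPolicy
assert flow_fails("dsr", "N1", "N2", "N1")       # N1 -> ... -> N2 -> N1P: fail
assert flow_fails("dsr", "N1", "N1", "N1")       # N1 -> ... -> N1 -> N1P: fail
assert not flow_fails("dsr", "N2", "N1", "N1")   # N2 -> ... -> N1 -> N1P: ok
# snat, Cluster externalTrafficPolicy
assert not flow_fails("snat", "N1", "N2", "N1")  # N1 -> ... -> N2 -> N1P: ok
assert flow_fails("snat", "N1", "N1", "N1")      # N1 -> ... -> N1 -> N1P: fail
```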

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 38 (37 by maintainers)

Most upvoted comments

Sorry, the above comments about source IP are complete noise… I thought I was running in dsr mode but was not 😦

I’ll go through a deeper round of testing with snat + Local now and report observations.

Yeah, that’s correct. It’s deployed as a DaemonSet, so unless something is off it should be running with dsr, yes.

I’ll kick the agent pods in a bit just to confirm it’s still in a bad state.

Yeah, I have a bigger matrix to go through still (unrelated Pods running on N1 and N2, etc. hitting the LB IP, Cluster IP, etc.) once Local is fully up… but here’s what I’ve tested so far (sorry for the crude syntax, let me know if something doesn’t make sense).

dsr + cluster (1 Pod backing service running on N1)
(all work)

N1 -> N1P
N1 -> 127.0.0.1 nodeport
N1 -> LB IP -> N1P
N1 -> SVC Cluster IP -> N1P

N2 -> N1P
N2 -> 127.0.0.1 nodeport -> N1P (should fail in Local policy but not cluster)
N2 -> N1 nodeport -> N1P
N2 -> LB IP -> N1P
N2 -> SVC Cluster IP -> N1P

EXT -> LB IP -> N1P
EXT -> N1 nodeport -> N1P
EXT -> N2 nodeport -> N1P (should fail in Local policy but not cluster)

dsr + local (1 Pod backing service running on N1)

N1 -> N1P
N1 -> 127.0.0.1 nodeport
N1 -> LB IP -> N1P (fail)
N1 -> SVC Cluster IP -> N1P

N2 -> N1P
N2 -> 127.0.0.1 nodeport -> N1P (should fail in Local policy but not cluster)
N2 -> N1 nodeport -> N1P
N2 -> LB IP -> N1P
N2 -> SVC Cluster IP -> N1P

EXT -> LB IP -> N1P
EXT -> N1 nodeport -> N1P
EXT -> N2 nodeport -> N1P (should fail in Local policy but not cluster)

snat + cluster (1 Pod backing service running on N1)
(all work)

N1 -> N1P
N1 -> 127.0.0.1 nodeport
N1 -> LB IP -> N1P
N1 -> SVC Cluster IP -> N1P

N2 -> N1P
N2 -> 127.0.0.1 nodeport -> N1P (should fail in Local policy but not cluster)
N2 -> N1 nodeport -> N1P
N2 -> LB IP -> N1P
N2 -> SVC Cluster IP -> N1P

EXT -> LB IP -> N1P
EXT -> N1 nodeport -> N1P
EXT -> N2 nodeport -> N1P (should fail in Local policy but not cluster)

snat + local (1 Pod backing service running on N1)

N1 -> N1P
N1 -> 127.0.0.1 nodeport
N1 -> LB IP -> N1P (fail)
N1 -> SVC Cluster IP -> N1P

N2 -> N1P
N2 -> 127.0.0.1 nodeport -> N1P (should fail in Local policy but not cluster)
N2 -> N1 nodeport -> N1P
N2 -> LB IP -> N1P
N2 -> SVC Cluster IP -> N1P

EXT -> LB IP -> N1P
EXT -> N1 nodeport -> N1P
EXT -> N2 nodeport -> N1P (should fail in Local policy but not cluster)

Everything behaves as expected (at least as I expect) with the exception of: dsr or snat + Local + N1 -> LB IP -> N1P
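For this in-cluster matrix the failing pattern is even narrower, and a predicate for it is short. Again this is just my reading of the results above (mode, dsr vs. snat, makes no difference here), with hypothetical argument names:

```python
def lb_flow_fails(policy, client_node, backend_node, via_lb_ip):
    """True only for the single failing case in the matrix above:
    Local externalTrafficPolicy, client on the backend's own node,
    reaching the service through the LoadBalancer IP."""
    return policy == "Local" and via_lb_ip and client_node == backend_node

assert lb_flow_fails("Local", "N1", "N1", True)        # N1 -> LB IP -> N1P: fail
assert not lb_flow_fails("Local", "N2", "N1", True)    # N2 -> LB IP -> N1P: ok
assert not lb_flow_fails("Cluster", "N1", "N1", True)  # works with Cluster policy
assert not lb_flow_fails("Local", "N1", "N1", False)   # e.g. Cluster IP path: ok
```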

Nice work! Looking forward to the final Local piece!

This makes me so happy! I’m tied up most of the day today but I’ll definitely have some feedback in the next day or so. This is super high priority for me so I really appreciate the help.