kubernetes: Pod service endpoint unreachable from same host

What happened: TCP/UDP connections to a ClusterIP fail when the traffic is load-balanced via iptables to a pod on the same host.

What you expected to happen: conntrack shows that the UDP datagram from 10.200.1.36 has its destination DNATed to 10.200.1.37 (the host has podCIDR: "10.200.1.0/24").

    [NEW] udp      17 30 src=10.200.1.36 dst=10.32.0.10 sport=45956 dport=53 [UNREPLIED] src=10.200.1.37 dst=10.200.1.36 sport=53 dport=45956
[DESTROY] udp      17 src=10.200.1.36 dst=10.32.0.10 sport=57957 dport=53 [UNREPLIED] src=10.200.1.37 dst=10.200.1.36 sport=53 dport=57957

From my understanding, because the pods have a /24 mask, the reply from .37 does not go back through cnio0 but straight to .36, breaking the DNAT. This tcpdump shows it:

15:10:27.464509 IP 10.200.1.36.42897 > 10.32.0.10.53: 16896+ A? pippo.it. (26)
15:10:27.464587 IP 10.200.1.36.42897 > 10.32.0.10.53: 16896+ AAAA? pippo.it. (26)
15:10:27.464777 IP 10.200.1.37.53 > 10.200.1.36.42897: 16896 ServFail- 0/0/0 (26)
15:10:27.464841 IP 10.200.1.37.53 > 10.200.1.36.42897: 16896 ServFail- 0/0/0 (26)
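The routing decision behind this hypothesis can be sketched in plain shell arithmetic (`ip2int` and `same_subnet` are illustrative helpers, not real tools): with a /24 mask, the peer 10.200.1.36 is on-link for the pod, so the reply is sent to it directly over the bridge and never takes the routed path where the DNAT would be reversed.

```shell
# Sketch of the pod's routing decision in pure shell arithmetic.
# ip2int converts a dotted quad to a 32-bit integer.
ip2int() {
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}
same_subnet() {  # usage: same_subnet LOCAL_IP PEER_IP PREFIX
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  if [ $(( $(ip2int "$1") & mask )) -eq $(( $(ip2int "$2") & mask )) ]; then
    echo on-link       # reply is sent straight to the peer
  else
    echo via-gateway   # reply is routed through the gateway on cnio0
  fi
}
same_subnet 10.200.1.37 10.200.1.36 24   # prints: on-link
same_subnet 10.200.1.37 10.200.1.36 32   # prints: via-gateway
```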

How to reproduce it (as minimally and precisely as possible):

  • create a ClusterIP Service with a single pod endpoint
  • from a pod on the same host, open a TCP connection or send a UDP datagram; the communication will fail.

Anything else we need to know?: I’m not able to assign a /32 subnet to pods. Both the CNI IPAM config:

    "ipam": {
        "type": "host-local",
        "ranges": [
          [{"subnet": "10.200.1.0/32"}]
        ]
    }

and the kubelet config:

    kind: KubeletConfiguration
    apiVersion: kubelet.config.k8s.io/v1beta1
    authentication:
    ...
    podCIDR: "10.200.1.0/32"

don’t work. I ended up manually changing the subnet to test my hypothesis. This fixed the problem because it forced the packets back through cnio0, where conntrack could reverse the DNAT:

ip netns exec cni-a6afeeee-a34b-8e24-de62-26ffa93a4bd8 ip a add 10.200.1.37/32 dev eth0
ip netns exec cni-a6afeeee-a34b-8e24-de62-26ffa93a4bd8 ip a del 10.200.1.37/24 dev eth0

Am I doing something wrong? Why is it not possible to assign a /32 subnet to pods? Is there a cleaner solution?
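One plausible answer to the /32 question (an assumption based on host-local's usual behavior, not confirmed in this thread): host-local allocates pod IPs from the range after reserving the network address and a gateway, so a /32 range has nothing left to hand out. A back-of-the-envelope check, where `usable` is an illustrative helper:

```shell
# usable() roughly counts pod-assignable IPs in a host-local range,
# assuming the network address and the gateway are both reserved
# (a sketch; host-local's exact reservations may differ).
usable() {
  prefix=$1
  total=$(( 1 << (32 - prefix) ))
  n=$(( total - 2 ))        # minus network address and gateway
  [ "$n" -lt 0 ] && n=0
  echo "$n"
}
usable 24   # prints: 254  -- the working config
usable 32   # prints: 0    -- nothing left to allocate to pods
```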

Even though the conditions are different, the problem could be similar to https://github.com/kubernetes/kubernetes/issues/87263.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2",
GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2",
GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: Virtualbox VMs
  • OS (e.g: cat /etc/os-release): Ubuntu 18.04.3 LTS
  • Kernel (e.g. uname -a): Linux worker-0 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: Manual installation following https://github.com/kelseyhightower/kubernetes-the-hard-way
  • Network plugin and version (if this is a network-related bug): L2 networks and linux bridging
  • Others: CNI conf:
{
    "cniVersion": "0.3.1",
    "name": "bridge",
    "type": "bridge",
    "bridge": "cnio0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "ranges": [
          [{"subnet": "10.200.1.0/24"}]
        ],
        "routes": [{"dst": "0.0.0.0/0"}]
    }
}

iptables-save.txt

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 27 (19 by maintainers)

Most upvoted comments

@thockin Bullseye! Pretty good for shooting in the dark 😃

I removed “masquerade-all” and added:

modprobe br-netfilter
sysctl -w net.bridge.bridge-nf-call-iptables=1

on node start-up, and both TCP to a local pod and DNS queries to a local server work perfectly.
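For anyone landing here later, a sketch of making that fix survive reboots (the file names under /etc/modules-load.d and /etc/sysctl.d are conventional choices, not something mandated by this thread):

```shell
# Load br_netfilter at boot so the bridge sysctl exists early.
echo br_netfilter > /etc/modules-load.d/br_netfilter.conf

# Persist the sysctl; applied by systemd-sysctl at boot.
cat > /etc/sysctl.d/99-bridge-nf.conf <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
EOF

# Apply immediately without rebooting.
modprobe br_netfilter
sysctl --system
```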

I will raise an issue on https://github.com/kelseyhightower/kubernetes-the-hard-way referring to this issue.

@eraclitux Please try the sysctl above and forget all about “masquerade-all”. If it works, please close this issue.