kubernetes: kube-proxy iptables blocks connections to externalTrafficPolicy: Cluster services
What happened:
- There is a 5-node k8s cluster with two masters. The masters are tainted to run workloads.
- We deploy Kong as a Kubernetes Service of type NodePort (config here). External applications make TCP connections (HTTP 1.0, no keep-alive) to perform health checks every 15 seconds. These health checks succeed and fail randomly because of TCP connection issues; sometimes the 3-way handshake never finishes. See more details below and the sketch of the check after this list.
- This is not a problem with other proxies; it only happens when Kong is running. One would suspect something is wrong with Kong, but we’ve stripped the health-check endpoint down to a simple Nginx instance returning a 200 on location /health.
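For reference, a rough sketch of what the external health-checker does; the node IP (10.0.0.11) and NodePort (31234) below are placeholders for illustration only, not values from this cluster:
# One HTTP/1.0 request (no keep-alive) against the node's NodePort every 15 seconds.
# 10.0.0.11 and 31234 are placeholder values.
while true; do
  curl -s -o /dev/null --http1.0 --max-time 5 -w 'health: %{http_code}\n' \
    http://10.0.0.11:31234/health
  sleep 15
done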
What you expected to happen:
TCP connections should succeed consistently.
How to reproduce it (as minimally and precisely as possible):
This is extremely hard to reproduce.
Anything else we need to know?:
- In this setup, the above issue happens only on the specific node on which the pod is running. Connections to the pod via other k8s worker/master nodes always succeed.
- On the worker node on which the pod runs, the connection succeeds when hitting the Docker container directly, but it fails when the connection is made to the IP of the worker node. In other words, as soon as iptables kicks in, things go wrong.
- The connections do succeed sporadically, but it is totally random. There are no errors in the kernel logs.
- tcpdump and the conntrack table show that the SYN arrives at the host network, but then the connection times out. (A sketch of the inspection commands follows this list.)
- externalTrafficPolicy: Local works fine and has no issues at all in this setup.
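A sketch of the kind of inspection behind the observations above, run on the worker node that hosts the pod; 31234 is a placeholder NodePort and "kong" a placeholder Service name, not the real values:
# Watch for incoming SYNs on the NodePort (placeholder port 31234).
$ tcpdump -ni any 'tcp dst port 31234 and tcp[tcpflags] & tcp-syn != 0'
# List conntrack entries for that port (used for visibility only).
$ conntrack -L -p tcp | grep 31234
# Inspect the kube-proxy NAT rules that handle NodePort traffic.
$ iptables -t nat -L KUBE-NODEPORTS -n
$ iptables-save -t nat | grep -i kong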
$ sysctl -p
net.core.somaxconn = 50000
net.ipv4.tcp_max_syn_backlog = 50000
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_time = 2500
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_local_reserved_ports = 30000-32767
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-18T14:27:17Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-18T14:27:17Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: Bare-metal cluster
- OS (e.g. cat /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
- Kernel (e.g. uname -a):
Linux nsd-on-hood-k8s-master-01 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
- Network plugin and version (if this is a network-related bug): Calico and Flannel; both give the same result
- Others:
We’ve been trying to debug this for weeks but have not made much progress. Any clue as to what could be going wrong here? We’re happy to run other tests in the cluster or provide more details as necessary.
About this issue
- State: closed
- Created 4 years ago
- Comments: 18 (13 by maintainers)
I’m sorry about all the confusion in this thread. The externalTrafficPolicy field is named in such a way that it is almost impossible to figure out which setting does what, and cloud provider implementations (except Google’s) not respecting it make the whole thing even more difficult. To clear up the confusion between the traffic policies: we set externalTrafficPolicy: Local to get around the bug in this issue.
The bug being: if we send a request to the worker node that is actually running the pod, it doesn’t work. The first TCP connection succeeds (no keep-alives), but subsequent connections fail for some time, with a connection succeeding only sporadically. The same behavior persists even if the source IP is different.
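For completeness, a sketch of that workaround, assuming the Service is named kong in the kong namespace (both names are placeholders for this cluster's actual values):
# Switch the Service to externalTrafficPolicy: Local.
$ kubectl -n kong patch svc kong -p '{"spec":{"externalTrafficPolicy":"Local"}}'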
Correct. We used conntrack for visibility and observations only.
No
ping @caseydavenport
@aojea Already tested that but no luck.