kubernetes: kube-proxy IPVS breaks UDP NodePort Services by clearing active conntrack entries
What happened?
One of our clusters exposes a UDP NodePort Service for clients to use. The clients talk to the servers over the kcp protocol, and the “connection” between client and server is periodically “disconnected”.
The kcp protocol is a reliable, stateful protocol built on UDP.
The disconnections happen when the kube-proxy proxier calls ClearEntriesForPort.
What did you expect to happen?
Shouldn’t kube-proxy first check whether the Service or its listener has changed, and only then do the cleanup?
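The expected behavior can be sketched roughly as follows. This is a minimal illustration, not kube-proxy code: `portsNeedingCleanup` is a hypothetical helper, and it assumes that cleanup is only needed for UDP NodePorts that were not already open in the previous sync.

```go
package main

import (
	"fmt"
	"sort"
)

// portsNeedingCleanup is a hypothetical helper (not a kube-proxy function).
// It returns, in ascending order, the UDP NodePorts present in current but
// absent from previous, i.e. ports that were newly opened in this sync.
func portsNeedingCleanup(previous, current map[int]bool) []int {
	var added []int
	for port := range current {
		if !previous[port] {
			added = append(added, port)
		}
	}
	sort.Ints(added)
	return added
}

func main() {
	previous := map[int]bool{30053: true}
	current := map[int]bool{30053: true, 30054: true}

	// Port 30053 is unchanged, so its active flows would be left alone.
	fmt.Println(portsNeedingCleanup(previous, current)) // prints [30054]
}
```

With a check like this, a sync in which nothing changed would produce an empty list and leave active conntrack entries untouched.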
How can we reproduce it (as minimally and precisely as possible)?
Set up a UDP NodePort Service. Every syncPeriod, the corresponding UDP conntrack entries are flushed.
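One possible way to observe the effect is a small echo server and probe, sketched below. The names `startEcho` and `probe`, the `serve`/`probe` subcommands, and the reply format are all assumptions made for this illustration. The probe keeps a single socket open so the client side of the flow never changes, then counts missing replies and changes of the responding backend. Whether a flush shows up as a lost reply or as a backend switch depends on timing and on how many backends the Service has.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
	"time"
)

// startEcho runs a UDP server that answers every datagram with "<id> <payload>".
func startEcho(addr, id string) (net.PacketConn, error) {
	pc, err := net.ListenPacket("udp", addr)
	if err != nil {
		return nil, err
	}
	go func() {
		buf := make([]byte, 1500)
		for {
			n, peer, err := pc.ReadFrom(buf)
			if err != nil {
				return // listener closed
			}
			pc.WriteTo([]byte(id+" "+string(buf[:n])), peer)
		}
	}()
	return pc, nil
}

// probe sends count datagrams from one socket and returns how many replies
// were missing or mismatched (lost) and how often the responder id changed
// between consecutive valid replies (switches).
func probe(addr string, count int, interval, timeout time.Duration) (lost, switches int, err error) {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return 0, 0, err
	}
	defer conn.Close()

	buf := make([]byte, 1500)
	lastID := ""
	for seq := 0; seq < count; seq++ {
		msg := fmt.Sprintf("seq-%d", seq)
		conn.SetReadDeadline(time.Now().Add(timeout))
		_, werr := conn.Write([]byte(msg))
		n, rerr := conn.Read(buf)
		parts := strings.SplitN(string(buf[:n]), " ", 2)
		if werr != nil || rerr != nil || len(parts) != 2 || parts[1] != msg {
			lost++
		} else {
			if lastID != "" && parts[0] != lastID {
				switches++
			}
			lastID = parts[0]
		}
		time.Sleep(interval)
	}
	return lost, switches, nil
}

func main() {
	if len(os.Args) == 3 && os.Args[1] == "serve" {
		host, _ := os.Hostname()
		if _, err := startEcho(os.Args[2], host); err != nil {
			fmt.Println(err)
			os.Exit(1)
		}
		select {} // serve until the process is killed
	}
	if len(os.Args) == 3 && os.Args[1] == "probe" {
		lost, switches, err := probe(os.Args[2], 300, time.Second, 500*time.Millisecond)
		fmt.Println("lost:", lost, "backend switches:", switches, "err:", err)
		return
	}
	fmt.Println("usage: <binary> serve <listen-addr> | probe <node-ip>:<nodeport>")
}
```

Run the `serve` mode in the pods behind the Service and the `probe` mode from a client outside the cluster, for long enough to span several sync periods.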
Anything else we need to know?
The cleanup logic was originally added in #59286.
I know that UDP is commonly regarded as a stateless protocol and that the kcp protocol is a special case. However, imagine the following case:
Someone is running a DNS server behind the NodePort, and the DNS server responds slowly. If the kube-proxy proxier cleans up the conntrack table while the DNS server is still handling a request, the response packet is lost.
Kubernetes version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T18:03:20Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.15-aliyun.1", GitCommit:"707e514954f0f3ba8ce36face7cf7058403057bc", GitTreeState:"clean", BuildDate:"2022-09-22T03:45:47Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
OS version
[root@node ~]# cat /etc/os-release
NAME="Alibaba Cloud Linux (Aliyun Linux)"
VERSION="2.1903 LTS (Hunting Beagle)"
ID="alinux"
ID_LIKE="rhel fedora centos anolis"
VERSION_ID="2.1903"
PRETTY_NAME="Alibaba Cloud Linux (Aliyun Linux) 2.1903 LTS (Hunting Beagle)"
ANSI_COLOR="0;31"
HOME_URL="https://www.aliyun.com/"
[root@node ~]# uname -a
Linux node 4.19.91-26.1.al7.x86_64 #1 SMP Tue Jul 26 17:52:28 CST 2022 x86_64 x86_64 x86_64 GNU/Linux
About this issue
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 15 (14 by maintainers)
Closing this, as https://github.com/kubernetes/kubernetes/pull/116171 fixed the problem.
@xh4n3 feel free to reopen if you face the issue in the latest version 🙂
/close
@uablrek I was not able to reproduce this by following the steps you mentioned on a kind cluster with the latest k/k source.
#116171 might have solved this.