kubernetes: removal of service-proxy label on Endpoints is not reconciled
What happened: When the service.kubernetes.io/service-proxy-name label is removed from an Endpoints object, kube-proxy does not reconcile the Service, so the Service stays unavailable.
What you expected to happen: kube-proxy reconciles the Endpoints, removes the “no endpoints” iptables rule, and installs the correct rules to restore accessibility.
How to reproduce it (as minimally and precisely as possible):
- Verify we have a clean test environment
➜ ✗ kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 100.64.0.1 <none> 443/TCP 34h
➜ ✗ kubectl get ep
NAME ENDPOINTS AGE
kubernetes 10.92.124.198:6443 34h
➜ ✗ kubectl get pod
No resources found in default namespace.
- Create a Deployment
➜ ✗ cat << EOF | kubectl apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-proxy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: test-proxy
  template:
    metadata:
      labels:
        app: test-proxy
    spec:
      serviceAccountName: default
      containers:
      - name: nginx
        image: xxx/nginx:latest
EOF
deployment.apps/test-proxy created
- Find the Pod information
➜ ✗ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-proxy-67db64d8cc-pg2qn 1/1 Running 0 2m45s 100.96.1.12 w1-md-0-86c6b7b994-r7jrz <none> <none>
test-proxy-67db64d8cc-r72x8 1/1 Running 0 2m45s 100.96.1.13 w1-md-0-86c6b7b994-r7jrz <none> <none>
➜ ✗ kubectl get pod test-proxy-67db64d8cc-pg2qn -o json | jq -cr '.metadata.resourceVersion'
500728
➜ ✗ kubectl get pod test-proxy-67db64d8cc-pg2qn -o json | jq -cr '.metadata.uid'
fe3fefaf-0f04-4b49-904a-20566af788ec
➜ ✗ kubectl get pod test-proxy-67db64d8cc-r72x8 -o json | jq -cr '.metadata.resourceVersion'
500719
➜ ✗ kubectl get pod test-proxy-67db64d8cc-r72x8 -o json | jq -cr '.metadata.uid'
48a21c5c-5144-4726-ba30-5f55977c7860
- Manually assemble an Endpoints object and create it
➜ ✗ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Endpoints
metadata:
  name: test-proxy
  namespace: default
subsets:
- addresses:
  - ip: 100.96.1.12
    nodeName: w1-md-0-86c6b7b994-r7jrz
    targetRef:
      kind: Pod
      name: test-proxy-67db64d8cc-pg2qn
      namespace: default
      resourceVersion: "500728"
      uid: fe3fefaf-0f04-4b49-904a-20566af788ec
  - ip: 100.96.1.13
    nodeName: w1-md-0-86c6b7b994-r7jrz
    targetRef:
      kind: Pod
      name: test-proxy-67db64d8cc-pqrnp
      namespace: default
      resourceVersion: "500719"
      uid: 48a21c5c-5144-4726-ba30-5f55977c7860
  ports:
  - port: 80
    protocol: TCP
EOF
endpoints/test-proxy created
- Create a LoadBalancer-type Service for the Endpoints
➜ ✗ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: test-proxy
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer
EOF
service/test-proxy created
- Verify it’s accessible
➜ ✗ kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 100.64.0.1 <none> 443/TCP 35h
test-proxy LoadBalancer 100.69.222.79 10.92.96.254 80:31280/TCP 43s
➜ ✗ curl 10.92.96.254
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
➜ ✗ curl 10.92.96.254
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
- Delete the Endpoints and manually create a new one with the “service.kubernetes.io/service-proxy-name” label applied:
➜ ✗ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Endpoints
metadata:
  name: test-proxy
  namespace: default
  labels:
    service.kubernetes.io/service-proxy-name: others
subsets:
- addresses:
  - ip: 100.96.1.12
    nodeName: w1-md-0-86c6b7b994-r7jrz
    targetRef:
      kind: Pod
      name: test-proxy-67db64d8cc-pg2qn
      namespace: default
      resourceVersion: "500728"
      uid: fe3fefaf-0f04-4b49-904a-20566af788ec
  - ip: 100.96.1.13
    nodeName: w1-md-0-86c6b7b994-r7jrz
    targetRef:
      kind: Pod
      name: test-proxy-67db64d8cc-pqrnp
      namespace: default
      resourceVersion: "500719"
      uid: 48a21c5c-5144-4726-ba30-5f55977c7860
  ports:
  - port: 80
    protocol: TCP
EOF
endpoints/test-proxy created
- Verify the Service is inaccessible
➜ ✗ curl --max-time 1 10.92.96.254
curl: (7) Failed to connect to 10.92.96.254 port 80: Connection refused
➜ ✗ curl --max-time 1 10.92.96.254
curl: (7) Failed to connect to 10.92.96.254 port 80: Connection refused
- Edit the Endpoints to remove the label (equivalently: kubectl label ep test-proxy service.kubernetes.io/service-proxy-name-)
➜ ✗ kubectl edit ep test-proxy -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Endpoints","metadata":{"annotations":{},"labels":{"service.kubernetes.io/service-proxy-name":"others"},"name":"test-proxy","namespace":"default"},"subsets":[{"addresses":[{"ip":"100.96.1.12","nodeName":"w1-md-0-86c6b7b994-r7jrz","targetRef":{"kind":"Pod","name":"test-proxy-67db64d8cc-pg2qn","namespace":"default","resourceVersion":"500728","uid":"fe3fefaf-0f04-4b49-904a-20566af788ec"}},{"ip":"100.96.1.13","nodeName":"w1-md-0-86c6b7b994-r7jrz","targetRef":{"kind":"Pod","name":"test-proxy-67db64d8cc-pqrnp","namespace":"default","resourceVersion":"500719","uid":"48a21c5c-5144-4726-ba30-5f55977c7860"}}],"ports":[{"port":80,"protocol":"TCP"}]}]}
  creationTimestamp: "2021-01-29T18:28:40Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:subsets: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-01-29T18:28:40Z"
  name: test-proxy
  namespace: default
  resourceVersion: "502865"
  selfLink: /api/v1/namespaces/default/endpoints/test-proxy
  uid: 2a600292-97d6-43d9-92b7-2f2e280c1ab2
subsets:
- addresses:
  - ip: 100.96.1.12
    nodeName: w1-md-0-86c6b7b994-r7jrz
    targetRef:
      kind: Pod
      name: test-proxy-67db64d8cc-pg2qn
      namespace: default
      resourceVersion: "500728"
      uid: fe3fefaf-0f04-4b49-904a-20566af788ec
  - ip: 100.96.1.13
    nodeName: w1-md-0-86c6b7b994-r7jrz
    targetRef:
      kind: Pod
      name: test-proxy-67db64d8cc-pqrnp
      namespace: default
      resourceVersion: "500719"
      uid: 48a21c5c-5144-4726-ba30-5f55977c7860
  ports:
  - port: 80
    protocol: TCP
- Describe the Service and Endpoints, then query the Service; it is still inaccessible
➜ ✗ kubectl describe ep test-proxy
Name: test-proxy
Namespace: default
Labels: <none>
Annotations: <none>
Subsets:
Addresses: 100.96.1.12,100.96.1.13
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
<unset> 80 TCP
Events: <none>
➜ ✗ kubectl describe service test-proxy
Name: test-proxy
Namespace: default
Labels: <none>
Annotations: <none>
Selector: <none>
Type: LoadBalancer
IP: 100.69.222.79
LoadBalancer Ingress: 10.92.96.254
Port: <unset> 80/TCP
TargetPort: 80/TCP
NodePort: <unset> 31280/TCP
Endpoints: 100.96.1.12:80,100.96.1.13:80
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
➜ ✗ curl --max-time 1 10.92.96.254
curl: (7) Failed to connect to 10.92.96.254 port 80: Connection refused
➜ ✗ curl --max-time 1 10.92.96.254
curl: (7) Failed to connect to 10.92.96.254 port 80: Connection refused
- Check the iptables rules on the node
# iptables-save | grep 10.92.96.254
-A KUBE-SERVICES -d 10.92.96.254/32 -p tcp -m comment --comment "default/test-proxy has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable
- Manually delete the kube-proxy Pods to restart them
➜ ✗ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
...
kube-proxy-hr2fj 1/1 Running 0 35h
kube-proxy-jw7st 1/1 Running 0 35h
...
➜ ✗ kubectl delete pod -n kube-system kube-proxy-hr2fj
pod "kube-proxy-hr2fj" deleted
➜ ✗ kubectl delete pod -n kube-system kube-proxy-jw7st
pod "kube-proxy-jw7st" deleted
➜ ✗ kubectl get pods -n kube-system
...
kube-proxy-6bbs8 1/1 Running 0 25s
kube-proxy-qtspc 1/1 Running 0 6s
...
- The Service is still inaccessible and the iptables rule stays unchanged
➜ ✗ curl --max-time 1 10.92.96.254
curl: (7) Failed to connect to 10.92.96.254 port 80: Connection refused
➜ ✗ curl --max-time 1 10.92.96.254
curl: (7) Failed to connect to 10.92.96.254 port 80: Connection refused
...
# iptables-save | grep 10.92.96.254
-A KUBE-SERVICES -d 10.92.96.254/32 -p tcp -m comment --comment "default/test-proxy has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable
- Delete the Endpoints and recreate it without the label
➜ ✗ kubectl delete ep test-proxy
endpoints "test-proxy" deleted
➜ ✗ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Endpoints
metadata:
  name: test-proxy
  namespace: default
subsets:
- addresses:
  - ip: 100.96.1.12
    nodeName: w1-md-0-86c6b7b994-r7jrz
    targetRef:
      kind: Pod
      name: test-proxy-67db64d8cc-pg2qn
      namespace: default
      resourceVersion: "500728"
      uid: fe3fefaf-0f04-4b49-904a-20566af788ec
  - ip: 100.96.1.13
    nodeName: w1-md-0-86c6b7b994-r7jrz
    targetRef:
      kind: Pod
      name: test-proxy-67db64d8cc-pqrnp
      namespace: default
      resourceVersion: "500719"
      uid: 48a21c5c-5144-4726-ba30-5f55977c7860
  ports:
  - port: 80
    protocol: TCP
EOF
endpoints/test-proxy created
- The Service is back to normal
➜ ✗ curl --max-time 1 10.92.96.254
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Anything else we need to know?: no
Environment:
- Kubernetes version (use kubectl version):
➜ ✗ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T13:41:02Z", GoVersion:"go1.15", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3+vmware.1", GitCommit:"2ac9e7ea06a1230ca196931def19d2bb67b580c7", GitTreeState:"clean", BuildDate:"2020-10-30T07:25:38Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration:
- OS (e.g. cat /etc/os-release):
# cat /etc/os-release
NAME="VMware Photon OS"
VERSION="3.0"
ID=photon
VERSION_ID=3.0
PRETTY_NAME="VMware Photon OS/Linux"
ANSI_COLOR="1;34"
HOME_URL="https://vmware.github.io/photon/"
BUG_REPORT_URL="https://github.com/vmware/photon/issues"
- Kernel (e.g. uname -a):
# uname -a
Linux w1-md-0-86c6b7b994-r7jrz 4.19.150-1.ph3 #1-photon SMP Fri Oct 23 02:29:37 UTC 2020 x86_64 GNU/Linux
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 25 (17 by maintainers)
I’ll attempt to reproduce this today, especially since you’re using Tanzu 😃 (we noticed the Photon in your bug report) /assign
We fixed that only 10 days ago in #98116. @robscott, should we backport?
Things to take into account: kube-proxy only watches objects that don’t carry the service-proxy-name label, and that means both Endpoints and Services: https://github.com/kubernetes/kubernetes/blob/6dc0047396d3aad16928d346da8a5052d8a824fa/cmd/kube-proxy/app/server.go#L726-L730
https://github.com/kubernetes/kubernetes/issues/98066#issuecomment-763920732 https://github.com/kubernetes/kubernetes/pull/98122#issuecomment-766353557
The key is step 9. That operation (editing to remove the label) is, IIRC, going to be a PATCH to the Endpoints object. That patch seems not to be processed by kube-proxy, and my apimachinery fu is not strong enough to know whether it should be processed or not per the kube-proxy config in 1.
I wish I had more time to hack on this. It can easily be reproduced with kind; you don’t need a LoadBalancer Service, a ClusterIP or NodePort one will be enough.
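For a kind-based repro along those lines, a manifest like the following should suffice in place of the LoadBalancer Service above (a sketch reusing this report’s names; note the Service stays selector-less so the manually created Endpoints are not overwritten by the endpoints controller):

```yaml
---
# Selector-less ClusterIP Service; its Endpoints object is created by
# hand, as in the steps above, so the service-proxy-name label can be
# added and removed freely.
apiVersion: v1
kind: Service
metadata:
  name: test-proxy
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP
```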
Just enable enough logging in kube-proxy and the controller-manager so you can correlate your actions with what the endpoints controller and kube-proxy are doing. If the PATCH is not processed by kube-proxy and that is how the informer is supposed to work, this is working as expected.
With EndpointSlices this is not going to happen; @robscott explains it well in one of the comments in 2. The slices controller has control of the generated slices.