kubernetes: removal of service-proxy label on Endpoints is not reconciled

What happened: When the service.kubernetes.io/service-proxy-name label is removed from an Endpoints object, kube-proxy doesn’t reconcile the Service, so the Service stays unavailable.

What you expected to happen: kube-proxy reconciles the Endpoints, removes the “no endpoints” iptables rule, and installs the correct rules so that the Service becomes accessible again.

How to reproduce it (as minimally and precisely as possible):

  1. Check that the test environment is clean
➜ ✗ kubectl get service
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   100.64.0.1   <none>        443/TCP   34h
➜  ✗ kubectl get ep
NAME         ENDPOINTS            AGE
kubernetes   10.92.124.198:6443   34h
➜  ✗ kubectl get pod
No resources found in default namespace.
  2. Create a Deployment
➜  ✗ cat << EOF | kubectl apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-proxy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: test-proxy
  template:
    metadata:
      labels:
        app: test-proxy
    spec:
      serviceAccountName: default
      containers:
        - name: nginx
          image: xxx/nginx:latest

EOF
deployment.apps/test-proxy created
  3. Find the Pod information
➜ ✗ kubectl get pod -o wide
NAME                          READY   STATUS    RESTARTS   AGE     IP            NODE                       NOMINATED NODE   READINESS GATES
test-proxy-67db64d8cc-pg2qn   1/1     Running   0          2m45s   100.96.1.12   w1-md-0-86c6b7b994-r7jrz   <none>           <none>
test-proxy-67db64d8cc-r72x8   1/1     Running   0          2m45s   100.96.1.13   w1-md-0-86c6b7b994-r7jrz   <none>           <none>

➜ ✗ kubectl get pod test-proxy-67db64d8cc-pg2qn -o json | jq -cr '.metadata.resourceVersion'
500728
➜ ✗ kubectl get pod test-proxy-67db64d8cc-pg2qn -o json | jq -cr '.metadata.uid'
fe3fefaf-0f04-4b49-904a-20566af788ec

➜  ✗ kubectl get pod test-proxy-67db64d8cc-r72x8 -o json | jq -cr '.metadata.resourceVersion'
500719
➜  ✗ kubectl get pod test-proxy-67db64d8cc-r72x8 -o json | jq -cr '.metadata.uid'
48a21c5c-5144-4726-ba30-5f55977c7860
  4. Manually assemble an Endpoints object and create it
➜ ✗ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Endpoints
metadata:
  name: test-proxy
  namespace: default
subsets:
  - addresses:
      - ip: 100.96.1.12
        nodeName: w1-md-0-86c6b7b994-r7jrz
        targetRef:
          kind: Pod
          name: test-proxy-67db64d8cc-pg2qn
          namespace: default
          resourceVersion: "500728"
          uid: fe3fefaf-0f04-4b49-904a-20566af788ec
      - ip: 100.96.1.13
        nodeName: w1-md-0-86c6b7b994-r7jrz
        targetRef:
          kind: Pod
          name: test-proxy-67db64d8cc-pqrnp
          namespace: default
          resourceVersion: "500719"
          uid: 48a21c5c-5144-4726-ba30-5f55977c7860
    ports:
      - port: 80
        protocol: TCP
EOF
endpoints/test-proxy created
  5. Create a LoadBalancer-type Service for the Endpoints
➜  ✗ cat << EOF  | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: test-proxy
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: LoadBalancer
EOF
service/test-proxy created
  6. Verify that the Service is accessible
➜  ✗ kubectl get service
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
kubernetes   ClusterIP      100.64.0.1      <none>         443/TCP        35h
test-proxy   LoadBalancer   100.69.222.79   10.92.96.254   80:31280/TCP   43s
➜  ✗ curl 10.92.96.254
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
➜  ✗ curl 10.92.96.254
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
  7. Delete the Endpoints and manually create a new one with the “service.kubernetes.io/service-proxy-name” label applied:
➜  ✗ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Endpoints
metadata:
  name: test-proxy
  namespace: default
  labels:
    service.kubernetes.io/service-proxy-name: others
subsets:
  - addresses:
      - ip: 100.96.1.12
        nodeName: w1-md-0-86c6b7b994-r7jrz
        targetRef:
          kind: Pod
          name: test-proxy-67db64d8cc-pg2qn
          namespace: default
          resourceVersion: "500728"
          uid: fe3fefaf-0f04-4b49-904a-20566af788ec
      - ip: 100.96.1.13
        nodeName: w1-md-0-86c6b7b994-r7jrz
        targetRef:
          kind: Pod
          name: test-proxy-67db64d8cc-pqrnp
          namespace: default
          resourceVersion: "500719"
          uid: 48a21c5c-5144-4726-ba30-5f55977c7860
    ports:
      - port: 80
        protocol: TCP
EOF
endpoints/test-proxy created
  8. Verify that the Service is inaccessible
➜ ✗ curl --max-time 1 10.92.96.254
curl: (7) Failed to connect to 10.92.96.254 port 80: Connection refused
➜ ✗ curl --max-time 1 10.92.96.254
curl: (7) Failed to connect to 10.92.96.254 port 80: Connection refused
  9. Edit the Endpoints to remove the label
➜ ✗ kubectl edit ep test-proxy -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Endpoints","metadata":{"annotations":{},"labels":{"service.kubernetes.io/service-proxy-name":"others"},"name":"test-proxy","namespace":"default"},"subsets":[{"addresses":[{"ip":"100.96.1.12","nodeName":"w1-md-0-86c6b7b994-r7jrz","targetRef":{"kind":"Pod","name":"test-proxy-67db64d8cc-pg2qn","namespace":"default","resourceVersion":"500728","uid":"fe3fefaf-0f04-4b49-904a-20566af788ec"}},{"ip":"100.96.1.13","nodeName":"w1-md-0-86c6b7b994-r7jrz","targetRef":{"kind":"Pod","name":"test-proxy-67db64d8cc-pqrnp","namespace":"default","resourceVersion":"500719","uid":"48a21c5c-5144-4726-ba30-5f55977c7860"}}],"ports":[{"port":80,"protocol":"TCP"}]}]}
  creationTimestamp: "2021-01-29T18:28:40Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:subsets: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-01-29T18:28:40Z"
  name: test-proxy
  namespace: default
  resourceVersion: "502865"
  selfLink: /api/v1/namespaces/default/endpoints/test-proxy
  uid: 2a600292-97d6-43d9-92b7-2f2e280c1ab2
subsets:
- addresses:
  - ip: 100.96.1.12
    nodeName: w1-md-0-86c6b7b994-r7jrz
    targetRef:
      kind: Pod
      name: test-proxy-67db64d8cc-pg2qn
      namespace: default
      resourceVersion: "500728"
      uid: fe3fefaf-0f04-4b49-904a-20566af788ec
  - ip: 100.96.1.13
    nodeName: w1-md-0-86c6b7b994-r7jrz
    targetRef:
      kind: Pod
      name: test-proxy-67db64d8cc-pqrnp
      namespace: default
      resourceVersion: "500719"
      uid: 48a21c5c-5144-4726-ba30-5f55977c7860
  ports:
  - port: 80
    protocol: TCP
  10. Describe the Service and the Endpoints, then query the Service, which is still inaccessible
➜ ✗ kubectl describe ep test-proxy
Name:         test-proxy
Namespace:    default
Labels:       <none>
Annotations:  <none>
Subsets:
  Addresses:          100.96.1.12,100.96.1.13
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  80    TCP

Events:  <none>
➜ ✗ kubectl describe service test-proxy
Name:                     test-proxy
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 <none>
Type:                     LoadBalancer
IP:                       100.69.222.79
LoadBalancer Ingress:     10.92.96.254
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  31280/TCP
Endpoints:                100.96.1.12:80,100.96.1.13:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
➜ ✗ curl --max-time 1 10.92.96.254
curl: (7) Failed to connect to 10.92.96.254 port 80: Connection refused
➜ ✗ curl --max-time 1 10.92.96.254
curl: (7) Failed to connect to 10.92.96.254 port 80: Connection refused
  11. Check the iptables rules on the node
# iptables-save | grep 10.92.96.254
-A KUBE-SERVICES -d 10.92.96.254/32 -p tcp -m comment --comment "default/test-proxy has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable
  12. Manually delete the kube-proxy Pods to restart them
➜ ✗ kubectl get pods -n kube-system
NAME                                             READY   STATUS    RESTARTS   AGE
...
kube-proxy-hr2fj                                 1/1     Running   0          35h
kube-proxy-jw7st                                 1/1     Running   0          35h
...
➜ ✗ kubectl delete pod -n kube-system kube-proxy-hr2fj
pod "kube-proxy-hr2fj" deleted
➜  ✗ kubectl delete pod -n kube-system kube-proxy-jw7st
pod "kube-proxy-jw7st" deleted
➜ ✗ kubectl get pods -n kube-system
...
kube-proxy-6bbs8                                 1/1     Running   0          25s
kube-proxy-qtspc                                 1/1     Running   0          6s
...
  13. The Service is still inaccessible and the iptables rule is unchanged
➜ ✗ curl --max-time 1 10.92.96.254
curl: (7) Failed to connect to 10.92.96.254 port 80: Connection refused
➜ ✗ curl --max-time 1 10.92.96.254
curl: (7) Failed to connect to 10.92.96.254 port 80: Connection refused
...
# iptables-save | grep 10.92.96.254
-A KUBE-SERVICES -d 10.92.96.254/32 -p tcp -m comment --comment "default/test-proxy has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable
  14. Delete the Endpoints and recreate it without the label
➜  ✗ kubectl delete ep test-proxy
endpoints "test-proxy" deleted
➜ ✗ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Endpoints
metadata:
  name: test-proxy
  namespace: default
subsets:
  - addresses:
      - ip: 100.96.1.12
        nodeName: w1-md-0-86c6b7b994-r7jrz
        targetRef:
          kind: Pod
          name: test-proxy-67db64d8cc-pg2qn
          namespace: default
          resourceVersion: "500728"
          uid: fe3fefaf-0f04-4b49-904a-20566af788ec
      - ip: 100.96.1.13
        nodeName: w1-md-0-86c6b7b994-r7jrz
        targetRef:
          kind: Pod
          name: test-proxy-67db64d8cc-pqrnp
          namespace: default
          resourceVersion: "500719"
          uid: 48a21c5c-5144-4726-ba30-5f55977c7860
    ports:
      - port: 80
        protocol: TCP
EOF
endpoints/test-proxy created
  15. The Service is back to normal
➜ ✗ curl --max-time 1 10.92.96.254
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
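The REJECT rule that iptables-save shows in the steps above can be read as the output of a simple decision: when the proxy sees no endpoints for a Service port, it installs a rule that rejects traffic to that address instead of DNAT-ing it to a backend. Below is a minimal Python sketch of that decision under a simplified model; the function names reject_rule and rules_for_service are hypothetical, this is not kube-proxy's code, and it only reproduces the shape of the rule string observed on the node. Per-endpoint chains are replaced by placeholder strings.

```python
def reject_rule(namespace: str, name: str, ip: str, port: int, protocol: str = "tcp") -> str:
    """Build a rule string shaped like the one in the iptables-save output above."""
    comment = f'"{namespace}/{name} has no endpoints"'
    return (
        f"-A KUBE-SERVICES -d {ip}/32 -p {protocol} -m comment --comment {comment} "
        f"-m {protocol} --dport {port} -j REJECT --reject-with icmp-port-unreachable"
    )


def rules_for_service(namespace: str, name: str, ip: str, port: int, visible_endpoints: list) -> list:
    """Hypothetical simplification of the proxy's per-Service decision.

    visible_endpoints is whatever survives the proxy's label filter. The real proxier
    writes KUBE-SVC-*/KUBE-SEP-* DNAT chains when endpoints exist; this sketch just
    returns one placeholder string per endpoint.
    """
    if not visible_endpoints:
        return [reject_rule(namespace, name, ip, port)]
    return [f"# DNAT to {ep}" for ep in visible_endpoints]


if __name__ == "__main__":
    # With every endpoint object filtered out, only the REJECT rule remains.
    for rule in rules_for_service("default", "test-proxy", "10.92.96.254", 80, []):
        print(rule)
```

In this model the rule disappears as soon as visible_endpoints becomes non-empty, which is why the question in this issue reduces to what the proxy can see, not to what kubectl describe shows.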

Anything else we need to know?: no

Environment:

  • Kubernetes version (use kubectl version):
➜ ✗ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T13:41:02Z", GoVersion:"go1.15", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3+vmware.1", GitCommit:"2ac9e7ea06a1230ca196931def19d2bb67b580c7", GitTreeState:"clean", BuildDate:"2020-10-30T07:25:38Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
# cat /etc/os-release
NAME="VMware Photon OS"
VERSION="3.0"
ID=photon
VERSION_ID=3.0
PRETTY_NAME="VMware Photon OS/Linux"
ANSI_COLOR="1;34"
HOME_URL="https://vmware.github.io/photon/"
BUG_REPORT_URL="https://github.com/vmware/photon/issues"
  • Kernel (e.g. uname -a):
# uname -a
Linux w1-md-0-86c6b7b994-r7jrz 4.19.150-1.ph3 #1-photon SMP Fri Oct 23 02:29:37 UTC 2020 x86_64 GNU/Linux
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 25 (17 by maintainers)

Most upvoted comments

I’ll attempt to reproduce this today, especially since you’re using Tanzu 😃 (we noticed Photon in your bug report). /assign

I found that the label is still on the EndpointSlice. In fact, I don’t think updating the Endpoints causes any update to the EndpointSlice.
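The finding above can be modeled in a few lines: if the proxy builds its view from EndpointSlices, and the slice only received a copy of the Endpoints labels when it was created, then removing the label from the Endpoints changes nothing the proxy can see. The Python sketch below assumes exactly that; the Obj type and helper names are hypothetical stand-ins for the API objects.

```python
from dataclasses import dataclass, field

PROXY_NAME_LABEL = "service.kubernetes.io/service-proxy-name"


@dataclass
class Obj:
    """Hypothetical stand-in for an Endpoints or EndpointSlice: just labels and addresses."""
    labels: dict = field(default_factory=dict)
    addresses: list = field(default_factory=list)


def passes_default_proxy_filter(obj: Obj) -> bool:
    # The default kube-proxy ignores any object that carries the service-proxy-name label.
    return PROXY_NAME_LABEL not in obj.labels


def addresses_seen_by_proxy(slices: list) -> list:
    """Addresses the proxy would program, assuming it reads EndpointSlices, not Endpoints."""
    return [a for s in slices if passes_default_proxy_filter(s) for a in s.addresses]


addrs = ["100.96.1.12", "100.96.1.13"]
endpoints = Obj({PROXY_NAME_LABEL: "others"}, list(addrs))
mirrored_slice = Obj(dict(endpoints.labels), list(addrs))  # assumed: labels copied at creation only

# The label is removed from the Endpoints...
endpoints.labels.pop(PROXY_NAME_LABEL)
# ...but, as suspected above, nothing propagates that change to the slice.

print(passes_default_proxy_filter(endpoints))      # True
print(addresses_seen_by_proxy([mirrored_slice]))   # []
```

Under the same assumption, restarting kube-proxy would not help either: a fresh LIST applies the same filter to the same stale slice.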

We fixed that only 10 days ago in #98116. @robscott, should we backport?
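To make concrete what such a fix has to do, here is a minimal Python sketch of label reconciliation between a custom Endpoints object and the EndpointSlice kept in sync with it. It is hypothetical: the function names are made up, it is not the code from #98116, and kubernetes.io/service-name is used only as an example of a label the controller itself owns on EndpointSlices.

```python
PROXY_NAME_LABEL = "service.kubernetes.io/service-proxy-name"
SERVICE_NAME_LABEL = "kubernetes.io/service-name"  # example of a controller-owned slice label


def desired_slice_labels(endpoints_labels: dict, controller_labels: dict) -> dict:
    """Recompute the full label set for the slice from the Endpoints labels on every
    sync, then overlay the labels the controller owns (hypothetical reconcile rule)."""
    desired = dict(endpoints_labels)
    desired.update(controller_labels)
    return desired


def needs_update(current: dict, desired: dict) -> bool:
    # Comparing whole maps means a label removed from the Endpoints also counts as drift.
    return current != desired


if __name__ == "__main__":
    current = {PROXY_NAME_LABEL: "others", SERVICE_NAME_LABEL: "test-proxy"}
    desired = desired_slice_labels({}, {SERVICE_NAME_LABEL: "test-proxy"})
    print(desired)                         # {'kubernetes.io/service-name': 'test-proxy'}
    print(needs_update(current, desired))  # True
```

The design choice that matters is recomputing the whole map: a merge-only reconcile (copy keys that exist on the Endpoints, never drop others) would reproduce this bug, because a deleted key never appears in its input.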

Things to take into account:

  1. The kube-proxy informer filters out everything that has the service-proxy-name label, which means both Endpoints and Services.

https://github.com/kubernetes/kubernetes/blob/6dc0047396d3aad16928d346da8a5052d8a824fa/cmd/kube-proxy/app/server.go#L726-L730

  2. The endpoints controller doesn’t process updates to Endpoints, and that is not likely to change.

https://github.com/kubernetes/kubernetes/issues/98066#issuecomment-763920732 https://github.com/kubernetes/kubernetes/pull/98122#issuecomment-766353557

  3. The endpoints controller autogenerates Endpoints for Services with a selector, which means that when you create or delete (not update, as explained in 2.) custom Endpoints, you are going to race with it.
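Point 1 can be pictured as a label selector with a single DoesNotExist requirement, whose string form is !service.kubernetes.io/service-proxy-name. The Python sketch below evaluates that form against a label map. It is a simplified model (only bare key and !key terms) and an assumption about what the linked server.go lines build, not a copy of apimachinery's labels package. Because the value is never inspected, any value, including “others”, hides the object from the default kube-proxy.

```python
PROXY_NAME_LABEL = "service.kubernetes.io/service-proxy-name"
PROXY_SELECTOR = "!" + PROXY_NAME_LABEL  # DoesNotExist requirement in string form


def matches(selector: str, labels: dict) -> bool:
    """Evaluate a comma-separated selector made of 'key' (Exists) and '!key'
    (DoesNotExist) terms. Other operators (=, !=, in, notin) are out of scope."""
    for term in (t.strip() for t in selector.split(",")):
        if not term:
            continue
        if term.startswith("!"):
            if term[1:] in labels:
                return False
        elif term not in labels:
            return False
    return True


if __name__ == "__main__":
    # The same filter applies to every object kind the proxy watches.
    for kind, labels in [
        ("Service test-proxy", {}),
        ("Endpoints test-proxy", {PROXY_NAME_LABEL: "others"}),
    ]:
        print(kind, "->", "watched" if matches(PROXY_SELECTOR, labels) else "ignored")
```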

The key is step 9:

Edit the Endpoints to remove the label

That operation (an edit to remove the label) is, IIRC, going to be a PATCH to the Endpoints object. That patch does not seem to be processed by kube-proxy, and my apimachinery fu is not strong enough to know whether it should be processed, given the kube-proxy config in 1.
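For a labels map, removing a key by patch is expressed by setting that key to null. kubectl edit sends a strategic merge patch for built-in types, and for a plain map like metadata.labels that null-means-delete behavior matches RFC 7386 JSON Merge Patch, which is what the Python sketch below implements. The helper names are hypothetical, and the exact bytes kubectl sends may differ from this minimal patch.

```python
import json


def merge_patch_for_labels(old: dict, new: dict) -> dict:
    """Build an RFC 7386-style patch turning old labels into new labels (None/null = delete)."""
    labels = {k: None for k in old if k not in new}
    labels.update({k: v for k, v in new.items() if old.get(k) != v})
    return {"metadata": {"labels": labels}} if labels else {}


def apply_merge_patch(target, patch):
    """Minimal RFC 7386 apply: objects merge recursively, null deletes, anything else replaces."""
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = apply_merge_patch(result.get(key), value)
    return result


if __name__ == "__main__":
    old = {"metadata": {"name": "test-proxy",
                        "labels": {"service.kubernetes.io/service-proxy-name": "others"}}}
    patch = merge_patch_for_labels(old["metadata"]["labels"], {})
    print(json.dumps(patch))  # {"metadata": {"labels": {"service.kubernetes.io/service-proxy-name": null}}}
    print(apply_merge_patch(old, patch))  # {'metadata': {'name': 'test-proxy', 'labels': {}}}
```

Whatever the patch type, the stored Endpoints ends up without the label, which matches the kubectl describe output in the report; the open question is only which watchers are told about it.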

I wish I had more time to hack on this. This should be easy to reproduce with KIND: you don’t need a LoadBalancer Service; a ClusterIP or NodePort one will be enough.

Just enable enough logging in kube-proxy and the controller-manager to correlate your actions with what the endpoints controller and kube-proxy are doing. If the PATCH is not processed by kube-proxy and that is how the informer is supposed to work, then this is working as expected.
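Whether the PATCH should reach a label-filtered informer depends on how a filtered watch reports an object that crosses the selector boundary. My understanding, stated here as an assumption worth confirming against the apiserver watch cache, is that the old and new versions are both checked against the selector and the update is translated accordingly. The Python sketch below models that translation; the function names are hypothetical.

```python
from typing import Optional

PROXY_NAME_LABEL = "service.kubernetes.io/service-proxy-name"


def passes(labels: dict) -> bool:
    # Same filter as the kube-proxy informer: reject anything carrying the label.
    return PROXY_NAME_LABEL not in labels


def filtered_event(old_labels: dict, new_labels: dict) -> Optional[str]:
    """Translate an update into what a selector-filtered watcher would receive (assumed semantics)."""
    old_ok, new_ok = passes(old_labels), passes(new_labels)
    if old_ok and new_ok:
        return "MODIFIED"
    if not old_ok and new_ok:
        return "ADDED"    # object enters the selector, e.g. the label is removed
    if old_ok and not new_ok:
        return "DELETED"  # object leaves the selector, e.g. the label is added
    return None           # invisible before and after: nothing is sent


if __name__ == "__main__":
    print(filtered_event({PROXY_NAME_LABEL: "others"}, {}))  # ADDED
    print(filtered_event({}, {PROXY_NAME_LABEL: "others"}))  # DELETED
```

If the watch behaves like this, removing the label from an object the proxy watches would arrive as an ADDED event. An object that keeps the label, such as the EndpointSlice mentioned earlier in this thread, produces no event at all, so the proxy's view would not change.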

With EndpointSlices this is not going to happen; Rob Scott explains it well in one of the comments linked in 2. The slices controller has control of the generated slices.