kubernetes: Service fails to establish connections (timeouts) after endpoint pod is recreated by deployment.

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

After a deployment replaced a pod that is the only endpoint for a service, the service stopped serving traffic to and from that endpoint. The service was used by one other pod. The issue resolved itself after about 2 hours, around the time I tried using the service from some pods other than the one that had the original connection problems (?).

What you expected to happen:

The service should keep serving traffic from the endpoint after the pod is replaced by the deployment.

How to reproduce it (as minimally and precisely as possible): I don’t know; we have problems with this one particular service and pod, but everything else seems to be working in order, including virtually identical configurations for other tiers of our services. It’s not the first time the service has failed to reach the endpoint pod after the endpoint pod was restarted.

Anything else we need to know?: The service DNS name is resolvable, the service is not usable via its cluster IP, but the endpoint pod works fine when its pod IP is used directly.

~ $ kubectl exec -ti (kubectl get pod -l "name=canary-samus-varnish" -o name | sed -E 's%pods?/%%' | tail -n 1) -- bash
root@canary-samus-varnish-7cb56f97c6-5cnzp:/# host canary-kabuto                                                                                                                                                  
canary-kabuto.default.svc.cluster.local has address 10.0.121.77
root@canary-samus-varnish-7cb56f97c6-5cnzp:/# curl -sL http://canary-kabuto/ 
^C (timeout)
root@canary-samus-varnish-7cb56f97c6-5cnzp:/# curl -sL http://10.0.121.77/  
^C (timeout)
root@canary-samus-varnish-7cb56f97c6-5cnzp:/# curl -sL http://10.240.0.66/
<!doctype html><html><body>Hello world!</body></html>

Service metadata from kubectl:

~ $ kubectl get service canary-kabuto
NAME            TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
canary-kabuto   ClusterIP   10.0.121.77   <none>        80/TCP    5d

~ $ kubectl get endpoints canary-kabuto
NAME            ENDPOINTS        AGE
canary-kabuto   10.240.0.66:80   5d

Both pods (the pod behind the service, and the one that’s trying to use that service) happen to be on the same node:

~ $ kubectl describe pod canary-samus-varnish-7cb56f97c6-5cnzp | grep Node
Node:           k8s-agent-10531284-1/10.240.0.65
Node-Selectors:  <none>

~ $ kubectl describe pod canary-kabuto-6bc965c4fb-v2l8t | grep Node
Node:           k8s-agent-10531284-1/10.240.0.65
Node-Selectors:  <none>

The node’s iptables rules pertaining to the service look fine to me:

root@k8s-agent-10531284-1:~# iptables-save | grep canary-kabuto
-A KUBE-SEP-CG6CX7OLHU57F3EX -s 10.240.0.66/32 -m comment --comment "default/canary-kabuto:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-CG6CX7OLHU57F3EX -p tcp -m comment --comment "default/canary-kabuto:http" -m tcp -j DNAT --to-destination 10.240.0.66:80
-A KUBE-SERVICES ! -s 10.240.0.0/12 -d 10.0.121.77/32 -p tcp -m comment --comment "default/canary-kabuto:http cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.0.121.77/32 -p tcp -m comment --comment "default/canary-kabuto:http cluster IP" -m tcp --dport 80 -j KUBE-SVC-MMLUSW4QS4ZMIPKO
-A KUBE-SVC-MMLUSW4QS4ZMIPKO -m comment --comment "default/canary-kabuto:http" -j KUBE-SEP-CG6CX7OLHU57F3EX
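
One thing not captured at the time is the node’s conntrack table; a stale connection-tracking entry for the cluster IP could, in principle, explain timeouts that clear on their own after a while. A minimal check (a sketch only, assuming conntrack-tools is installed on the node) would be:

root@k8s-agent-10531284-1:~# conntrack -L -d 10.0.121.77    # list tracked connections whose original destination is the cluster IP
root@k8s-agent-10531284-1:~# conntrack -D -d 10.0.121.77    # delete any stale entries (destructive; entries are re-created on the next packet)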

The issue resolved itself while I was checking, for the sake of this report, whether the same service behaviour could be observed from other pods/nodes. Not only does the service behave as expected from other nodes and pods, but after using it from other pods it also started working from the originally affected pod. I’m not sure there is causation there, as I had not been testing for a few minutes prior. That happened roughly 2 hours after the onset of the problem.

The outputs above were gathered while the issue was ongoing, and I did not try to actively resolve it by changing configuration, restarting pods or services, etc.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2", GitCommit:"bdaeafa71f6c7c04636251031f93464384d54963", GitTreeState:"clean", BuildDate:"2017-10-24T19:48:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:17:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: Azure, DSv2 VMs
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.3 LTS
  • Kernel (e.g. uname -a): Linux k8s-agent-10531284-1 4.13.0-1007-azure #9-Ubuntu SMP Thu Jan 25 10:47:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: ACS Engine, run from local machine, to have k8s 1.8.4 and managed disks
  • Others:

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 15 (4 by maintainers)

Most upvoted comments

@OJezu There are two issues that can lead to this.

  1. The endpoint controller is not running (is your controller running as expected?) – I am unable to tell whether the pod IPs in kubectl get endpoints <<service-name>> are correct – can you please confirm?
  2. A hung kube-proxy (it will look normal). That leaves stale iptables rules that no longer forward traffic from the service IP to the pods as expected. You will need to check iptables on the nodes where the connections originate (not just on the node that hosts the pods); a rough sketch of both checks follows below.
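
A rough sketch of both checks, using the service from this report as an example (the name=canary-kabuto label selector and the <client-node> placeholder are assumptions, not taken from the report):

~ $ # 1. Does the endpoints object list the current pod IP after the rollout?
~ $ kubectl get pods -l name=canary-kabuto -o wide    # pod IP after the deployment rollout (label selector assumed)
~ $ kubectl get endpoints canary-kabuto               # should list the same IP:80

~ $ # 2. On the node where the client pod runs, are kube-proxy’s rules current?
root@<client-node>:~# iptables-save | grep canary-kabuto    # the KUBE-SEP DNAT target should match the pod IP above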

Which kubernetes version are you running?