kubernetes: Service fails to establish connections (timeouts) after endpoint pod is recreated by its deployment.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
After a deployment replaced a pod that is the only endpoint for a service, the service stopped serving traffic to and from that endpoint. The service was used by one other pod. The issue resolved itself roughly 2 hours in, around the time I tried using the service from pods other than the one that had the original connection problems (?).
What you expected to happen:
The service keeps serving traffic from the endpoint after the pod is replaced by its deployment.
How to reproduce it (as minimally and precisely as possible): I don’t know. We have problems with this one particular service and pod, but everything else seems to be working in order, including virtually identical configurations for other tiers of our services. It’s not the first time the service has been unable to reach the endpoint pod after the endpoint pod restarted.
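If it helps, this is roughly what I would try in order to trigger it again (a sketch only; the name=canary-kabuto label selector is my assumption, and I have not confirmed that this actually reproduces the problem):
~ $ kubectl delete pod -l "name=canary-kabuto"             # let the deployment recreate the endpoint pod
~ $ kubectl get endpoints canary-kabuto --watch            # watch the endpoint switch to the new pod IP
~ $ kubectl exec -ti <client-pod> -- curl -s -m 5 http://canary-kabuto/   # probe the service from the client pod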
Anything else we need to know?: The service DNS name is resolvable, the service is not usable via its cluster IP address, but the endpoint pod works fine when accessed directly by its pod IP:
~ $ kubectl exec -ti (kubectl get pod -l "name=canary-samus-varnish" -o name | sed -E 's%pods?/%%' | tail -n 1) -- bash
root@canary-samus-varnish-7cb56f97c6-5cnzp:/# host canary-kabuto
canary-kabuto.default.svc.cluster.local has address 10.0.121.77
root@canary-samus-varnish-7cb56f97c6-5cnzp:/# curl -sL http://canary-kabuto/
^C (timeout)
root@canary-samus-varnish-7cb56f97c6-5cnzp:/# curl -sL http://10.0.121.77/
^C (timeout)
root@canary-samus-varnish-7cb56f97c6-5cnzp:/# curl -sL http://10.240.0.66/
<!doctype html><html><body>Hello world!</body></html>
Service metadata from kubectl:
~ $ kubectl get service canary-kabuto
NAME            TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
canary-kabuto   ClusterIP   10.0.121.77   <none>        80/TCP    5d
~ $ kubectl get endpoints canary-kabuto
NAME            ENDPOINTS        AGE
canary-kabuto   10.240.0.66:80   5d
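For completeness, the endpoint IP can be cross-checked against the current pod IP; a quick sketch (the label selector is my assumption):
~ $ kubectl get pods -l "name=canary-kabuto" -o wide      # the pod IP should match the ENDPOINTS column above
~ $ kubectl get endpoints canary-kabuto -o yaml           # full object, including the targetRef of each address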
Both pods (the pod behind the service, and the one trying to use that service) are, by chance, on the same node:
~ $ kubectl describe pod canary-samus-varnish-7cb56f97c6-5cnzp | grep Node
Node: k8s-agent-10531284-1/10.240.0.65
Node-Selectors: <none>
~ $ kubectl describe pod canary-kabuto-6bc965c4fb-v2l8t | grep Node
Node: k8s-agent-10531284-1/10.240.0.65
Node-Selectors: <none>
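Since both pods sit on k8s-agent-10531284-1, it also seems worth checking that the kube-proxy instance on that node is healthy and syncing rules (a sketch; it assumes kube-proxy runs as pods in the kube-system namespace, which may differ per deployment):
~ $ kubectl -n kube-system get pods -o wide | grep kube-proxy          # find the kube-proxy pod on k8s-agent-10531284-1
~ $ kubectl -n kube-system logs <kube-proxy-pod-on-that-node> | tail   # look for rule sync errors around the pod replacement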
The node’s iptables rules pertaining to the service look fine to me:
root@k8s-agent-10531284-1:~# iptables-save | grep canary-kabuto
-A KUBE-SEP-CG6CX7OLHU57F3EX -s 10.240.0.66/32 -m comment --comment "default/canary-kabuto:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-CG6CX7OLHU57F3EX -p tcp -m comment --comment "default/canary-kabuto:http" -m tcp -j DNAT --to-destination 10.240.0.66:80
-A KUBE-SERVICES ! -s 10.240.0.0/12 -d 10.0.121.77/32 -p tcp -m comment --comment "default/canary-kabuto:http cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.0.121.77/32 -p tcp -m comment --comment "default/canary-kabuto:http cluster IP" -m tcp --dport 80 -j KUBE-SVC-MMLUSW4QS4ZMIPKO
-A KUBE-SVC-MMLUSW4QS4ZMIPKO -m comment --comment "default/canary-kabuto:http" -j KUBE-SEP-CG6CX7OLHU57F3EX
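Another thing that could be checked on the node in this situation is whether stale conntrack entries for the cluster IP are black-holing the connection (a sketch; it assumes the conntrack utility from conntrack-tools is installed on the node, and I did not run it while the issue was live):
root@k8s-agent-10531284-1:~# conntrack -L -d 10.0.121.77    # list tracked connections to the service cluster IP
root@k8s-agent-10531284-1:~# conntrack -D -d 10.0.121.77    # possible workaround: drop those entries so new packets get re-DNATed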
The issue resolved itself as I was checking, for the sake of this report, whether the same service behaviour could be observed from other pods/nodes. Not only did the service behave as expected on other nodes and pods, but after using it from other pods it also started working from the originally affected pod. I’m not sure there is causation there, as I had not been testing from the affected pod for a few minutes prior. That happened roughly 2 hours after the onset of the problem.
The outputs above were gathered while the issue was ongoing, and I did not try to actively resolve it by changing configuration, restarting pods or services, etc.
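If it happens again, I plan to leave a simple probe running from the client pod to timestamp the onset and recovery (a sketch; the pod name suffix and timeout are placeholders):
~ $ kubectl exec canary-samus-varnish-<pod-suffix> -- sh -c 'while true; do date; curl -s -m 5 -o /dev/null -w "%{http_code}\n" http://canary-kabuto/; sleep 10; done'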
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2", GitCommit:"bdaeafa71f6c7c04636251031f93464384d54963", GitTreeState:"clean", BuildDate:"2017-10-24T19:48:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:17:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: Azure, DSv2 VMs
- OS (e.g. from /etc/os-release): Ubuntu 16.04.3 LTS
- Kernel (e.g. uname -a): Linux k8s-agent-10531284-1 4.13.0-1007-azure #9-Ubuntu SMP Thu Jan 25 10:47:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: ACS Engine, run from local machine, to have k8s 1.8.4 and managed disks
- Others:
About this issue
- State: closed
- Created 6 years ago
- Comments: 15 (4 by maintainers)
@OJezu There are two issues that can lead to this.
kubectl get endpoints <service-name>
– Can you please confirm? Which Kubernetes version are you running?
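What is presumably being asked here is to confirm that the ENDPOINTS column points at the current pod IP rather than being empty or stale, e.g. (a sketch, reusing the names from this report):
~ $ kubectl get endpoints canary-kabuto
~ $ kubectl get pod canary-kabuto-6bc965c4fb-v2l8t -o wide   # the pod IP shown here should appear in the ENDPOINTS column above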