kubernetes: Pods that fail health checks always restarting on the same minion instead of others?

Over the weekend the skydns container in the kube-dns pod died. I'm not sure of the exact reason because I couldn't find much detail in the logs, but watching the etcd and skydns logs suggested the root issue could have been etcd. One theory is that the /mnt/ephemeral/kubernetes filesystem was full (it's only 3.75GB and holds a few large emptyDir volumes). kube-dns was showing 3/4 ready.
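
For anyone debugging something similar, this is roughly what I'd check next time; the pod name and mount point below are placeholders based on my kube-up AWS setup:

# from kubectl:
kubectl describe pod kube-dns-v8-xxxxx --namespace=kube-system    # probe failures, restart counts, events
kubectl logs kube-dns-v8-xxxxx -c skydns --namespace=kube-system  # per-container logs; repeat for the etcd container

# on the minion that was hosting the pod:
df -h /mnt/ephemeral                                              # was the ephemeral filesystem actually full?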

The kube-dns pod being down caused all of my application pods across 4 minions to go down. I had to manually delete the kube-dns pod, and once it relaunched on another minion it was fine and everything came back online.
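
The manual fix was essentially just the following (pod name is again a placeholder), relying on the replication controller to recreate the pod:

kubectl delete pod kube-dns-v8-xxxxx --namespace=kube-system   # the RC recreated it, in my case on a healthier minion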

By the same token, I had one minion that would never consider any of my pods “ready”, even though the other 3 minions did. I couldn't figure out why and the logs weren't helpful, so I just had to manually terminate that minion (EC2 instance) and let auto-scaling launch a new one (which happened to work fine).
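
The only signal I could find afterwards was on the node object itself; something like the following at least surfaces the node's conditions and recent events (the node name is just an example):

kubectl get nodes                                  # is the minion reporting Ready at all?
kubectl describe node ip-172-20-0-xx.ec2.internal  # conditions and recent events for the suspect minion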

In both of these cases, if k8s had automatically moved the constantly failing pods to other minions, I think the cluster would have healed itself. Is the fact that failing pods always restart on the same minion intentional, or is changing that something in the works?

I’m sorry I don’t have logs to show. I’m not sure how to retrieve them from 2 days ago after so many pods have been restarted.

About this issue

  • State: closed
  • Created 9 years ago
  • Comments: 16 (10 by maintainers)

Most upvoted comments

Hi all, in addition to this there can be other problems. In a virtualized environment, for example, the resource sizes we detect might not be the sizes we can actually work with; we might in fact be running on swap, so services might not react in time and should therefore be moved to other hosts. I suggest adding the ability to define a number of health-check-related restarts after which the pod is rescheduled.
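
Until something like that exists, a rough workaround could be a cron job that deletes pods once their restart count crosses a threshold, so that their replication controller reschedules them; this is only a sketch, and there is no guarantee the new pod lands on a different node:

#!/bin/bash
# Delete any pod whose restart count exceeds THRESHOLD so its replication controller
# recreates it. RESTARTS is taken as the second-to-last column of `kubectl get pods`,
# which holds even when the STATUS column contains spaces.
THRESHOLD=10
kubectl get pods --all-namespaces --no-headers | awk '{print $1, $2, $(NF-1)}' |
while read ns name restarts; do
  case "$restarts" in
    *[!0-9]*) continue ;;            # skip anything that is not a plain number
  esac
  if [ "$restarts" -gt "$THRESHOLD" ]; then
    kubectl delete pod "$name" --namespace="$ns"
  fi
done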

@lavalamp certainly, it's critical to be able to detect issues with nodes. Is this something on the roadmap that will be built into kubernetes/kubelets? In the meantime, I need some way to detect this myself and either handle it automatically or send alerts. What are some ways you'd advise doing this?
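
Right now the best I can come up with is a watchdog cron job along these lines, where send_alert is only a placeholder for whatever notifier we end up using:

#!/bin/bash
# Alert whenever any node reports a status other than Ready.
# "send_alert" is a stand-in for your actual notifier (email, PagerDuty, etc.).
BAD=$(kubectl get nodes --no-headers | grep -vw Ready)
if [ -n "$BAD" ]; then
  printf 'Unhealthy nodes:\n%s\n' "$BAD" | send_alert
fi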

This issue bit me again over the weekend. I have a simple 3-node cluster in AWS, provisioned with cluster/kube-up, running only 3 non-kube-system pods. Everything was healthy on Friday, and without any changes over the weekend, when I checked a few days later all pods on one particular node were failing [1]; had they been restarted on another node, everything would have been fine.

[1] Is this another GitHub issue I should create?

kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                                 READY     STATUS                                                                   RESTARTS   AGE       NODE
default       xxx-xxxxxxxx-prd-06c055d7-7neib                      1/1       Running                                                                  0          52m       ip-172-20-0-141.ec2.internal
default       xxx-xxxxxxxx-prd-06c055d7-sxkyo                      0/1       Image: xxxxxxxx/xxx-xxxxxxxx:prd-06c055d7 is not ready on the node       0          30m       ip-172-20-0-84.ec2.internal
kube-system   elasticsearch-logging-v1-61a90                       1/1       Running                                                                  0          5d        ip-172-20-0-84.ec2.internal
kube-system   fluentd-elasticsearch-ip-172-20-0-141.ec2.internal   1/1       Running                                                                  2          5d        ip-172-20-0-141.ec2.internal
kube-system   fluentd-elasticsearch-ip-172-20-0-183.ec2.internal   1/1       Running                                                                  0          5d        ip-172-20-0-183.ec2.internal
kube-system   fluentd-elasticsearch-ip-172-20-0-84.ec2.internal    1/1       Running                                                                  0          5d        ip-172-20-0-84.ec2.internal
kube-system   kibana-logging-v1-mldpo                              1/1       Running                                                                  0          5d        ip-172-20-0-84.ec2.internal
kube-system   kube-dns-v8-3zel5                                    4/4       Running                                                                  1          4d        ip-172-20-0-141.ec2.internal
kube-system   kube-dns-v8-ud0oq                                    1/4       API error (500): Cannot start container 060b46a4a91716cecc1e4cbe60a66450d33a8f7947db95577c9104cc849d744b: [8] System error: too many open files in system   33         5d        ip-172-20-0-84.ec2.internal
kube-system   kube-ui-v1-yq9an                                     1/1       Running   0         4d        ip-172-20-0-183.ec2.internal
kube-system   monitoring-heapster-v6-ckogk                         0/1       API error (500): Cannot start container 0e7daac3182af32b2867072b72b5d324b7e4177d1136df9d389fb671e8f280bf: [8] System error: too many open files in system   11         5d        ip-172-20-0-84.ec2.internal
kube-system   monitoring-influx-grafana-v1-4ubv9                   2/2       Running   2         4d        ip-172-20-0-84.ec2.internal
kubectl get events => Error syncing pod, skipping: API error (500): Cannot start container 8b987eaa17cade98a4ba702381d88b52e7344e03af2f6f9f157ad4049ef35c2f: [8] System error: too many open files in system
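
“too many open files in system” is the kernel-wide file handle limit (ENFILE) rather than a per-process ulimit, so something like this on the affected minion should confirm it; the value below is only an example:

cat /proc/sys/fs/file-nr             # allocated / free / maximum file handles
sysctl fs.file-max                   # current kernel-wide limit
sudo sysctl -w fs.file-max=2097152   # example value only; persist via /etc/sysctl.conf if you raise it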