prometheus-operator: kube-prometheus & kubernetes 1.5.2 - prometheus-k8s-0 node - docker stops responding
What did you do?
hack/cluster-monitoring/deploy

What did you expect to see?
A stable cluster.

What did you see instead? Under which circumstances?
docker stops responding on the node and a docker restart is needed to recover it. The node just went down again after 3 days…

Environment
AWS, kops 1.5.1, Kubernetes 1.5.2
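For context, a minimal sketch of the deploy step and of how the hang shows up on the affected node; only hack/cluster-monitoring/deploy and the init.d restart come from this report, while the bounded docker ps check and the sudo usage are assumptions:

# Deploy the monitoring stack from a kube-prometheus checkout (as in this report)
hack/cluster-monitoring/deploy

# On the affected node, a hung Docker daemon typically makes even 'docker ps' block;
# a bounded check makes the hang visible (assumes coreutils 'timeout' is available)
timeout 30 docker ps || echo "docker daemon is not responding"

# Work-around used here: restart the Docker daemon via sysvinit
sudo /etc/init.d/docker restart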
-
Kubernetes version information:
kubernetes/kops ‹master› » ku version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:57:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
-
Kubernetes cluster kind:
kops 1.5.2
-
Manifests:
https://github.com/coreos/kube-prometheus.git at commit 333bd23434a8da6ed8bf4b6e57e72e71f75dbc40. I plan to test release 0.7.0 soon…
- Prometheus Operator Logs:
The node looks like this before a /etc/init.d/docker restart fixes it (a recovery sketch follows the listing below):
kubernetes/kops ‹master› » ku get po --all-namespaces -o wide | grep ip-10-101-118-222.ec2.internal
athena-graphql athena-graphql-cmd-290124063-x2c19 1/1 Unknown 1 9d 100.96.33.50 ip-10-101-118-222.ec2.internal
deis deis-controller-2434209242-ztkcs 1/1 Unknown 3 9d 100.96.33.47 ip-10-101-118-222.ec2.internal
deis deis-logger-fluentd-19csf 1/1 NodeLost 1 9d 100.96.33.44 ip-10-101-118-222.ec2.internal
deis deis-logger-redis-304849759-9z5g4 1/1 Unknown 1 5d 100.96.33.42 ip-10-101-118-222.ec2.internal
deis deis-monitor-telegraf-xf6mm 1/1 NodeLost 1 9d 100.96.33.36 ip-10-101-118-222.ec2.internal
deis deis-router-3101872284-nmwgf 1/1 Unknown 1 9d 100.96.33.43 ip-10-101-118-222.ec2.internal
deis deis-workflow-manager-2528409207-7pttp 1/1 Unknown 1 5d 100.96.33.34 ip-10-101-118-222.ec2.internal
hades-graphql hades-graphql-cmd-459006866-r3pbl 1/1 Unknown 1 3d 100.96.33.48 ip-10-101-118-222.ec2.internal
kube-system kube-proxy-ip-10-101-118-222.ec2.internal 1/1 Unknown 1 9d 10.101.118.222 ip-10-101-118-222.ec2.internal
monitoring grafana-1046448512-l8cgh 2/2 Unknown 2 9d 100.96.33.40 ip-10-101-118-222.ec2.internal
monitoring kube-state-metrics-4090613309-mnbrj 1/1 Unknown 1 9d 100.96.33.32 ip-10-101-118-222.ec2.internal
monitoring node-exporter-sz8r4 1/1 NodeLost 1 9d 10.101.118.222 ip-10-101-118-222.ec2.internal
monitoring prometheus-k8s-0 2/2 Unknown 2 5d 100.96.33.51 ip-10-101-118-222.ec2.internal
monitoring prometheus-operator-3658205960-2zpfp 1/1 Unknown 1 9d 100.96.33.49 ip-10-101-118-222.ec2.internal
programs-service programs-service-cmd-1240201140-mjd53 1/1 Unknown 0 2d 100.96.33.52 ip-10-101-118-222.ec2.internal
speech-to-text-nodejs speech-to-text-nodejs-cmd-2508035524-zk217 1/1 Unknown 1 9d 100.96.33.45 ip-10-101-118-222.ec2.internal
splunkspout k8ssplunkspout-nonprod-c0d1w 1/1 NodeLost 1 9d 100.96.33.35 ip-10-101-118-222.ec2.internal
styleguide styleguide-cmd-3772371803-2tdlw 1/1 Unknown 1 9d 100.96.33.38 ip-10-101-118-222.ec2.internal
styleguide styleguide-cmd-3772371803-hsg0w 1/1 Unknown 1 9d 100.96.33.37 ip-10-101-118-222.ec2.internal
styleguide-staging styleguide-staging-cmd-83554885-cb2pq 1/1 Unknown 1 9d 100.96.33.39 ip-10-101-118-222.ec2.internal
wellbot wellbot-web-2518992024-1bvml 1/1 Unknown 1 9d 100.96.33.46 ip-10-101-118-222.ec2.internal
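A minimal recovery/verification sketch, assuming "ku" above is an alias for kubectl and that the node is reachable over SSH; everything beyond the docker restart is my assumption rather than something taken from this report:

# On the affected node: restart the hung Docker daemon (the work-around from this report)
sudo /etc/init.d/docker restart

# From a workstation: confirm the node returns to Ready and the Unknown/NodeLost pods recover
kubectl get node ip-10-101-118-222.ec2.internal
kubectl get po --all-namespaces -o wide | grep ip-10-101-118-222.ec2.internal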
I think these tickets are related: https://github.com/kubernetes/kubernetes/issues/42164 https://github.com/kubernetes/kubernetes/issues/39028
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 20 (12 by maintainers)
Prometheus 2.0 stable has been released, and the Prometheus Operator fully supports Prometheus 2.0, so I will close this issue here. Feel free to open new issues regarding Prometheus 2.0. The issue described in this post is fundamentally not solvable with Prometheus 1.x, so we recommend switching to 2.0.