rancher: prometheus-cluster-monitoring is never restarted on a new node after node shutdown
What kind of request is this (question/bug/enhancement/feature request): Bug
Steps to reproduce (fewest steps possible):
- Create a 3-node cluster via RKE (Weave CNI, RancherOS 1.5.1)
- Install Rancher HA via Helm
- Enable monitoring (default settings, except 2 GB RAM for Prometheus)
- Shut down the node running the prometheus-cluster-monitoring-0 pod
Result: the prometheus-cluster-monitoring-0 pod is never rescheduled to another node. The only way to get monitoring back is to restore the node that prometheus-cluster-monitoring-0 was running on.
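To make the symptom concrete, here is one way to observe the stuck pod, assuming Rancher's default cluster-monitoring namespace `cattle-prometheus` (adjust the namespace if your setup differs):

```
# Watch the monitoring pods after shutting the node down; the StatefulSet
# pod stays bound to the dead node (Unknown/Terminating) instead of moving.
kubectl -n cattle-prometheus get pods -o wide --watch

# Inspect the pod to confirm which node it is pinned to and why it is stuck.
kubectl -n cattle-prometheus describe pod prometheus-cluster-monitoring-0
```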
Other details that may be helpful:
Environment information
- Rancher version (`rancher/rancher`/`rancher/server` image tag or shown bottom left in the UI): v2.2.2-rc1
- Installation option (single install/HA): HA
- RancherOS 1.5.1
Cluster information
- Cluster type (Hosted/Infrastructure Provider/Custom/Imported): created with RKE (0.2.0), with HA Rancher installed via Helm
- Machine type (cloud/VM/metal) and specifications (CPU/memory): 3x ESXi VMs, each with 8 GB RAM and 2 CPUs
- Kubernetes version (use `kubectl version`):

```
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T22:29:25Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:30:26Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
```
- Docker version (use `docker version`):

```
$ docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:20:43 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:28:38 2018
  OS/Arch:          linux/amd64
  Experimental:     false
```
About this issue
- State: open
- Created 5 years ago
- Reactions: 2
- Comments: 16 (2 by maintainers)
It looks like this is by design in Kubernetes for StatefulSets. Would it be possible to use a Deployment rather than a StatefulSet for this? Losing some monitoring data (and possibly risking some inconsistent Prometheus data) is better than not having monitoring at all in case of a failure. https://github.com/kubernetes/kubernetes/issues/74947
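For context, the commonly documented manual workaround for a StatefulSet pod stuck on a downed node is to force-delete the pod so the controller can recreate it elsewhere. This is only a sketch, assuming the default `cattle-prometheus` namespace:

```
# Kubernetes will not reschedule the pod on its own because it cannot
# confirm the old copy has stopped; force-deleting it removes that block
# and lets the StatefulSet controller recreate it on a healthy node.
kubectl -n cattle-prometheus delete pod prometheus-cluster-monitoring-0 \
  --grace-period=0 --force
```

Note that if the Prometheus volume is backed by node-local storage, the recreated pod may still fail to start until that storage is reachable again.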
@alena1108 Tested. It gets rescheduled once the shut-down node is removed from the cluster.
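That matches the expected StatefulSet behaviour: once the Node object for the shut-down machine is deleted, its pods are garbage-collected and the StatefulSet recreates prometheus-cluster-monitoring-0 on a remaining node. A minimal sketch (the node name is a placeholder):

```
# Remove the dead node from the cluster; its pods are then deleted and
# the StatefulSet controller schedules a replacement elsewhere.
kubectl delete node <name-of-shut-down-node>
```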