rancher: prometheus-cluster-monitoring is never restarted on a new node after node shutdown
What kind of request is this (question/bug/enhancement/feature request): Bug
Steps to reproduce (fewest steps possible):
- Create a 3-node cluster via RKE (Weave CNI, RancherOS 1.5.1)
- Install Rancher HA via Helm
- Enable monitoring (default settings, except 2 GB RAM for Prometheus)
- Shut down the node running the prometheus-cluster-monitoring-0 pod
Result: the prometheus-cluster-monitoring-0 pod is never rescheduled to another node. The only way to get monitoring back is to restore the node that prometheus-cluster-monitoring-0 was running on.
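To make the symptom concrete, here is one way to observe the stuck pod, assuming Rancher's default cluster-monitoring namespace `cattle-prometheus` (adjust the namespace if your setup differs):

```
# Watch the monitoring pods after shutting the node down; the StatefulSet
# pod stays bound to the dead node (Unknown/Terminating) instead of moving.
kubectl -n cattle-prometheus get pods -o wide --watch

# Inspect the pod to confirm which node it is pinned to and why it is stuck.
kubectl -n cattle-prometheus describe pod prometheus-cluster-monitoring-0
```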
Other details that may be helpful:
Environment information
- Rancher version (`rancher/rancher`/`rancher/server` image tag or shown bottom left in the UI): v2.2.2-rc1
- Installation option (single install/HA): HA
- RancherOS 1.5.1
Cluster information
- Cluster type (Hosted/Infrastructure Provider/Custom/Imported): created with RKE (0.2.0), with HA Rancher installed via Helm
- Machine type (cloud/VM/metal) and specifications (CPU/memory): 3x ESXi VMs, each with 8 GB RAM and 2 CPUs
- Kubernetes version (use `kubectl version`):

```
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T22:29:25Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:30:26Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
```
- Docker version (use `docker version`):

```
$ docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:20:43 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:28:38 2018
  OS/Arch:          linux/amd64
  Experimental:     false
```
About this issue
- State: open
- Created 5 years ago
- Reactions: 2
- Comments: 16 (2 by maintainers)
It looks like this is by design in Kubernetes for StatefulSets. Would it be possible to use a Deployment rather than a StatefulSet for this? Losing some monitoring data (and possibly risking some inconsistent Prometheus data) is better than not having monitoring at all in case of a failure. https://github.com/kubernetes/kubernetes/issues/74947
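For context, the commonly documented manual workaround for a StatefulSet pod stuck on a downed node is to force-delete the pod so the controller can recreate it elsewhere. This is only a sketch, assuming the default `cattle-prometheus` namespace:

```
# Kubernetes will not reschedule the pod on its own because it cannot
# confirm the old copy has stopped; force-deleting it removes that block
# and lets the StatefulSet controller recreate it on a healthy node.
kubectl -n cattle-prometheus delete pod prometheus-cluster-monitoring-0 \
  --grace-period=0 --force
```

Note that if the Prometheus volume is backed by node-local storage, the recreated pod may still fail to start until that storage is reachable again.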
@alena1108 Tested. It gets rescheduled once the shut-down node is removed from the cluster.
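That matches the expected StatefulSet behaviour: once the Node object for the shut-down machine is deleted, its pods are garbage-collected and the StatefulSet recreates prometheus-cluster-monitoring-0 on a remaining node. A minimal sketch (the node name is a placeholder):

```
# Remove the dead node from the cluster; its pods are then deleted and
# the StatefulSet controller schedules a replacement elsewhere.
kubectl delete node <name-of-shut-down-node>
```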