rancher: prometheus-cluster-monitoring is never rescheduled onto another node when its node shuts down

What kind of request is this (question/bug/enhancement/feature request): Bug

Steps to reproduce (least amount of steps as possible):

  1. Create 3 node cluster via RKE (weave CNI, RancherOS 1.5.1)
  2. Install Rancher HA via helm
  3. Enable monitoring (use default settings except 2GB ram for prometheus)
  4. Shut down the node running the prometheus-cluster-monitoring-0 pod

Result: The prometheus-cluster-monitoring-0 pod is never rescheduled onto another node. The only way to get monitoring back is to bring back the node that prometheus-cluster-monitoring-0 was running on.
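The observation in steps 3 and 4 can be checked from the command line. The following is a minimal sketch, not part of the original report: the `cattle-prometheus` namespace is an assumption (the report does not name the namespace), and the `run` helper with its `DRY_RUN` switch is a hypothetical convenience so the commands can be printed without touching a cluster.

```shell
#!/bin/sh
# Assumption: Rancher cluster monitoring runs in the cattle-prometheus
# namespace; override NAMESPACE if your installation differs.
NAMESPACE="${NAMESPACE:-cattle-prometheus}"
POD="prometheus-cluster-monitoring-0"

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Show the pod together with the node it is scheduled on (NODE column).
show_pod() {
  run kubectl get pod "$POD" -n "$NAMESPACE" -o wide
}

# List node status; after step 4 the shut-down node should become NotReady.
show_nodes() {
  run kubectl get nodes
}
```

Calling `show_pod` before step 4 identifies the node to shut down; calling `show_nodes` and `show_pod` afterwards shows whether the pod was moved.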

Other details that may be helpful:

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): v2.2.2-rc1
  • Installation option (single install/HA): HA
  • Rancher OS 1.5.1

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): created via RKE (0.2.0), with HA Rancher installed via Helm
  • Machine type (cloud/VM/metal) and specifications (CPU/memory): 3x ESXi VMs with 8 GB RAM and 2 CPUs each.
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T22:29:25Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:30:26Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version (use docker version):
$ docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:20:43 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:28:38 2018
  OS/Arch:          linux/amd64
  Experimental:     false

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 2
  • Comments: 16 (2 by maintainers)

Most upvoted comments

This looks like it is by Kubernetes design for StatefulSets. Would it be possible to use a Deployment rather than a StatefulSet for this? Losing some monitoring data (and possibly risking some inconsistent Prometheus data) is better than having no monitoring at all after a failure. https://github.com/kubernetes/kubernetes/issues/74947

@alena1108 Tested. It will get rescheduled when the shutdown node gets removed.
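The two recovery paths touched on in these comments can be sketched as follows. This is a hedged illustration rather than an official procedure: the `cattle-prometheus` namespace is an assumption, the node name is a placeholder supplied by the caller, and the `run`/`DRY_RUN` helper is hypothetical. Only the node-removal path was reported as tested in this thread. Force deletion is the general Kubernetes mechanism for StatefulSet pods stuck on an unreachable node and should only be used when the old node is certain to stay down, since it can otherwise leave two copies of the pod running.

```shell
#!/bin/sh
# Assumption: Rancher cluster monitoring runs in the cattle-prometheus
# namespace; override NAMESPACE if your installation differs.
NAMESPACE="${NAMESPACE:-cattle-prometheus}"
POD="prometheus-cluster-monitoring-0"

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Option 1 (reported as working in this thread): remove the shut-down node
# object so the StatefulSet controller can recreate the pod elsewhere.
# $1 is the name of the node that was shut down.
remove_dead_node() {
  run kubectl delete node "$1"
}

# Option 2 (general StatefulSet behaviour, not tested in this thread):
# force-delete the stuck pod without waiting for the unreachable kubelet.
force_delete_pod() {
  run kubectl delete pod "$POD" -n "$NAMESPACE" --grace-period=0 --force
}
```

For example, `DRY_RUN=1 remove_dead_node my-node` prints the command that would be run, where `my-node` stands in for the real node name.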