rancher: Node selector for monitoring doesn't work in some cases

What kind of request is this (question/bug/enhancement/feature request):

Bug

Steps to reproduce (least amount of steps as possible):

I upgraded from 2.1.7 to 2.2.0 and enabled monitoring on a cluster.

Result:

Monitoring API unavailable.

This error is logged repeatedly in the exporter-kube-state-cluster-monitoring logs:

E0327 13:32:13.962468 1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/builder.go:508: Failed to list *v1beta1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:cattle-prometheus:exporter-kube-state-cluster-monitoring" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
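
To make the denial above easier to read, here is a minimal Python sketch that pulls the service account, verb, resource, and API group out of the log line. The regular expression and the `parse_forbidden` helper are illustrative assumptions based only on the message shown in this report, not part of Rancher or kube-state-metrics.

```python
import re

# Matches the RBAC denial text as it appears in the log line of this report.
# The group names (user, verb, resource, group) are illustrative.
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)"'
)


def parse_forbidden(line):
    """Return the denied user, verb, resource and API group, or None if no match."""
    match = FORBIDDEN_RE.search(line)
    return match.groupdict() if match else None


# The log line from this report, split across literals for readability.
LOG_LINE = (
    'E0327 13:32:13.962468 1 reflector.go:205] '
    'k8s.io/kube-state-metrics/pkg/collectors/builder.go:508: '
    'Failed to list *v1beta1.PodDisruptionBudget: '
    'poddisruptionbudgets.policy is forbidden: '
    'User "system:serviceaccount:cattle-prometheus:exporter-kube-state-cluster-monitoring" '
    'cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope'
)

denied = parse_forbidden(LOG_LINE)
if denied:
    print(f'{denied["user"]} may not {denied["verb"]} '
          f'{denied["resource"]} in API group {denied["group"]}')
```

Read this way, the message says that the exporter's service account lacks `list` permission on `poddisruptionbudgets` in the `policy` API group at cluster scope. One way to confirm this from the command line is `kubectl auth can-i list poddisruptionbudgets.policy --as=system:serviceaccount:cattle-prometheus:exporter-kube-state-cluster-monitoring`.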

Other details that may be helpful:

Environment information

Single install of Rancher v2.2.0. Kubernetes 1.13 cluster running Calico with isolation and some extra rules.

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): Custom
  • Machine type (cloud/VM/metal) and specifications (CPU/memory): Metal, more than enough resources
  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:37:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:30:26Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

  • Docker version (use docker version):
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:23:18 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:22:21 2018
  OS/Arch:          linux/amd64
  Experimental:     false

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 16 (8 by maintainers)

Most upvoted comments

@thxCode I increased the memory limit from the default 500MB to 2GB, and now Prometheus no longer dies. It seems it really needed more memory (great work on the metrics integration in the UI, by the way!).

@kapolos, I think NodeSelector should accept any character except = in the label name, so it's still a bug.
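
As a rough illustration of the behavior described in that comment, the following Python sketch splits each selector entry on the first `=` only, so every other character in the label name is preserved. The `parse_node_selector` helper and the example label are hypothetical; this is not Rancher's implementation, and Kubernetes itself applies stricter syntax rules to label keys.

```python
def parse_node_selector(entries):
    """Turn "name=value" strings into a dict, splitting on the first "=" only.

    Hypothetical helper for illustration; not taken from Rancher.
    """
    selector = {}
    for entry in entries:
        # partition() splits at the first "=", so dots, slashes and dashes
        # in the label name pass through untouched.
        name, sep, value = entry.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {entry!r}")
        selector[name] = value
    return selector


# Example label chosen for illustration; it contains ".", "/" and "-".
print(parse_node_selector(["node-role.kubernetes.io/monitoring=true"]))
```

Rejecting entries that have no `=` or an empty name keeps malformed input from silently producing an empty selector.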

The problem now is that while the metrics API becomes available, prometheus-cluster-monitoring dies shortly after: … The last error log, about accessing Alertmanager, is harmless, since Rancher Alerting is not enabled at first.