cluster-monitoring-operator: node-exporter does not come up on openshift e2e runs

I switched our Prometheus e2e tests to use the cluster monitoring operator and I’m seeing failures in about 1 in 4 runs. The most noticeable one is a run where node-exporter was never installed (no pods were created).

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/20830/pull-ci-origin-e2e-gcp/3161/

/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/test/extended/prometheus/prometheus.go:49
Expected
    <[]error | len:1, cap:1>: [
        {
            s: "no match for map[job:node-exporter] with health up and scrape URL ^https://.*/metrics$",
        },
    ]
to be empty
/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/test/extended/prometheus/prometheus.go:123
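
For context, the failing assertion looks for a scrape target labelled job=node-exporter that is healthy and scraped over HTTPS. The sketch below approximates that check in Python; it is not the code in prometheus.go. It assumes the target list has the shape returned by the Prometheus /api/v1/targets endpoint (data.activeTargets entries with labels, scrapeUrl, and health). The function name and the sample data are hypothetical.

```python
import re


def missing_targets(targets_response, labels, health, url_pattern):
    """Return a list of error strings; empty if some target matches.

    A target matches when it carries all of the given labels, reports
    the given health, and its scrape URL matches url_pattern.
    """
    pattern = re.compile(url_pattern)
    for target in targets_response["data"]["activeTargets"]:
        target_labels = target.get("labels", {})
        if any(target_labels.get(k) != v for k, v in labels.items()):
            continue
        if target.get("health") != health:
            continue
        if not pattern.search(target.get("scrapeUrl", "")):
            continue
        return []
    return [
        f"no match for {labels} with health {health} "
        f"and scrape URL {url_pattern}"
    ]


# Hypothetical sample: node-exporter is absent, as in the failing run.
sample_without_node_exporter = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "prometheus-k8s"},
                "scrapeUrl": "https://10.0.0.1:9091/metrics",
                "health": "up",
            }
        ]
    },
}

# Hypothetical sample: node-exporter is present and healthy.
sample_with_node_exporter = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "node-exporter"},
                "scrapeUrl": "https://10.0.0.2:9100/metrics",
                "health": "up",
            }
        ]
    },
}

errors = missing_targets(
    sample_without_node_exporter,
    {"job": "node-exporter"},
    "up",
    r"^https://.*/metrics$",
)
print(errors)
```

With no node-exporter pods there is nothing for Prometheus to discover, so a check of this kind returns one error, which matches the shape of the failure above.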

In this run the e2e tests start at 13:37, but the prometheus test isn’t run until 13:45, which should be more than enough time for node-exporter to come up. I see no pods created, which implies either the daemonset was never created or it failed outright. I also see no events for the daemonset in https://storage.googleapis.com/origin-ci-test/pr-logs/pull/20830/pull-ci-origin-e2e-gcp/3161/artifacts/e2e-gcp/events.json, which suggests it was never created.
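
The events check can be repeated with a small script. This is a sketch under assumptions: it treats events.json as a Kubernetes EventList (an "items" array whose entries carry involvedObject.kind, involvedObject.name, reason, and message), and it assumes the daemonset and its pods have names starting with "node-exporter". The helper names and the sample data are hypothetical.

```python
import json


def load_events(path):
    """Load a downloaded events.json file (assumed to be an EventList)."""
    with open(path) as f:
        return json.load(f)


def events_for(event_list, kinds, name_prefix):
    """Return events whose involved object has one of the given kinds
    and a name starting with name_prefix."""
    matches = []
    for event in event_list.get("items", []):
        obj = event.get("involvedObject", {})
        if obj.get("kind") in kinds and obj.get("name", "").startswith(name_prefix):
            matches.append(event)
    return matches


# Hypothetical sample with no node-exporter events at all.
sample_events = {
    "kind": "EventList",
    "items": [
        {
            "involvedObject": {
                "kind": "Deployment",
                "namespace": "openshift-monitoring",
                "name": "prometheus-operator",
            },
            "reason": "ScalingReplicaSet",
            "message": "Scaled up replica set",
        }
    ],
}

found = events_for(sample_events, {"DaemonSet", "Pod"}, "node-exporter")
for event in found:
    print(event["reason"], event["message"])
print(f"{len(found)} node-exporter events")
```

To run it against the real artifact, download the file and pass the result of load_events("events.json") in place of sample_events. An empty result is consistent with the daemonset never being created, though it does not prove it.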

The prometheus operator log contains the following warnings, which seem bad, but nothing in the cluster monitoring operator log stands out as excessive.

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/20830/pull-ci-origin-e2e-gcp/3161/artifacts/e2e-gcp/pods/openshift-monitoring_cluster-monitoring-operator-5cf8fccc6-mdc92_cluster-monitoring-operator.log.gz

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/20830/pull-ci-origin-e2e-gcp/3161/artifacts/e2e-gcp/pods/openshift-monitoring_prometheus-operator-6c9fddd47f-mb4br_prometheus-operator.log.gz

W0903 14:01:42.210505       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
level=info ts=2018-09-03T14:01:43.178075004Z caller=operator.go:732 component=prometheusoperator msg="sync prometheus" key=openshift-monitoring/k8s
W0903 14:01:43.178176       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0903 14:01:43.178307       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0903 14:01:43.196450       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0903 14:01:43.222385       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0903 14:01:43.222448       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
level=info ts=2018-09-03T14:01:43.222295876Z caller=operator.go:732 component=prometheusoperator msg="sync prometheus" key=openshift-monitoring/k8s
W0903 14:01:43.240970       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0903 14:02:03.033696       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
level=info ts=2018-09-03T14:02:03.033607297Z caller=operator.go:732 component=prometheusoperator msg="sync prometheus" key=openshift-monitoring/k8s
W0903 14:02:03.033767       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0903 14:02:03.048325       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
level=info ts=2018-09-03T14:02:19.767518749Z caller=operator.go:396 component=alertmanageroperator msg="sync alertmanager" key=openshift-monitoring/main
W0903 14:02:45.489186       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0903 14:02:45.489268       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
level=info ts=2018-09-03T14:02:45.489057156Z caller=operator.go:732 component=prometheusoperator msg="sync prometheus" key=openshift-monitoring/k8s
W0903 14:02:45.504357       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
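
To judge how noisy these warnings are across the whole log, the klog-style lines can be tallied by severity and source location. The sketch below assumes the line layout visible in the excerpt above (severity letter, MMDD date, time, thread id, file:line, then the message); the function name is hypothetical, and the sample reuses three lines from the excerpt.

```python
import re
from collections import Counter

# Matches lines such as:
# W0903 14:01:42.210505       1 listers.go:63] can not retrieve ...
KLOG_LINE = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+(\d+) (\S+:\d+)\] (.*)$"
)


def count_klog_lines(lines):
    """Count klog-style lines by (severity, source location).

    Lines in other formats, such as the level=info entries, are skipped.
    """
    counts = Counter()
    for line in lines:
        match = KLOG_LINE.match(line)
        if match:
            severity, source = match.group(1), match.group(5)
            counts[(severity, source)] += 1
    return counts


sample_log = [
    "W0903 14:01:42.210505       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist",
    'level=info ts=2018-09-03T14:01:43.178075004Z caller=operator.go:732 component=prometheusoperator msg="sync prometheus" key=openshift-monitoring/k8s',
    "W0903 14:01:43.178176       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist",
]

counts = count_klog_lines(sample_log)
for (severity, source), n in counts.most_common():
    print(severity, source, n)
```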

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 17 (17 by maintainers)

Most upvoted comments