rook: Ceph MGR: 2 modules failed on default install
Is this a bug report or feature request?
- Bug Report
Deviation from expected behavior:
Two Ceph MGR modules failed to come up, causing the Ceph cluster to report a `HEALTH_ERR` state.
See logs: https://gist.github.com/galexrt/3626102e96dddcef071060b71d94e280
Expected behavior: The dashboard and prometheus modules to work fine.
How to reproduce it (minimal and precise):
- Use the example `cluster.yaml`, in my case in a minikube environment on Kubernetes 1.11.4.
Environment:
* OS (e.g. from /etc/os-release):
```
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
```
* Kernel (e.g. `uname -a`): `Linux minikube 4.15.0 #1 SMP Fri Oct 5 20:44:14 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux`
* Cloud provider or hardware configuration:
* Rook version (use `rook version` inside of a Rook Pod): `rook: v0.8.0-350.g18b2da5f` (freshly built from latest `master` this morning, https://github.com/rook/rook/commit/18b2da5fc5d7a303b9a48119ce55108b55af7f0e)
* Kubernetes version (use `kubectl version`):
```
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:04:45Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.4", GitCommit:"bf9a868e8ea3d3a8fa53cbb22f566771b3f8068b", GitTreeState:"clean", BuildDate:"2018-10-25T19:06:30Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
```
* Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): minikube
* Storage backend status (e.g. for Ceph use `ceph health` in the Rook Ceph toolbox): `HEALTH_ERR` - 2 modules have failed
Ok, I believe I have prototyped the fix… The setting on the mgr modules for `server_addr` needs to be set to the pod IP. By default the dashboard and prometheus modules are binding to `::` (all interfaces) as seen here, which is causing the issues in the k8s clusters. The fix can be tested by running the following commands from the toolbox:
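(The exact commands from the original comment aren't preserved in this excerpt; the following is a minimal sketch of the idea, assuming a Mimic-based image where `ceph config set` is available and that `POD_IP` holds the mgr pod's IP.)

```sh
# Sketch: bind the dashboard and prometheus modules to the pod IP instead of ::
# POD_IP is an assumption here - substitute the mgr pod's actual IP.
ceph config set mgr mgr/dashboard/server_addr "$POD_IP"
ceph config set mgr mgr/prometheus/server_addr "$POD_IP"

# Toggle the modules so they rebind with the new address
ceph mgr module disable dashboard && ceph mgr module enable dashboard
ceph mgr module disable prometheus && ceph mgr module enable prometheus
```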
Now to automate this when the mgr pod starts up…
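One possible shape for that automation (a sketch, not Rook's actual implementation): inject the pod IP into the mgr container via the Kubernetes downward API and apply the setting before `ceph-mgr` starts. `POD_IP` and `MGR_NAME` are assumed environment variables here.

```sh
#!/bin/sh
# Hypothetical mgr startup wrapper.
# Assumes POD_IP is injected via the downward API (fieldRef: status.podIP)
# and MGR_NAME is the mgr daemon id.
ceph config set mgr mgr/dashboard/server_addr "${POD_IP}"
ceph config set mgr mgr/prometheus/server_addr "${POD_IP}"
exec ceph-mgr -f -i "${MGR_NAME}"
```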
Hi, if somebody is still interested: I had a similar dashboard issue on Mimic. Mon nodes run the mgr role in Mimic. Apparently the dashboard module has some IPv6 dependency, and IPv6 should be enabled on the mon nodes during installation/configuration. I usually disable IPv6 and as a result run into dashboard and Prometheus issues. To fix the issue you need to:
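(The actual steps from this comment didn't survive in the excerpt. As a rough sketch of what "enable IPv6 on the mon nodes" usually amounts to on CentOS 7; the sysctl settings below are illustrative, not the commenter's original instructions.)

```sh
# Hypothetical: re-enable IPv6 on the mon/mgr hosts so the dashboard can bind to ::
sysctl -w net.ipv6.conf.all.disable_ipv6=0
sysctl -w net.ipv6.conf.default.disable_ipv6=0
# persist across reboots
echo "net.ipv6.conf.all.disable_ipv6 = 0" > /etc/sysctl.d/99-enable-ipv6.conf
```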
@liejuntao001 as far as I understand from the community, we'll need to wait for Ceph to publish the 13.2.3 Docker image in order to solve the Prometheus & Ceph dashboard bugs in Mimic. They are supposed to publish it around Jan 2019.
Had the same problem multiple times after starting over with a fresh setup.
Try this as a workaround:
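(The original workaround snippet isn't included in this excerpt. As a purely hypothetical placeholder, one common first step is simply to bounce the mgr pod and re-check the module state; the namespace and label selector below are assumptions for a default Rook install.)

```sh
# Hypothetical: restart the Ceph mgr pod so the failed modules are re-initialised.
kubectl -n rook-ceph delete pod -l app=rook-ceph-mgr

# Then check the cluster health and module state again from the toolbox
ceph status
ceph mgr module ls
```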
I’ve seen that, but my common setup has been, you know, CentOS 7 + a custom Luminous backport of the dashboard.
Just ran a search and found that this was also happening in Luminous 12.2.5’s Prometheus module and dashboard, but it was mostly fixed with https://github.com/ceph/ceph/pull/15588.
In the past I was able to work around this issue by setting the listening IP to a specific local address, instead of the default `0.0.0.0` or `::/128`.

With `master` I can see a similar error with the restful module (I forced that by immediately disabling and enabling the dashboard module):

And finally, https://github.com/ceph/ceph/pull/24734 was merged into 13.2.3. Do you see the same behavior with 13.2.3?
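For reference, the disable/enable toggle mentioned above is just the standard module commands run from the toolbox:

```sh
# Force the module to restart by toggling it off and on
ceph mgr module disable dashboard
ceph mgr module enable dashboard
```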