rook: Dashboard 500. Rook v1.3.0, Ceph 15.2.0

Kubernetes 1.16.8, CentOS 7, kernel 5.5.13-1.el7.elrepo.x86_64

After upgrading to Rook v1.3.0 and Ceph 15.2.0, the dashboard is partially unavailable and returns HTTP 500 errors. Manager logs:

debug 2020-04-09T08:36:13.902+0000 7fa4d6d00700  0 [rook ERROR orchestrator._interface] _Promise failed
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 271, in _finalize
    next_result = self._on_complete(self._value)
  File "/usr/share/ceph/mgr/rook/module.py", line 52, in <lambda>
    return RookCompletion(on_complete=lambda _: f(*args, **kwargs))
  File "/usr/share/ceph/mgr/rook/module.py", line 316, in describe_service
    placement=PlacementSpec(count=active),
  File "/lib/python3.6/site-packages/ceph/deployment/service_spec.py", line 338, in __init__
    assert service_type in ServiceSpec.KNOWN_SERVICE_TYPES, service_type
AssertionError: mds.core-rook
debug 2020-04-09T08:36:13.903+0000 7fa4d6d00700  0 [dashboard ERROR request] [10.32.9.136:57600] [GET] [500] [0.414s] [admin] [513.0B] /api/health/minimal
debug 2020-04-09T08:36:13.903+0000 7fa4d6d00700  0 [dashboard ERROR request] [b'{"status": "500 Internal Server Error", "detail": "The server encountered an unexpected condition which prevented it from fulfilling the request.", "request_id": "15acf2e5-68b1-43dd-a460-4a275d164bdf"}']
10.32.7.185 - - [09/Apr/2020:08:36:14] "GET / HTTP/1.1" 200 176 "" "kube-probe/1.16"
debug 2020-04-09T08:36:14.822+0000 7fa4f242a700  0 log_channel(cluster) log [DBG] : pgmap v16509: 97 pgs: 97 active+clean; 76 GiB data, 229 GiB used, 6.3 TiB / 6.5 TiB avail; 1.2 KiB/s rd, 1.6 MiB/s wr, 155 op/s
10.32.11.5 - - [09/Apr/2020:08:36:15] "GET /metrics HTTP/1.1" 200 230652 "" "Prometheus/2.16.0"
debug 2020-04-09T08:36:16.823+0000 7fa4f242a700  0 log_channel(cluster) log [DBG] : pgmap v16510: 97 pgs: 97 active+clean; 76 GiB data, 229 GiB used, 6.3 TiB / 6.5 TiB avail; 852 B/s rd, 1.0 MiB/s wr, 99 op/s
debug 2020-04-09T08:36:18.823+0000 7fa4f242a700  0 log_channel(cluster) log [DBG] : pgmap v16511: 97 pgs: 97 active+clean; 76 GiB data, 229 GiB used, 6.3 TiB / 6.5 TiB avail; 853 B/s rd, 1.0 MiB/s wr, 99 op/s
debug 2020-04-09T08:36:19.170+0000 7fa4d7d02700  0 [rook ERROR orchestrator._interface] _Promise failed
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 271, in _finalize
    next_result = self._on_complete(self._value)
  File "/usr/share/ceph/mgr/rook/module.py", line 52, in <lambda>
    return RookCompletion(on_complete=lambda _: f(*args, **kwargs))
  File "/usr/share/ceph/mgr/rook/module.py", line 316, in describe_service
    placement=PlacementSpec(count=active),
  File "/lib/python3.6/site-packages/ceph/deployment/service_spec.py", line 338, in __init__
    assert service_type in ServiceSpec.KNOWN_SERVICE_TYPES, service_type
AssertionError: mds.core-rook
debug 2020-04-09T08:36:19.171+0000 7fa4d7d02700  0 [dashboard ERROR request] [10.32.9.136:57850] [GET] [500] [0.679s] [admin] [513.0B] /api/health/minimal
debug 2020-04-09T08:36:19.171+0000 7fa4d7d02700  0 [dashboard ERROR request] [b'{"status": "500 Internal Server Error", "detail": "The server encountered an unexpected condition which prevented it from fulfilling the request.", "request_id": "c56bd8b8-786e-405c-91c4-d82a04cae23b"}']
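
(For reference, the manager logs above can be tailed with something like the following, assuming Rook's default rook-ceph namespace and mgr deployment name:)

$ kubectl -n rook-ceph logs deploy/rook-ceph-mgr-a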

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 10
  • Comments: 30 (11 by maintainers)

Most upvoted comments

Updated Ceph to 15.2.2, but I'm still getting this issue:

debug 2020-05-19T18:18:46.581+0000 7f3e44647700  0 [dashboard ERROR request] [10.11.5.38:50518] [GET] [500] [0.397s] [admin] [513.0B] /api/health/minimal
debug 2020-05-19T18:18:46.581+0000 7f3e44647700  0 [dashboard ERROR request] [b'{"status": "500 Internal Server Error", "detail": "The server encountered an unexpected condition which prevented it from fulfilling the request.", "request_id": "62ddb7cb-07cb-487e-9c3a-a45e5947db28"}']
debug 2020-05-19T18:19:11.986+0000 7f3e44647700  0 [dashboard ERROR request] [10.11.5.38:50518] [GET] [500] [0.437s] [admin] [513.0B] /api/health/full
debug 2020-05-19T18:19:11.986+0000 7f3e44647700  0 [dashboard ERROR request] [b'{"status": "500 Internal Server Error", "detail": "The server encountered an unexpected condition which prevented it from fulfilling the request.", "request_id": "bcc1fc9b-4d17-4336-8022-57f6e7fe21a4"}']

FWIW, this is the PR that fixes it: https://github.com/ceph/ceph/pull/34061

The dashboard's minimal health API gets iSCSI services from the orchestrator, which eventually invokes the orchestrator's describe_service() function.

In 15.2.0, describe_service() asserts because it lists MDS services with an mds.<namespace> service type, which is not one of ServiceSpec.KNOWN_SERVICE_TYPES. A PR that was merged a few days ago should fix this issue.
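
For illustration, here is a minimal sketch of the failing check (not the actual Ceph source; KNOWN_SERVICE_TYPES is abbreviated and the helper name is made up):

# Sketch of the assertion in ceph/deployment/service_spec.py (15.2.0).
# The real KNOWN_SERVICE_TYPES list is longer; this is abbreviated.
KNOWN_SERVICE_TYPES = ['mon', 'mgr', 'osd', 'mds', 'rgw', 'nfs',
                       'iscsi', 'rbd-mirror', 'crash']

def check_service_type(service_type):
    # ServiceSpec.__init__ asserts the *bare* type is known:
    assert service_type in KNOWN_SERVICE_TYPES, service_type

check_service_type('mds')  # passes: bare service type
try:
    # The Rook module passed the full "<type>.<id>" name instead
    # of the bare type, which trips the assertion:
    check_service_type('mds.core-rook')
except AssertionError as e:
    print('AssertionError:', e)  # -> AssertionError: mds.core-rook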

Saw the same issue, and the steps in this comment got rid of the errors. If you need iSCSI in the dashboard, however, the PR is probably going to be a better option.

Ceph v15.2.4 / v15.2.4-20200630 Docker images are out, could you please check if they fix the MGR dashboard issue for you? Thanks!
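
A quick way to try the new image on a Rook-managed cluster is to patch the CephCluster CR; a sketch, assuming the default rook-ceph namespace and cluster name:

$ kubectl -n rook-ceph patch CephCluster rook-ceph --type=merge \
    -p '{"spec": {"cephVersion": {"image": "ceph/ceph:v15.2.4-20200630"}}}'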

If you enable dashboard debug mode, you'll get more information about the exact failure (e.g. the Python traceback):

$ ceph dashboard debug enable
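
Once you've captured the traceback, debug mode can be switched back off the same way:

$ ceph dashboard debug disable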

Same for me!