rook: ceph mgr attempts to bind to the wrong pod IP for prometheus+dashboard

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior: The rook-ceph-mgr cannot start the prometheus or dashboard endpoints because they attempt to bind to an IP address that is not the pod's current IP.

Expected behavior: Bind to the correct pod IP

How to reproduce it (minimal and precise): Not sure how this situation happened. I had to resolve an unrelated issue with non-existent snapshots preventing the deletion of blocks. @travisn and I were troubleshooting my rook-ceph-mgr constantly restarting, which was due to the livenessProbe failing. The probe fails because of this issue.

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
  • Operator’s logs, if necessary
  • Crashing pod(s) logs, if necessary

The Pod IP is NOT 10.0.0.3. I believe this used to be the mgr IP until the deployment spun up a new pod.

debug 2021-05-03T19:36:53.550+0000 7f3a01938200  1 mgr[py] Loading python module 'status'
debug 2021-05-03T19:36:53.713+0000 7f3a01938200  1 mgr[py] Loading python module 'telegraf'
debug 2021-05-03T19:36:53.817+0000 7f3a01938200  1 mgr[py] Loading python module 'telemetry'
debug 2021-05-03T19:36:54.537+0000 7f3a01938200  1 mgr[py] Loading python module 'test_orchestrator'
debug 2021-05-03T19:36:54.837+0000 7f3a01938200  1 mgr[py] Loading python module 'volumes'
debug 2021-05-03T19:36:55.237+0000 7f3a01938200  1 mgr[py] Loading python module 'zabbix'
debug 2021-05-03T19:36:55.413+0000 7f39eebad700  0 ms_deliver_dispatch: unhandled message 0x5600e71186e0 mon_map magic: 0 v1 from mon.1 v2:10.43.254.127:3300/0
debug 2021-05-03T19:36:55.487+0000 7f39eebad700  1 mgr handle_mgr_map Activating!
debug 2021-05-03T19:36:55.487+0000 7f39eebad700  1 mgr handle_mgr_map I am now activating
debug 2021-05-03T19:36:55.510+0000 7f39a1614700  0 [balancer DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.510+0000 7f39a1614700  1 mgr load Constructed class from module: balancer
debug 2021-05-03T19:36:55.510+0000 7f39a1614700  0 [crash DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.510+0000 7f39a1614700  1 mgr load Constructed class from module: crash
debug 2021-05-03T19:36:55.513+0000 7f39a1614700  0 [devicehealth DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.513+0000 7f39a1614700  1 mgr load Constructed class from module: devicehealth
debug 2021-05-03T19:36:55.517+0000 7f39a1614700  0 [iostat DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.517+0000 7f39a1614700  1 mgr load Constructed class from module: iostat
debug 2021-05-03T19:36:55.517+0000 7f39a1614700  0 [orchestrator DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.517+0000 7f39a1614700  1 mgr load Constructed class from module: orchestrator
debug 2021-05-03T19:36:55.517+0000 7f39a1614700  0 [pg_autoscaler DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.517+0000 7f39a1614700  1 mgr load Constructed class from module: pg_autoscaler
debug 2021-05-03T19:36:55.523+0000 7f39a1614700  0 [progress DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.523+0000 7f39a1614700  1 mgr load Constructed class from module: progress
debug 2021-05-03T19:36:55.533+0000 7f39a1614700  0 [prometheus DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.533+0000 7f39a1614700  1 mgr load Constructed class from module: prometheus
debug 2021-05-03T19:36:55.540+0000 7f39a1614700  0 [rbd_support DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
[03/May/2021:19:36:55] ENGINE Bus STARTING
debug 2021-05-03T19:36:55.650+0000 7f39a1614700  1 mgr load Constructed class from module: rbd_support
debug 2021-05-03T19:36:55.650+0000 7f39a1614700  0 [restful DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.650+0000 7f39a1614700  1 mgr load Constructed class from module: restful
debug 2021-05-03T19:36:55.653+0000 7f39a1614700  0 [status DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.653+0000 7f39a1614700  1 mgr load Constructed class from module: status
debug 2021-05-03T19:36:55.653+0000 7f39a1614700  0 [telemetry DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.653+0000 7f39a1614700  1 mgr load Constructed class from module: telemetry
debug 2021-05-03T19:36:55.653+0000 7f39c7fcd700  0 [restful WARNING root] server not running: no certificate configured
debug 2021-05-03T19:36:55.657+0000 7f39a1614700  0 [volumes DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.693+0000 7f39a1614700  1 mgr load Constructed class from module: volumes
debug 2021-05-03T19:36:55.697+0000 7f39c1fc1700 -1 client.0 error registering admin socket command: (17) File exists
debug 2021-05-03T19:36:55.697+0000 7f39c1fc1700 -1 client.0 error registering admin socket command: (17) File exists
debug 2021-05-03T19:36:55.697+0000 7f39c1fc1700 -1 client.0 error registering admin socket command: (17) File exists
debug 2021-05-03T19:36:55.697+0000 7f39c1fc1700 -1 client.0 error registering admin socket command: (17) File exists
debug 2021-05-03T19:36:55.697+0000 7f39c1fc1700 -1 client.0 error registering admin socket command: (17) File exists
CherryPy Checker:
The Application mounted at '' has an empty config.
[03/May/2021:19:36:55] ENGINE Error in HTTP server: shutting down
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
    self.httpserver.start()
  File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1836, in start
    self.prepare()
  File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1791, in prepare
    raise socket.error(msg)
OSError: No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)
[03/May/2021:19:36:55] ENGINE Bus STOPPING
[03/May/2021:19:36:55] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('10.0.0.3', 9283)) already shut down
[03/May/2021:19:36:55] ENGINE Bus STOPPED
[03/May/2021:19:36:55] ENGINE Bus EXITING
debug 2021-05-03T19:36:55.720+0000 7f39b8faf700  0 [prometheus ERROR cherrypy.error] [03/May/2021:19:36:55] ENGINE Error in HTTP server: shutting down
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
    self.httpserver.start()
  File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1836, in start
    self.prepare()
  File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1791, in prepare
    raise socket.error(msg)
OSError: No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)
[03/May/2021:19:36:55] ENGINE Bus EXITED
Exception in thread HTTPServer Thread-4:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
    self.httpserver.start()
  File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1836, in start
    self.prepare()
  File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1791, in prepare
    raise socket.error(msg)
OSError: No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)
[03/May/2021:19:36:55] ENGINE Error in 'start' listener <bound method Server.start of <cherrypy._cpserver.Server object at 0x7f39d86db9e8>>
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 230, in publish
    output.append(listener(*args, **kwargs))
  File "/usr/lib/python3.6/site-packages/cherrypy/_cpserver.py", line 180, in start
    super(Server, self).start()
  File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 184, in start
    self.wait()
  File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 246, in wait
    raise self.interrupt
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
    self.httpserver.start()
  File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1836, in start
    self.prepare()
  File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1791, in prepare
    raise socket.error(msg)
OSError: No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)
debug 2021-05-03T19:36:55.820+0000 7f39cf81c700  0 [prometheus ERROR cherrypy.error] [03/May/2021:19:36:55] ENGINE Error in 'start' listener <bound method Server.start of <cherrypy._cpserver.Server object at 0x7f39d86db9e8>>
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 230, in publish
    output.append(listener(*args, **kwargs))
  File "/usr/lib/python3.6/site-packages/cherrypy/_cpserver.py", line 180, in start
    super(Server, self).start()
  File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 184, in start
    self.wait()
  File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 246, in wait
    raise self.interrupt
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
    self.httpserver.start()
  File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1836, in start
    self.prepare()
  File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1791, in prepare
    raise socket.error(msg)
OSError: No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)
[03/May/2021:19:36:55] ENGINE Shutting down due to error in start listener:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 268, in start
    self.publish('start')
  File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 248, in publish
    raise exc
cherrypy.process.wspbus.ChannelFailures: OSError("No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)",)
debug 2021-05-03T19:36:55.820+0000 7f39cf81c700  0 [prometheus ERROR cherrypy.error] [03/May/2021:19:36:55] ENGINE Shutting down due to error in start listener:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 268, in start
    self.publish('start')
  File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 248, in publish
    raise exc
cherrypy.process.wspbus.ChannelFailures: OSError("No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)",)
[03/May/2021:19:36:55] ENGINE Bus STOPPING
[03/May/2021:19:36:55] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('10.0.0.3', 9283)) already shut down
[03/May/2021:19:36:55] ENGINE Bus STOPPED
[03/May/2021:19:36:55] ENGINE Bus EXITING
[03/May/2021:19:36:55] ENGINE Bus EXITED
debug 2021-05-03T19:36:55.820+0000 7f39cf81c700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'prometheus' while running on mgr.a: OSError("No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)",)
debug 2021-05-03T19:36:55.820+0000 7f39cf81c700 -1 prometheus.serve:
debug 2021-05-03T19:36:55.820+0000 7f39cf81c700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/prometheus/module.py", line 1280, in serve
    cherrypy.engine.start()
  File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 283, in start
    raise e_info
  File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 268, in start
    self.publish('start')
  File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 248, in publish
    raise exc
cherrypy.process.wspbus.ChannelFailures: OSError("No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)",)
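
A quick way to confirm the mismatch is to pull the bind address out of the mgr log and compare it with the pod's current IP. The sketch below is illustrative only: `bind_addr_from_log` is a hypothetical helper name, and it parses nothing beyond the cherrypy error text shown in the log above.

```shell
# Hypothetical helper: read mgr log text on stdin and print each distinct
# "IP:port" that cherrypy reported it could not bind to.
bind_addr_from_log() {
  # Match the "No socket could be created -- (('IP', PORT)" fragment and
  # keep only the address and port; lines without it are dropped.
  sed -nE "s/.*No socket could be created -- \(\('([^']+)', ([0-9]+)\).*/\1:\2/p" | sort -u
}
```

Assuming the default Rook names (namespace `rook-ceph`, deployment `rook-ceph-mgr-a`, label `app=rook-ceph-mgr`), which may differ in your cluster, running `kubectl -n rook-ceph logs deploy/rook-ceph-mgr-a | bind_addr_from_log` and then `kubectl -n rook-ceph get pod -l app=rook-ceph-mgr -o wide` should let you compare the two addresses.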

Environment: RKE, Rook installed from Helm

  • OS (e.g. from /etc/os-release): Arch Linux
  • Kernel (e.g. uname -a): 5.10.10
  • Cloud provider or hardware configuration: bare metal
  • Rook version (use rook version inside of a Rook Pod): v1.6.0
  • Storage backend version (e.g. for ceph do ceph -v): v15.2.11
  • Kubernetes version (use kubectl version): v1.19.4
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Bare Metal
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_ERR, Module 'prometheus' has failed: OSError("No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)",)

Additionally, for some reason the tools pod reports the wrong Rook and Ceph versions.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 21 (7 by maintainers)

Most upvoted comments

I’ve run into this same issue on an upgrade. My error shows IP 10.1.88.2, but otherwise it is the same. I believe this was the correct address at one point. Checking the config as requested returns '' (an empty string), but with dump I get the following:

$ ceph config dump | grep 10.1.88.2
    mgr.a        advanced  mgr/dashboard/a/server_addr   10.1.88.2  *
    mgr.a        advanced  mgr/prometheus/a/server_addr  10.1.88.2  *

Running ceph config rm mgr.a mgr/prometheus/a/server_addr and ceph config rm mgr.a mgr/dashboard/a/server_addr has solved this issue for me.
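
For anyone with several of these entries, a small filter can turn the dump output into the matching `ceph config rm` commands. This is a sketch, not something shipped with Rook or Ceph: `server_addr_rm_cmds` is a hypothetical helper, and it assumes the dump layout shown above, with the daemon name in the first column. It lists every dashboard/prometheus `server_addr` override it sees and only prints the commands, so you can check which ones are actually stale before running anything.

```shell
# Hypothetical helper: read `ceph config dump` output on stdin and print a
# `ceph config rm` command for each mgr dashboard/prometheus server_addr
# override found. Nothing is removed; the commands are only printed.
server_addr_rm_cmds() {
  awk '{
    # The option name is located by pattern rather than by column number,
    # since the MASK column may be empty and shift the fields.
    for (i = 2; i <= NF; i++)
      if ($i ~ "^mgr/(dashboard|prometheus)/[^/]+/server_addr$")
        print "ceph config rm " $1 " " $i
  }'
}
```

Usage would look like `ceph config dump | server_addr_rm_cmds`, run from wherever the `ceph` CLI is available (for example the toolbox pod).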

@travisn: as @sebastian-philipp mentioned, if ‘someone’ set this mgr/dashboard/a/server_addr option in the past, it’ll remain set. If unset, this will be used instead for guessing the address to bind to.