rook: ceph mgr attempts to bind to the wrong pod IP for prometheus+dashboard
Is this a bug report or feature request?
- Bug Report
Deviation from expected behavior: The rook-ceph-mgr cannot start the prometheus or dashboard endpoints, because they attempt to bind to an IP address other than the one currently assigned to the pod.
Expected behavior: Bind to the correct pod IP
How to reproduce it (minimal and precise): Not sure how this situation happened. I had to resolve an unrelated issue with non-existent snapshots preventing the deletion of blocks. @travisn and I were troubleshooting my rook-ceph-mgr constantly restarting, and it was due to the livenessProbe failing. The probe is failing because of this issue.
File(s) to submit:
- Cluster CR (custom resource), typically called `cluster.yaml`, if necessary
- Operator’s logs, if necessary
- Crashing pod(s) logs, if necessary
The Pod IP is NOT 10.0.0.3. I believe this used to be the mgr IP until the deployment spun up a new pod.
debug 2021-05-03T19:36:53.550+0000 7f3a01938200 1 mgr[py] Loading python module 'status'
debug 2021-05-03T19:36:53.713+0000 7f3a01938200 1 mgr[py] Loading python module 'telegraf'
debug 2021-05-03T19:36:53.817+0000 7f3a01938200 1 mgr[py] Loading python module 'telemetry'
debug 2021-05-03T19:36:54.537+0000 7f3a01938200 1 mgr[py] Loading python module 'test_orchestrator'
debug 2021-05-03T19:36:54.837+0000 7f3a01938200 1 mgr[py] Loading python module 'volumes'
debug 2021-05-03T19:36:55.237+0000 7f3a01938200 1 mgr[py] Loading python module 'zabbix'
debug 2021-05-03T19:36:55.413+0000 7f39eebad700 0 ms_deliver_dispatch: unhandled message 0x5600e71186e0 mon_map magic: 0 v1 from mon.1 v2:10.43.254.127:3300/0
debug 2021-05-03T19:36:55.487+0000 7f39eebad700 1 mgr handle_mgr_map Activating!
debug 2021-05-03T19:36:55.487+0000 7f39eebad700 1 mgr handle_mgr_map I am now activating
debug 2021-05-03T19:36:55.510+0000 7f39a1614700 0 [balancer DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.510+0000 7f39a1614700 1 mgr load Constructed class from module: balancer
debug 2021-05-03T19:36:55.510+0000 7f39a1614700 0 [crash DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.510+0000 7f39a1614700 1 mgr load Constructed class from module: crash
debug 2021-05-03T19:36:55.513+0000 7f39a1614700 0 [devicehealth DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.513+0000 7f39a1614700 1 mgr load Constructed class from module: devicehealth
debug 2021-05-03T19:36:55.517+0000 7f39a1614700 0 [iostat DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.517+0000 7f39a1614700 1 mgr load Constructed class from module: iostat
debug 2021-05-03T19:36:55.517+0000 7f39a1614700 0 [orchestrator DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.517+0000 7f39a1614700 1 mgr load Constructed class from module: orchestrator
debug 2021-05-03T19:36:55.517+0000 7f39a1614700 0 [pg_autoscaler DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.517+0000 7f39a1614700 1 mgr load Constructed class from module: pg_autoscaler
debug 2021-05-03T19:36:55.523+0000 7f39a1614700 0 [progress DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.523+0000 7f39a1614700 1 mgr load Constructed class from module: progress
debug 2021-05-03T19:36:55.533+0000 7f39a1614700 0 [prometheus DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.533+0000 7f39a1614700 1 mgr load Constructed class from module: prometheus
debug 2021-05-03T19:36:55.540+0000 7f39a1614700 0 [rbd_support DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
[03/May/2021:19:36:55] ENGINE Bus STARTING
debug 2021-05-03T19:36:55.650+0000 7f39a1614700 1 mgr load Constructed class from module: rbd_support
debug 2021-05-03T19:36:55.650+0000 7f39a1614700 0 [restful DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.650+0000 7f39a1614700 1 mgr load Constructed class from module: restful
debug 2021-05-03T19:36:55.653+0000 7f39a1614700 0 [status DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.653+0000 7f39a1614700 1 mgr load Constructed class from module: status
debug 2021-05-03T19:36:55.653+0000 7f39a1614700 0 [telemetry DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.653+0000 7f39a1614700 1 mgr load Constructed class from module: telemetry
debug 2021-05-03T19:36:55.653+0000 7f39c7fcd700 0 [restful WARNING root] server not running: no certificate configured
debug 2021-05-03T19:36:55.657+0000 7f39a1614700 0 [volumes DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
debug 2021-05-03T19:36:55.693+0000 7f39a1614700 1 mgr load Constructed class from module: volumes
debug 2021-05-03T19:36:55.697+0000 7f39c1fc1700 -1 client.0 error registering admin socket command: (17) File exists
debug 2021-05-03T19:36:55.697+0000 7f39c1fc1700 -1 client.0 error registering admin socket command: (17) File exists
debug 2021-05-03T19:36:55.697+0000 7f39c1fc1700 -1 client.0 error registering admin socket command: (17) File exists
debug 2021-05-03T19:36:55.697+0000 7f39c1fc1700 -1 client.0 error registering admin socket command: (17) File exists
debug 2021-05-03T19:36:55.697+0000 7f39c1fc1700 -1 client.0 error registering admin socket command: (17) File exists
CherryPy Checker:
The Application mounted at '' has an empty config.
[03/May/2021:19:36:55] ENGINE Error in HTTP server: shutting down
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
self.httpserver.start()
File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1836, in start
self.prepare()
File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1791, in prepare
raise socket.error(msg)
OSError: No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)
[03/May/2021:19:36:55] ENGINE Bus STOPPING
[03/May/2021:19:36:55] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('10.0.0.3', 9283)) already shut down
[03/May/2021:19:36:55] ENGINE Bus STOPPED
[03/May/2021:19:36:55] ENGINE Bus EXITING
debug 2021-05-03T19:36:55.720+0000 7f39b8faf700 0 [prometheus ERROR cherrypy.error] [03/May/2021:19:36:55] ENGINE Error in HTTP server: shutting down
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
self.httpserver.start()
File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1836, in start
self.prepare()
File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1791, in prepare
raise socket.error(msg)
OSError: No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)
[03/May/2021:19:36:55] ENGINE Bus EXITED
Exception in thread HTTPServer Thread-4:
Traceback (most recent call last):
File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
self.httpserver.start()
File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1836, in start
self.prepare()
File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1791, in prepare
raise socket.error(msg)
OSError: No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)
[03/May/2021:19:36:55] ENGINE Error in 'start' listener <bound method Server.start of <cherrypy._cpserver.Server object at 0x7f39d86db9e8>>
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 230, in publish
output.append(listener(*args, **kwargs))
File "/usr/lib/python3.6/site-packages/cherrypy/_cpserver.py", line 180, in start
super(Server, self).start()
File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 184, in start
self.wait()
File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 246, in wait
raise self.interrupt
File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
self.httpserver.start()
File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1836, in start
self.prepare()
File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1791, in prepare
raise socket.error(msg)
OSError: No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)
debug 2021-05-03T19:36:55.820+0000 7f39cf81c700 0 [prometheus ERROR cherrypy.error] [03/May/2021:19:36:55] ENGINE Error in 'start' listener <bound method Server.start of <cherrypy._cpserver.Server object at 0x7f39d86db9e8>>
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 230, in publish
output.append(listener(*args, **kwargs))
File "/usr/lib/python3.6/site-packages/cherrypy/_cpserver.py", line 180, in start
super(Server, self).start()
File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 184, in start
self.wait()
File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 246, in wait
raise self.interrupt
File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.6/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
self.httpserver.start()
File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1836, in start
self.prepare()
File "/usr/lib/python3.6/site-packages/cheroot/server.py", line 1791, in prepare
raise socket.error(msg)
OSError: No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)
[03/May/2021:19:36:55] ENGINE Shutting down due to error in start listener:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 268, in start
self.publish('start')
File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 248, in publish
raise exc
cherrypy.process.wspbus.ChannelFailures: OSError("No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)",)
debug 2021-05-03T19:36:55.820+0000 7f39cf81c700 0 [prometheus ERROR cherrypy.error] [03/May/2021:19:36:55] ENGINE Shutting down due to error in start listener:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 268, in start
self.publish('start')
File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 248, in publish
raise exc
cherrypy.process.wspbus.ChannelFailures: OSError("No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)",)
[03/May/2021:19:36:55] ENGINE Bus STOPPING
[03/May/2021:19:36:55] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('10.0.0.3', 9283)) already shut down
[03/May/2021:19:36:55] ENGINE Bus STOPPED
[03/May/2021:19:36:55] ENGINE Bus EXITING
[03/May/2021:19:36:55] ENGINE Bus EXITED
debug 2021-05-03T19:36:55.820+0000 7f39cf81c700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'prometheus' while running on mgr.a: OSError("No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)",)
debug 2021-05-03T19:36:55.820+0000 7f39cf81c700 -1 prometheus.serve:
debug 2021-05-03T19:36:55.820+0000 7f39cf81c700 -1 Traceback (most recent call last):
File "/usr/share/ceph/mgr/prometheus/module.py", line 1280, in serve
cherrypy.engine.start()
File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 283, in start
raise e_info
File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 268, in start
self.publish('start')
File "/usr/lib/python3.6/site-packages/cherrypy/process/wspbus.py", line 248, in publish
raise exc
cherrypy.process.wspbus.ChannelFailures: OSError("No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)",)
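For context, `[Errno 99] Cannot assign requested address` (EADDRNOTAVAIL on Linux) is what `bind()` returns when the requested address is not assigned to any interface in the current network namespace, which fits a replacement pod that no longer owns 10.0.0.3. The following is a minimal sketch of that behaviour outside of Ceph, not the mgr's actual code. The helper name `try_bind` is made up for illustration, and 192.0.2.1 (from the TEST-NET-1 documentation range) is only a stand-in for an address the host does not own.

```python
import errno
import socket


def try_bind(host, port=0):
    """Return None if a TCP socket can bind to (host, port), else the errno.

    Hypothetical helper for illustration only; port 0 lets the kernel pick
    any free port so the result depends on the address alone.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            return exc.errno
    return None


if __name__ == "__main__":
    # 0.0.0.0 is the wildcard address; 192.0.2.1 is assumed not to be
    # configured on this machine, like the stale 10.0.0.3 in the log above.
    for host in ("0.0.0.0", "192.0.2.1"):
        err = try_bind(host)
        label = "ok" if err is None else errno.errorcode.get(err, str(err))
        print(f"{host}: {label}")
```

Running the same check from inside the mgr pod against the address shown in the traceback is one way to confirm that the configured address, rather than the port, is the problem.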
Environment: RKE, Rook installed from Helm
- OS (e.g. from /etc/os-release): Arch Linux
- Kernel (e.g. `uname -a`): 5.10.10
- Cloud provider or hardware configuration: bare metal
- Rook version (use `rook version` inside of a Rook Pod): v1.6.0
- Storage backend version (e.g. for ceph do `ceph -v`): v15.2.11
- Kubernetes version (use `kubectl version`): v1.19.4
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Bare Metal
- Storage backend status (e.g. for Ceph use `ceph health` in the Rook Ceph toolbox): HEALTH_ERR, Module 'prometheus' has failed: OSError("No socket could be created -- (('10.0.0.3', 9283): [Errno 99] Cannot assign requested address)",)
Additionally, for some reason the tools pod reports the wrong Rook and Ceph versions.
About this issue
- State: closed
- Created 3 years ago
- Comments: 21 (7 by maintainers)
I’ve run into this same issue on an upgrade. I have an error with IP 10.1.88.2, but otherwise this is the same. I believe this was the correct address at one point. Checking the config as requested results in `''`, an empty string, but with dump I get the following:

Running `ceph config rm mgr.a mgr/prometheus/a/server_addr` and `ceph config rm mgr.a mgr/dashboard/a/server_addr` has solved this issue for me.

@travisn: as @sebastian-philipp mentioned, if ‘someone’ set this `mgr/dashboard/a/server_addr` option in the past, it’ll remain set. If unset, this will be used instead for guessing the address to bind to.
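The cleanup reported in this thread could be scripted roughly as follows. This is only a sketch under stated assumptions: it does not call Ceph itself, it assumes the config dump has already been parsed into a list of dicts with `section`, `name` and `value` keys (an assumed shape, not confirmed here), and the function name and the pod IP 10.0.0.7 are hypothetical. It prints the same `ceph config rm` commands quoted above for any `server_addr` override that no longer matches the pod IP, so they can be reviewed before being run by hand.

```python
# Wildcard addresses bind on every interface, so they never go stale.
WILDCARDS = {"0.0.0.0", "::"}


def stale_server_addr_commands(entries, pod_ip):
    """Return `ceph config rm` commands for mgr server_addr overrides != pod_ip.

    `entries` is an ASSUMED shape: a list of dicts with "section", "name"
    and "value" keys. Adjust the keys to whatever your dump really contains.
    """
    commands = []
    for entry in entries:
        section = entry.get("section", "")
        name = entry.get("name", "")
        value = entry.get("value", "")
        # Only look at mgr sections such as "mgr" or "mgr.a".
        if not section.startswith("mgr"):
            continue
        # Only look at module options like mgr/<module>/<id>/server_addr.
        if not (name.startswith("mgr/") and name.endswith("/server_addr")):
            continue
        # Keep overrides that would still bind successfully.
        if value in WILDCARDS or value == pod_ip:
            continue
        commands.append(f"ceph config rm {section} {name}")
    return commands


if __name__ == "__main__":
    # Hypothetical entries modelled on the option names in this thread.
    example = [
        {"section": "mgr.a", "name": "mgr/prometheus/a/server_addr", "value": "10.0.0.3"},
        {"section": "mgr.a", "name": "mgr/dashboard/a/server_addr", "value": "10.0.0.3"},
        {"section": "global", "name": "mon_allow_pool_delete", "value": "true"},
    ]
    # 10.0.0.7 stands in for the current mgr pod IP.
    for cmd in stale_server_addr_commands(example, pod_ip="10.0.0.7"):
        print(cmd)
```

Printing the commands rather than executing them is deliberate: removing an override changes which address the module binds to, so it is worth checking each one against the pod's actual IP first.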