rook: OSD Processor -- bind unable to bind to IP on any port in range 6800-7300: (99) Cannot assign requested address
Is this a bug report or feature request?
- Bug Report
Deviation from expected behavior:
I’m running OpenShift v3.11.82 on 3 nodes using RHEL7, with FLEXVOLUME_DIR_PATH set to /usr/libexec/kubernetes/kubelet-plugins/volume/exec (no other changes to the *.yml files). I’ve encountered the OSD error “Processor -- bind unable to bind to v2:IP:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address” on all 3 of my OSDs.
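For context, errno 99 (EADDRNOTAVAIL) from bind() means no local interface owns the requested address. A quick way to compare the address each OSD pod actually has with the address it tries to bind, assuming the default Rook namespace and pod labels:

kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide
kubectl -n rook-ceph logs -l app=rook-ceph-osd | grep 'unable to bind'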
Expected behavior:
Bind to the address and start listening on the port; rook + ceph should work as expected.
How to reproduce it (minimal and precise):
Environment:
- OS (e.g. from /etc/os-release): RHEL7
- Kernel (e.g. uname -a): Linux node2.system10.vlan124.mcp 3.10.0-957.10.1.el7.x86_64 #1 SMP Thu Feb 7 07:12:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Cloud provider or hardware configuration: On-prem (VMware based)
- Rook version (use rook version inside of a Rook Pod): 1.0.0
- Kubernetes version (use kubectl version): OpenShift v3.11.82 (aka K8s v1.11)
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): OpenShift
- Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
LOGS
rook-ceph-osd-0
2019-05-08 17:01:38.770518 I | rookcmd: starting Rook v1.0.0-13.g05b0166 with arguments '/rook/rook ceph osd start -- --foreground --id 0 --osd-uuid c05a22dc-97e5-4463-b4e6-ebd9b00d0f2e --conf /var/lib/rook/osd0/rook-ceph.config --cluster ceph --default-log-to-file false'
2019-05-08 17:01:38.776349 I | rookcmd: flag values: --help=false, --log-flush-frequency=5s, --log-level=INFO, --osd-id=0, --osd-store-type=bluestore, --osd-uuid=c05a22dc-97e5-4463-b4e6-ebd9b00d0f2e
2019-05-08 17:01:38.776480 I | op-mon: parsing mon endpoints:
2019-05-08 17:01:38.776577 W | op-mon: ignoring invalid monitor
2019-05-08 17:01:38.784163 I | exec: Running command: stdbuf -oL ceph-volume lvm activate --no-systemd --bluestore 0 c05a22dc-97e5-4463-b4e6-ebd9b00d0f2e
2019-05-08 17:01:39.683863 I | Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
2019-05-08 17:01:40.012259 I | Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-0
2019-05-08 17:01:40.312664 I | Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
2019-05-08 17:01:40.648273 I | Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-aaa9dfa1-6146-4a9d-9b19-781bf0f71abe/osd-data-06f7e1bb-336c-4b7f-860f-39ce2217ef5c --path /var/lib/ceph/osd/ceph-0 --no-mon-config
2019-05-08 17:01:41.200711 I | Running command: /bin/ln -snf /dev/ceph-aaa9dfa1-6146-4a9d-9b19-781bf0f71abe/osd-data-06f7e1bb-336c-4b7f-860f-39ce2217ef5c /var/lib/ceph/osd/ceph-0/block
2019-05-08 17:01:41.508309 I | Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block
2019-05-08 17:01:41.813803 I | Running command: /bin/chown -R ceph:ceph /dev/mapper/ceph--aaa9dfa1--6146--4a9d--9b19--781bf0f71abe-osd--data--06f7e1bb--336c--4b7f--860f--39ce2217ef5c
2019-05-08 17:01:42.097564 I | Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
2019-05-08 17:01:42.388105 I | --> ceph-volume lvm activate successful for osd ID: 0
2019-05-08 17:01:42.400127 I | exec: Running command: ceph-osd --foreground --id 0 --osd-uuid c05a22dc-97e5-4463-b4e6-ebd9b00d0f2e --conf /var/lib/rook/osd0/rook-ceph.config --cluster ceph --default-log-to-file false
2019-05-08 17:01:43.178290 I | 2019-05-08 17:01:43.177 7fe1fd660d80 -1 Falling back to public interface
2019-05-08 17:01:43.829540 I | 2019-05-08 17:01:43.829 7fe1fd660d80 -1 osd.0 69 log_to_monitors {default=true}
2019-05-08 17:01:43.847126 I | 2019-05-08 17:01:43.846 7fe1efef6700 -1 osd.0 69 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
2019-05-08 17:01:54.275215 I | 2019-05-08 17:01:54.274 7fe1e5ee2700 -1 Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-08 17:01:54.275251 I | 2019-05-08 17:01:54.274 7fe1e5ee2700 -1 Processor -- bind was unable to bind. Trying again in 5 seconds
2019-05-08 17:01:59.291147 I | 2019-05-08 17:01:59.290 7fe1e5ee2700 -1 Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-08 17:01:59.291193 I | 2019-05-08 17:01:59.290 7fe1e5ee2700 -1 Processor -- bind was unable to bind. Trying again in 5 seconds
2019-05-08 17:02:04.308557 I | 2019-05-08 17:02:04.308 7fe1e5ee2700 -1 Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-08 17:02:04.308596 I | 2019-05-08 17:02:04.308 7fe1e5ee2700 -1 Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
2019-05-08 17:02:04.320883 I | 2019-05-08 17:02:04.320 7fe1e5ee2700 -1 Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-08 17:02:04.320920 I | 2019-05-08 17:02:04.320 7fe1e5ee2700 -1 Processor -- bind was unable to bind. Trying again in 5 seconds
2019-05-08 17:02:09.332874 I | 2019-05-08 17:02:09.332 7fe1e5ee2700 -1 Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-08 17:02:09.332911 I | 2019-05-08 17:02:09.332 7fe1e5ee2700 -1 Processor -- bind was unable to bind. Trying again in 5 seconds
2019-05-08 17:02:14.345173 I | 2019-05-08 17:02:14.344 7fe1e5ee2700 -1 Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-08 17:02:14.345214 I | 2019-05-08 17:02:14.344 7fe1e5ee2700 -1 Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
2019-05-08 17:02:14.359483 I | 2019-05-08 17:02:14.358 7fe1e5ee2700 -1 Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-08 17:02:14.359524 I | 2019-05-08 17:02:14.358 7fe1e5ee2700 -1 Processor -- bind was unable to bind. Trying again in 5 seconds
2019-05-08 17:02:19.372082 I | 2019-05-08 17:02:19.371 7fe1e5ee2700 -1 Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-08 17:02:19.372114 I | 2019-05-08 17:02:19.371 7fe1e5ee2700 -1 Processor -- bind was unable to bind. Trying again in 5 seconds
2019-05-08 17:02:24.385045 I | 2019-05-08 17:02:24.384 7fe1e5ee2700 -1 Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-08 17:02:24.385079 I | 2019-05-08 17:02:24.384 7fe1e5ee2700 -1 Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
2019-05-08 17:02:24.385338 I | 2019-05-08 17:02:24.384 7fe1f3786700 -1 received signal: Interrupt from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2019-05-08 17:02:24.385351 I | 2019-05-08 17:02:24.384 7fe1f3786700 -1 osd.0 74 *** Got signal Interrupt ***
osd-0 config
[global]
fsid = e9f14792-30db-427d-8537-73e5b73a91ac
run dir = /var/lib/rook/osd0
mon initial members = c a b
mon host = v1:172.30.246.255:6789,v1:172.30.152.231:6789,v1:172.30.117.179:6789
public addr = 10.130.0.19
cluster addr = 10.130.0.19
mon keyvaluedb = rocksdb
mon_allow_pool_delete = true
mon_max_pg_per_osd = 1000
debug default = 0
debug rados = 0
debug mon = 0
debug osd = 0
debug bluestore = 0
debug filestore = 0
debug journal = 0
debug leveldb = 0
filestore_omap_backend = rocksdb
osd pg bits = 11
osd pgp bits = 11
osd pool default size = 1
osd pool default min size = 1
osd pool default pg num = 100
osd pool default pgp num = 100
osd objectstore = filestore
crush location = root=default host=node2-system10-vlan124-mcp
rbd_default_features = 3
fatal signal handlers = false
[osd.0]
keyring = /var/lib/ceph/osd/ceph-0/keyring
bluestore block path = /var/lib/ceph/osd/ceph-0/block
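Note the mismatch between this config and the log above: public addr and cluster addr are pinned to the pod IP 10.130.0.19, yet the daemon tries to bind v2:10.130.0.1:7300. On OpenShift SDN, the .1 of a node's pod subnet is typically the tun0 gateway address on the host, which does not exist inside the pod, and that is exactly what errno 99 reports. One way to confirm, assuming iproute2 is available in the OSD container (the pod name suffix is illustrative):

kubectl -n rook-ceph exec rook-ceph-osd-0-<pod-id> -- ip -4 -o addr show
kubectl -n rook-ceph exec rook-ceph-osd-0-<pod-id> -- grep 'addr' /var/lib/rook/osd0/rook-ceph.config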
Commits related to this issue
- ceph: osd: fix startup on sdn This commit adds a new flag to the osd startup so that on msgr2 (default on Nautilus and above) the osd is able to find the IP address in the container to bind to. This... — committed to leseb/rook by leseb 5 years ago
- ceph: osd: fix startup on sdn This commit adds a new flag to the osd startup so that on msgr2 (default on Nautilus and above) the osd is able to find the IP address in the container to bind to. This... — committed to sabbot/rook by leseb 5 years ago
- ceph: osd: fix startup on sdn This commit adds a new flag to the osd startup so that on msgr2 (default on Nautilus and above) the osd is able to find the IP address in the container to bind to. This... — committed to travisn/rook by leseb 5 years ago
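Reading the commit message, the fix appears to pass an extra messenger option at OSD startup so that, under msgr2, the daemon binds to the address configured in the container instead of trying to discover one elsewhere. If that option is Ceph's ms_learn_addr_from_peer (an assumption on my part; the exact flag is in the linked commits), the resulting invocation would look roughly like:

ceph-osd --foreground --id 0 --conf /var/lib/rook/osd0/rook-ceph.config --cluster ceph --ms-learn-addr-from-peer=false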
@travisn, still getting the error with 14.2.4, as in comment-563671883
For this container image and args:
osd pod logs are:
@travisn It seems that the issue is not a port conflict, but the ceph-osd daemon picking the wrong IP address. In my scenario above, the config file contained the right value for the pod IP:
but the OSD process, somewhere in pick_address(), decided to use 10.34.0.0 instead.
Also in pick_addresses.cc the code is looking at other config values (like public_network, for instance) which are not set in the config file passed to ceph-osd. Perhaps sprinkling log statements in pick_address.cc and comparing the outputs of ceph-osd with hostNetwork: true (where it works) and hostNetwork: false could be of help …
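Short of patching pick_address.cc, the two modes can at least be compared from their existing startup logs (a sketch; the pod names are illustrative):

kubectl -n rook-ceph logs <osd-pod-with-hostnetwork> | grep -E 'bind|addr' > hostnet.log
kubectl -n rook-ceph logs <osd-pod-on-sdn> | grep -E 'bind|addr' > sdn.log
diff hostnet.log sdn.log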
It works! Thanks a lot, @leseb. I’m able to create an object store, and to read/write data from/to it.
I’m running now with the following configuration (diff from rook v1.0.2):
The resources snippet for mgr is to address a memory leak I reported earlier, nothing to do with this issue.
With this setup I see no errors in the logs, and Ceph looks healthy:
Notice the health warning bit “crush map has legacy tunables (require firefly, min is hammer)” — anything to worry about?
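For what it's worth, that warning means the cluster's CRUSH tunables profile predates hammer; if every client is recent enough, it can usually be cleared from the toolbox with the command below, keeping in mind that raising tunables can trigger data movement:

ceph osd crush tunables hammer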
@Blaine @leseb Any ideas on this issue with OSDs having port conflicts? #3189 also seems to be the same issue. It smells like we have a configuration issue and we’ll need to backport a fix to 1.0.
In my deployed Rook, the Ceph version is 14.2.4, but the problem is still there:
core@master3 ~ $ /opt/bin/kube/kubectl -n rook-ceph logs po/rook-ceph-osd-0-6b685b5478-nzsnt
2019-12-09 15:11:58.179 7f3478524dc0 -1 Falling back to public interface
2019-12-09 15:11:58.194 7f3478524dc0 -1 Processor -- bind unable to bind to v2:10.244.70.5:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
2019-12-09 15:11:58.194 7f3478524dc0 -1 Processor -- bind was unable to bind. Trying again in 5 seconds
2019-12-09 15:12:03.209 7f3478524dc0 -1 Processor -- bind unable to bind to v2:10.244.70.5:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
2019-12-09 15:12:03.210 7f3478524dc0 -1 Processor -- bind was unable to bind. Trying again in 5 seconds
2019-12-09 15:12:08.225 7f3478524dc0 -1 Processor -- bind unable to bind to v2:10.244.70.5:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
2019-12-09 15:12:08.225 7f3478524dc0 -1 Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
Set allowUnsupported: true in your CR.
Something like this?
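If it helps, one way to set that field without editing YAML by hand, assuming the CephCluster object has the default name rook-ceph:

kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"cephVersion":{"allowUnsupported":true}}}'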
While I’m at it, shall I raise the debug level?
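If raising verbosity turns out to be useful, messenger-level debugging can be bumped at runtime from the toolbox, for example:

ceph config set osd.0 debug_ms 1

debug_ms 1 already logs bind attempts; higher values such as 5 show per-connection detail, at the cost of very chatty logs.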
I seem to be having the same problem as you, only I have just one single OSD which refuses to start and which, as I verified with ceph osd dump, also has an IP ending in x.x.x.0; the rest start just fine. However, I’m completely lost on how to start debugging this. The barrier to entry for participating has become enormous, as I simply don’t know how to create a docker image with a custom Ceph build.
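In case it unblocks anyone in the same spot: a patched ceph-osd binary can be tested without a full Ceph image build by layering it onto the upstream image, assuming it was built against a matching base (the upstream ceph/ceph v14 images are CentOS 7 based) so the shared libraries line up:

cat > Dockerfile <<'EOF'
FROM ceph/ceph:v14.2.4
COPY ceph-osd /usr/bin/ceph-osd
EOF
docker build -t my-ceph:debug .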