rook: OSDs crashlooping after being OOMKilled: bind unable to bind
Is this a bug report or feature request?
- Bug Report
Deviation from expected behavior: Updated Ceph from v14.2.2-20190722 to v14.2.4-20190917, which appears to have changed memory management behavior; nodes started hitting system OOM kills, after which the affected OSDs went into a crash loop with the following in their logs:
2019-09-26 01:20:10.118 7f70aa104dc0 -1 Falling back to public interface
2019-09-26 01:20:10.128 7f70aa104dc0 -1 Processor -- bind unable to bind to v2:10.244.15.18:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
2019-09-26 01:20:10.128 7f70aa104dc0 -1 Processor -- bind was unable to bind. Trying again in 5 seconds
2019-09-26 01:20:15.137 7f70aa104dc0 -1 Processor -- bind unable to bind to v2:10.244.15.18:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
2019-09-26 01:20:15.137 7f70aa104dc0 -1 Processor -- bind was unable to bind. Trying again in 5 seconds
2019-09-26 01:20:20.144 7f70aa104dc0 -1 Processor -- bind unable to bind to v2:10.244.15.18:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
2019-09-26 01:20:20.144 7f70aa104dc0 -1 Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
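The address in the bind error (10.244.15.18) is a pod-network IP, so a useful first check is whether the crashlooping OSD pod still holds that address or came back with a new one after the OOM kill, and whether the previous container termination reason really was OOMKilled. A minimal diagnostic sketch, assuming the default rook-ceph namespace, the standard app=rook-ceph-osd label, and rook-ceph-osd-<id> deployment naming (adjust for your cluster):

```sh
# Compare each OSD pod's current IP with the address reported in the bind error;
# a mismatch means the daemon is trying to bind a stale address.
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide

# Confirm the previous container termination reason was OOMKilled.
kubectl -n rook-ceph describe pods -l app=rook-ceph-osd | grep -iB2 "oomkilled"

# Pull the recent log of one crashlooping OSD to see the bind retries.
kubectl -n rook-ceph logs deploy/rook-ceph-osd-0 --tail=50 | grep "unable to bind"
```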
How to reproduce it (minimal and precise): Get an OSD OOMKilled by the system.
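Since the regression appears to be in OSD memory behavior, one mitigation while reproducing is to give the OSD pods an explicit memory request and limit in the CephCluster spec, so the daemon has a defined ceiling instead of growing until the kernel OOM killer steps in. A hedged sketch, not a confirmed fix: the rook-ceph cluster/namespace names and the 4Gi figure are assumptions, not values taken from this report.

```sh
# Patch the CephCluster to set memory requests/limits for all OSD pods
# (cluster name, namespace, and size are assumed; adjust to your deployment).
kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p \
  '{"spec":{"resources":{"osd":{"requests":{"memory":"4Gi"},"limits":{"memory":"4Gi"}}}}}'
```

If memory is tight, osd_memory_target may also need to be lowered in the cluster's Ceph configuration so the OSD's cache sizing stays within the pod limit.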
- Rook version (use rook version inside of a Rook Pod): 1.1.1
- Storage backend version (e.g. for ceph do ceph -v): v14.2.4-20190917
- Kubernetes version (use kubectl version): 1.15.1
About this issue
- State: closed
- Created 5 years ago
- Reactions: 2
- Comments: 54 (54 by maintainers)
The issue is still present in Rook 1.1.2: https://github.com/rook/rook/releases/tag/v1.1.2