cri-o: CRI-O service is not restarting after restart of the node
Description
Steps to reproduce the issue:
- Simply restart the node (assuming that the service is enabled), e.g.
sudo shutdown -r now
- Once the node is up again, check the status of the crio service, e.g.
systemctl status crio
Describe the results you received: After restarting the node I see the service stopped:
$ systemctl status crio
● crio.service - Container Runtime Interface for OCI (CRI-O)
Loaded: loaded (/usr/local/lib/systemd/system/crio.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2021-01-07 14:24:42 CET; 1min 28s ago
Docs: https://github.com/cri-o/cri-o
Process: 1465 ExecStart=/usr/local/bin/crio $CRIO_CONFIG_OPTIONS $CRIO_RUNTIME_OPTIONS $CRIO_STORAGE_OPTIONS $CRIO_NETWORK_OPTIONS $CRIO_METRICS_OPTIONS (code=exited, status=1/FAILURE)
Main PID: 1465 (code=exited, status=1/FAILURE)
Warning: crio.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Describe the results you expected: After restarting the node I would expect the service to be running:
$ systemctl status crio -l
● crio.service - Container Runtime Interface for OCI (CRI-O)
Loaded: loaded (/usr/local/lib/systemd/system/crio.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2021-01-07 14:35:07 CET; 1min 38s ago
Docs: https://github.com/cri-o/cri-o
Main PID: 10048 (crio)
Tasks: 18
Memory: 16.4M
CGroup: /system.slice/crio.service
└─10048 /usr/local/bin/crio
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.826637013+01:00" level=info msg="Node configuration value for pid cgroup is true"
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.826849043+01:00" level=info msg="Node configuration value for memoryswap cgroup is true"
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.832173774+01:00" level=info msg="Node configuration value for systemd CollectMode is false"
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.957767063+01:00" level=info msg="Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL"
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.963866586+01:00" level=info msg="Conmon does support the --sync option"
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.964158396+01:00" level=info msg="No seccomp profile specified, using the internal default"
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.964176075+01:00" level=info msg="AppArmor is disabled by the system or at CRI-O build-time"
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.965633801+01:00" level=info msg="Update default CNI network name to "
Jan 07 14:35:07 node-name crio[10048]: time="2021-01-07 14:35:07.040800881+01:00" level=info msg="Serving metrics on :5555"
Jan 07 14:35:07 node-name systemd[1]: Started Container Runtime Interface for OCI (CRI-O).
Additional information you deem important (e.g. issue happens only occasionally):
Output of crio --version:
$ crio version
Version: 1.19.1
GitCommit: unknown
GitTreeState: unknown
BuildDate: 2021-01-07T13:08:58Z
GoVersion: go1.15.5
Compiler: gc
Platform: linux/amd64
Linkmode: dynamic
Additional environment details (AWS, VirtualBox, physical, etc.):
This is a physical Red Hat 7 node:
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.9 (Maipo)
About this issue
- State: closed
- Created 3 years ago
- Comments: 32 (29 by maintainers)
In fact Docker needs this knob set as much as CRI-O (and, as far as I remember, dockerd just sets this knob upon start – that might explain why you see it enabled).
This setting (fs.may_detach_mounts) is applied via a sysctl file supplied by the runc rpm.
@thanos1983 what does
tell you?
You might be using a version of runc which does not have a sysctl file. In this case you can add one manually:
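A minimal sketch of what such a manual drop-in could look like, assuming the standard sysctl.d mechanism. The file name `99-may-detach-mounts.conf` and the helper `write_may_detach_conf` are hypothetical, chosen for illustration, and are not what the runc rpm ships:

```shell
#!/bin/sh
# Sketch: persist fs.may_detach_mounts=1 through a sysctl drop-in file.
# CONF_NAME is an assumed name, not the file name used by the runc rpm.
CONF_NAME="99-may-detach-mounts.conf"

# Write the drop-in into the directory given as $1
# (on a real node this would be /etc/sysctl.d, which requires root).
write_may_detach_conf() {
    dir="$1"
    printf 'fs.may_detach_mounts = 1\n' > "$dir/$CONF_NAME"
}
```

On a node this would presumably be used as root, e.g. `write_may_detach_conf /etc/sysctl.d`, followed by `sysctl --system` to load the setting and `systemctl restart crio` to retry the service.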
PS I found out that the CentOS build of runc from the https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/CentOS_7/ repo does not set fs.may_detach_mounts, which might be the reason for what you see. Not sure how to file a bug for that repo – @lsm5 can you please take a look? cri-o now deliberately checks that this is set and refuses to start otherwise (for the full story and the motivation behind it, see #4217).
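To see what state the knob is in before starting the service, something like the following could be used. This is only a sketch and not cri-o's actual check; the helper `may_detach_state` and the `KNOB_PATH` override are hypothetical, and the path assumes the usual /proc/sys mapping of the sysctl name:

```shell
#!/bin/sh
# Sketch: report the state of fs.may_detach_mounts.
# KNOB_PATH can be overridden so the function can be tried against a test file.
KNOB_PATH="${KNOB_PATH:-/proc/sys/fs/may_detach_mounts}"

# Print one word: "enabled" (value is 1), "disabled" (any other value),
# or "absent" (the file does not exist or is not readable).
may_detach_state() {
    path="${1:-$KNOB_PATH}"
    if [ ! -r "$path" ]; then
        echo absent
    elif [ "$(cat "$path")" = "1" ]; then
        echo enabled
    else
        echo disabled
    fi
}
```

Calling `may_detach_state` with no argument right after a reboot, before anything like dockerd has had a chance to change the value, should show whether the node comes up with the setting in place.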