cri-o: CRI-O service is not restarting after restart of the node

Description

Steps to reproduce the issue:

  1. Restart the node, e.g. sudo shutdown -r now (assuming the service is enabled).
  2. Once the node is up again, check the status of the crio service, e.g. systemctl status crio.
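The two reproduction steps can be scripted roughly as follows. This is a minimal sketch: the function name crio_boot_check is hypothetical and not part of CRI-O. It relies only on systemctl is-enabled and systemctl is-active, and returns non-zero when the unit is enabled but not running after boot.

```shell
# Hypothetical helper (not part of CRI-O): report whether a unit is enabled
# and running. Returns 1 when it is enabled but not active, i.e. the
# "did not come back after reboot" case described in this issue.
crio_boot_check() {
    local unit="${1:-crio}"
    local enabled active
    # is-enabled / is-active print the state and exit non-zero for states
    # such as "disabled" or "failed"; keep the text, ignore the exit status.
    enabled="$(systemctl is-enabled "$unit" 2>/dev/null || true)"
    active="$(systemctl is-active "$unit" 2>/dev/null || true)"
    echo "$unit: enabled=$enabled active=$active"
    if [ "$enabled" = "enabled" ] && [ "$active" != "active" ]; then
        return 1
    fi
    return 0
}
```

Run it once the node is back up, e.g. `crio_boot_check || echo "crio did not start after reboot"`.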

Describe the results you received: After restarting the node, the service is not running (it is in the failed state):

$ systemctl status crio
● crio.service - Container Runtime Interface for OCI (CRI-O)
   Loaded: loaded (/usr/local/lib/systemd/system/crio.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2021-01-07 14:24:42 CET; 1min 28s ago
     Docs: https://github.com/cri-o/cri-o
  Process: 1465 ExecStart=/usr/local/bin/crio $CRIO_CONFIG_OPTIONS $CRIO_RUNTIME_OPTIONS $CRIO_STORAGE_OPTIONS $CRIO_NETWORK_OPTIONS $CRIO_METRICS_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 1465 (code=exited, status=1/FAILURE)
Warning: crio.service changed on disk. Run 'systemctl daemon-reload' to reload units.
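The status output above only reports the exit code (status=1/FAILURE); the message explaining why crio exited should be in the journal for the current boot. A minimal sketch, where the helper name crio_boot_log is hypothetical (reading the journal may require sudo or membership in the systemd-journal group):

```shell
# Hypothetical helper: print the crio unit's log lines from the current boot,
# which should include whatever CRI-O logged before exiting with status 1.
crio_boot_log() {
    local lines="${1:-50}"
    # -b: current boot only, -u: this unit only, -n: last N lines
    journalctl -b -u crio --no-pager -n "$lines"
}
```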

Describe the results you expected: After restarting the node, I would expect the service to be running:

$ systemctl status crio -l
● crio.service - Container Runtime Interface for OCI (CRI-O)
   Loaded: loaded (/usr/local/lib/systemd/system/crio.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-01-07 14:35:07 CET; 1min 38s ago
     Docs: https://github.com/cri-o/cri-o
 Main PID: 10048 (crio)
    Tasks: 18
   Memory: 16.4M
   CGroup: /system.slice/crio.service
           └─10048 /usr/local/bin/crio

Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.826637013+01:00" level=info msg="Node configuration value for pid cgroup is true"
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.826849043+01:00" level=info msg="Node configuration value for memoryswap cgroup is true"
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.832173774+01:00" level=info msg="Node configuration value for systemd CollectMode is false"
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.957767063+01:00" level=info msg="Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL"
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.963866586+01:00" level=info msg="Conmon does support the --sync option"
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.964158396+01:00" level=info msg="No seccomp profile specified, using the internal default"
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.964176075+01:00" level=info msg="AppArmor is disabled by the system or at CRI-O build-time"
Jan 07 14:35:06 node-name crio[10048]: time="2021-01-07 14:35:06.965633801+01:00" level=info msg="Update default CNI network name to "
Jan 07 14:35:07 node-name crio[10048]: time="2021-01-07 14:35:07.040800881+01:00" level=info msg="Serving metrics on :5555"
Jan 07 14:35:07 node-name systemd[1]: Started Container Runtime Interface for OCI (CRI-O).

Additional information you deem important (e.g. issue happens only occasionally):

Output of crio --version:

$ crio version
Version:       1.19.1
GitCommit:     unknown
GitTreeState:  unknown
BuildDate:     2021-01-07T13:08:58Z
GoVersion:     go1.15.5
Compiler:      gc
Platform:      linux/amd64
Linkmode:      dynamic

Additional environment details (AWS, VirtualBox, physical, etc.):

This is a physical Red Hat Enterprise Linux 7 node:

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.9 (Maipo)

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 32 (29 by maintainers)

Most upvoted comments

People are used to working with common tools such as Docker, so they do not want to move away from them.

In fact, Docker needs this knob (fs.may_detach_mounts) set just as much as CRI-O does (and, as far as I remember, dockerd simply sets it on startup, which might explain why you see it enabled).

This setting (fs.may_detach_mounts) is normally applied via a sysctl file shipped in the runc RPM.

@thanos1983, what do the following commands

rpm -q runc
grep detach /usr/lib/sysctl.d/* /etc/sysctl.d/* /etc/sysctl.conf
systemctl status systemd-sysctl

tell you?
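Besides the configuration files, it can help to check the value the running kernel is actually using, which is exposed under /proc/sys (the same value `sysctl fs.may_detach_mounts` prints). A minimal sketch; the helper name check_may_detach_mounts is hypothetical:

```shell
# Hypothetical helper: check the runtime value of fs.may_detach_mounts.
# Returns 0 if it is 1, 1 if it is set to anything else, and 2 if the
# running kernel does not expose the knob (file missing or unreadable).
check_may_detach_mounts() {
    local path="${1:-/proc/sys/fs/may_detach_mounts}"
    if [ ! -r "$path" ]; then
        echo "fs.may_detach_mounts: not available ($path not readable)"
        return 2
    fi
    local value
    value="$(cat "$path")"
    echo "fs.may_detach_mounts = $value"
    # Exit status of the function is the result of this comparison.
    [ "$value" = "1" ]
}
```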

You might be using a version of runc that does not ship a sysctl file. In that case, you can add one manually:

echo "fs.may_detach_mounts=1" | sudo tee /usr/lib/sysctl.d/99-containers.conf
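A file in /usr/lib/sysctl.d is applied by systemd-sysctl at boot, so the echo | tee command above does not change the running value by itself; `sudo sysctl --system` reloads all sysctl.d files immediately. Also, tee overwrites the target file, so if 99-containers.conf might already hold other settings, an append-if-missing variant avoids clobbering them. A minimal sketch (the function name is hypothetical; /etc/sysctl.d is the directory sysctl.d(5) intends for local administrator settings, and it works there too):

```shell
# Hypothetical helper: ensure a sysctl.d file contains the setting, without
# overwriting other lines or appending a duplicate. Run as root for real paths.
ensure_may_detach_mounts_conf() {
    local conf="${1:-/usr/lib/sysctl.d/99-containers.conf}"
    local line="fs.may_detach_mounts=1"
    # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        echo "$line" >> "$conf"
    fi
}

# Real usage (as root), then apply without rebooting and verify:
#   ensure_may_detach_mounts_conf
#   sysctl --system
#   sysctl fs.may_detach_mounts
```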

P.S. I found out that the CentOS build of runc from the https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/CentOS_7/ repo does not set fs.may_detach_mounts, which might explain what you see. I'm not sure how to file a bug for that repo; @lsm5, can you please take a look? CRI-O now deliberately checks that this is set and refuses to start otherwise (for the full story and the motivation behind it, see #4217).