moby: systemd v219 can't run in docker 1.9 due to readonly /sys/fs/cgroup

Running systemd in Docker is a long-standing issue. Some months ago I was surprised to find that I could run “systemd-container-208” smoothly in Docker 1.8 with “--cap-add SYS_ADMIN” (which I feel is a little safer than --privileged 😃). In Docker 1.9 this breaks because “/sys/fs/cgroup” is now read-only by default, so I have to use this hack:

$ docker run -dt -e container=docker --cap-add SYS_ADMIN centos:latest bash -c 'mount -oremount,rw /sys/fs/cgroup; mkdir /sys/fs/cgroup/systemd; mount -oremount,ro /sys/fs/cgroup; exec /usr/sbin/init'

I’m not sure what the best solution is. Maybe the systemd developers could try remounting /sys/fs/cgroup before creating /sys/fs/cgroup/systemd, or Docker could simply create that directory itself? I see there is systemd support in Docker now, but I don’t know what it does: https://github.com/opencontainers/runc/tree/master/libcontainer/cgroups/systemd
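The one-line hack above can also be written as a small wrapper script, which makes the three steps easier to read. This is only a sketch of the same workaround, not a proposed fix: the function names and the CGROUP_ROOT variable are made up for illustration, and the remount still requires SYS_ADMIN inside the container.

```shell
# CGROUP_ROOT is overridable only so the check can be tried on a scratch
# directory; inside a container it is /sys/fs/cgroup. (Hypothetical name.)
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# Succeeds (exit status 0) when the systemd directory is still missing.
needs_systemd_dir() {
    [ ! -d "$CGROUP_ROOT/systemd" ]
}

# The same three steps as the one-liner: remount read-write, create the
# directory, remount read-only again. Needs SYS_ADMIN to work.
prepare_cgroup_dir() {
    mount -o remount,rw "$CGROUP_ROOT" &&
    mkdir -p "$CGROUP_ROOT/systemd" &&
    mount -o remount,ro "$CGROUP_ROOT"
}

# Intended use as the container command (left commented out here):
# if needs_systemd_dir; then prepare_cgroup_dir; fi
# exec /usr/sbin/init
```

The guard skips the remount when the directory already exists, so the script should be harmless on a Docker version that creates it.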

Docker 1.8 (boot2docker v1.8.0):

$ docker run --rm -it -e container=docker centos:latest bash -c 'mount | grep /sys/fs/cgroup'
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpuset type cgroup (ro,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (ro,nosuid,nodev,noexec,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (ro,nosuid,nodev,noexec,relatime,cpuacct)
cgroup on /sys/fs/cgroup/blkio type cgroup (ro,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (ro,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (ro,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (ro,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (ro,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/perf_event type cgroup (ro,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_prio type cgroup (ro,nosuid,nodev,noexec,relatime,net_prio)

$ docker version
Client version: 1.7.1
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 786b29d
OS/Arch (client): darwin/amd64
Server version: 1.8.1
Server API version: 1.20
Go version (server): go1.4.2
Git commit (server): d12ea79
OS/Arch (server): linux/amd64

$ docker info
Containers: 3
Images: 136
Storage Driver: aufs
 Root Dir: /mnt/sda1/var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 142
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.0.9-boot2docker
Operating System: Boot2Docker 1.8.1 (TCL 6.3); master : 7f12e95 - Thu Aug 13 03:24:56 UTC 2015
CPUs: 8
Total Memory: 1.955 GiB
Name: boot2docker
ID: 25AA:2PM7:VJDC:2YPU:QTKF:ODD5:HSAQ:EWGV:2XOU:3LHD:5FF4:6DMG
Debug mode (server): true
File Descriptors: 30
Goroutines: 38
System Time: 2015-12-20T02:03:36.73673363Z
EventsListeners: 0
Init SHA1:
Init Path: /usr/local/bin/docker
Docker Root Dir: /mnt/sda1/var/lib/docker

$ uname -a
Darwin localhost 15.0.0 Darwin Kernel Version 15.0.0: Sat Sep 19 15:53:46 PDT 2015; root:xnu-3247.10.11~1/RELEASE_X86_64 x86_64 i386 MacBookPro11,5 Darwin

$ boot2docker ssh uname -a
Linux boot2docker 4.0.9-boot2docker #1 SMP Thu Aug 13 03:05:44 UTC 2015 x86_64 GNU/Linux

Docker 1.9 (docker toolbox, docker-machine 0.5.0):

$ docker run --rm -it -e container=docker centos:latest bash -c 'mount | grep /sys/fs/cgroup'
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,relatime,mode=755)
cgroup on /sys/fs/cgroup/cpuset type cgroup (ro,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (ro,nosuid,nodev,noexec,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (ro,nosuid,nodev,noexec,relatime,cpuacct)
cgroup on /sys/fs/cgroup/blkio type cgroup (ro,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (ro,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (ro,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (ro,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (ro,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/perf_event type cgroup (ro,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_prio type cgroup (ro,nosuid,nodev,noexec,relatime,net_prio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (ro,nosuid,nodev,noexec,relatime,hugetlb)

$ docker version
Client:
 Version:      1.9.0
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   76d6bc9
 Built:        Tue Nov  3 19:20:09 UTC 2015
 OS/Arch:      darwin/amd64

Server:
 Version:      1.9.0
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   76d6bc9
 Built:        Tue Nov  3 19:20:09 UTC 2015
 OS/Arch:      linux/amd64

$ docker info
Containers: 3
Images: 10
Server Version: 1.9.0
Storage Driver: aufs
 Root Dir: /mnt/sda1/var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 17
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.12-boot2docker
Operating System: Boot2Docker 1.9.0 (TCL 6.4); master : 16e4a2a - Tue Nov  3 19:49:22 UTC 2015
CPUs: 1
Total Memory: 1.956 GiB
Name: default
ID: WDPS:UFA4:DWK6:54U4:JIPX:6K6U:HDMV:MXJU:L4DA:JL7N:4A3A:CFRF
Debug mode (server): true
 File Descriptors: 17
 Goroutines: 28
 System Time: 2015-12-20T02:03:12.399240046Z
 EventsListeners: 0
 Init SHA1:
 Init Path: /usr/local/bin/docker
 Docker Root Dir: /mnt/sda1/var/lib/docker
Labels:
 provider=virtualbox

$ uname -a
Darwin localhost 15.0.0 Darwin Kernel Version 15.0.0: Sat Sep 19 15:53:46 PDT 2015; root:xnu-3247.10.11~1/RELEASE_X86_64 x86_64 i386 MacBookPro11,5 Darwin

$ docker-machine ssh default uname -a
Linux default 4.1.12-boot2docker #1 SMP Tue Nov 3 06:03:36 UTC 2015 x86_64 GNU/Linux
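
The only relevant difference between the two mount listings is the first line: the tmpfs at /sys/fs/cgroup is rw on 1.8 and ro on 1.9. A small helper like the following can check that in any container. It is a sketch that assumes the `mount` output format shown above; the function name is invented for this example.

```shell
# Print "ro" or "rw" for the mount at /sys/fs/cgroup itself, reading
# mount-style lines ("<src> on <target> type <fstype> (<opts>)") on stdin.
# Lines for sub-mounts such as /sys/fs/cgroup/cpuset are ignored.
cgroup_root_mode() {
    awk '$2 == "on" && $3 == "/sys/fs/cgroup" {
        opts = $NF
        gsub(/[()]/, "", opts)          # strip the surrounding parentheses
        n = split(opts, o, ",")
        mode = "unknown"
        for (i = 1; i <= n; i++)
            if (o[i] == "ro" || o[i] == "rw") mode = o[i]
        print mode
        exit
    }'
}

# Usage inside a container:
#   mount | cgroup_root_mode
```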

About this issue

  • State: closed
  • Created 9 years ago
  • Comments: 29 (12 by maintainers)

Most upvoted comments

@rhatdan, that did the trick! Thank you.

$ sudo docker run -e container=docker -ti --rm --tmpfs /run --tmpfs /run/lock --tmpfs /tmp -v /sys/fs/cgroup:/sys/fs/cgroup:ro giantmonkey/debian:stretch-amd64 /sbin/init

Any idea why systemd doesn’t print any start messages?

$ sudo docker exec -i elegant_heisenberg systemctl
UNIT                              LOAD   ACTIVE SUB       DESCRIPTION                                          
-.mount                           loaded active mounted   Root Mount                                           
dev-mqueue.mount                  loaded active mounted   POSIX Message Queue File System                      
etc-hostname.mount                loaded active mounted   /etc/hostname                                        
etc-hosts.mount                   loaded active mounted   /etc/hosts                                           
etc-resolv.conf.mount             loaded active mounted   /etc/resolv.conf                                     
proc-bus.mount                    loaded active mounted   /proc/bus                                            
proc-fs.mount                     loaded active mounted   /proc/fs                                             
proc-irq.mount                    loaded active mounted   /proc/irq                                            
proc-kcore.mount                  loaded active mounted   /proc/kcore                                          
proc-sched_debug.mount            loaded active mounted   /proc/sched_debug                                    
proc-sysrq\x2dtrigger.mount       loaded active mounted   /proc/sysrq-trigger                                  
proc-timer_stats.mount            loaded active mounted   /proc/timer_stats                                    
tmp.mount                         loaded active mounted   /tmp                                                 
systemd-ask-password-console.path loaded active waiting   Dispatch Password Requests to Console Directory Watch
systemd-ask-password-wall.path    loaded active waiting   Forward Password Requests to Wall Directory Watch    
init.scope                        loaded active running   System and Service Manager                           
console-getty.service             loaded active running   Console Getty                                        
cron.service                      loaded active running   Regular background program processing daemon         
rsyslog.service                   loaded active running   System Logging Service                               
systemd-journal-flush.service     loaded active exited    Flush Journal to Persistent Storage                  
systemd-journald.service          loaded active running   Journal Service                                      
systemd-remount-fs.service        loaded active exited    Remount Root and Kernel File Systems                 
systemd-tmpfiles-setup.service    loaded active exited    Create Volatile Files and Directories                
systemd-update-utmp.service       loaded active exited    Update UTMP about System Boot/Shutdown               
systemd-user-sessions.service     loaded active exited    Permit User Sessions                                 
-.slice                           loaded active active    Root Slice                                           
system-getty.slice                loaded active active    system-getty.slice                                   
system.slice                      loaded active active    System Slice                                         
syslog.socket                     loaded active running   Syslog Socket                                        
systemd-initctl.socket            loaded active listening /dev/initctl Compatibility Named Pipe                
systemd-journald-dev-log.socket   loaded active running   Journal Socket (/dev/log)                            
systemd-journald.socket           loaded active running   Journal Socket                                       
basic.target                      loaded active active    Basic System                                         
cryptsetup.target                 loaded active active    Encrypted Volumes                                    
getty.target                      loaded active active    Login Prompts                                        
graphical.target                  loaded active active    Graphical Interface                                  
local-fs-pre.target               loaded active active    Local File Systems (Pre)                             
local-fs.target                   loaded active active    Local File Systems                                   
multi-user.target                 loaded active active    Multi-User System                                    
paths.target                      loaded active active    Paths                                                
remote-fs.target                  loaded active active    Remote File Systems                                  
slices.target                     loaded active active    Slices                                               
sockets.target                    loaded active active    Sockets                                              
swap.target                       loaded active active    Swap                                                 
sysinit.target                    loaded active active    System Initialization                                
time-sync.target                  loaded active active    System Time Synchronized                             
timers.target                     loaded active active    Timers                                               
apt-daily.timer                   loaded active waiting   Daily apt activities                                 
systemd-tmpfiles-clean.timer      loaded active waiting   Daily Cleanup of Temporary Directories               

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

49 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
$ sudo docker exec -i elegant_heisenberg systemctl status
● bdb0eb80bf89
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Sun 2016-11-20 13:01:09 CET; 1min 43s ago
   CGroup: /docker/bdb0eb80bf896f7ef66f2e4154279b2e3f4f08e4e98696537a0f5ca27a880a10
           ├─42 systemctl
           ├─48 pager
           ├─58 systemctl status
           ├─init.scope
           │ └─1 /sbin/init
           └─system.slice
             ├─cron.service
             │ └─21 /usr/sbin/cron -f
             ├─systemd-journald.service
             │ └─17 /lib/systemd/systemd-journald
             ├─console-getty.service
             │ └─57 /sbin/agetty --noclear --keep-baud console 115200,38400,9600 xterm
             └─rsyslog.service
               └─23 /usr/sbin/rsyslogd -n
$ sudo docker exec -i elegant_heisenberg systemd-analyze blame
          9.043s systemd-remount-fs.service
          3.015s systemd-user-sessions.service
          1.010s systemd-update-utmp-runlevel.service
          1.010s systemd-journal-flush.service
          1.009s systemd-update-utmp.service
          1.008s systemd-journald.service
          1.005s rsyslog.service
          1.005s systemd-tmpfiles-setup.service
$ sudo docker exec -i elegant_heisenberg systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

graphical.target @42.229s
└─multi-user.target @41.225s
  └─getty.target @40.220s

Gentlemen, thanks a lot for your valuable comments!

I was finally able to start a Debian 8 systemd container on CentOS 7.2.1511 with SELinux enforcing (no local modifications needed). I use Docker Engine 1.9.1 and the following Dockerfile:

FROM debian:8
MAINTAINER "Humble Me" <me@github.com>

ENV container docker
ENV init /lib/systemd/systemd
ENV LC_ALL C
ENV DEBIAN_FRONTEND noninteractive

RUN echo '# Do not install recommended and suggested packages by default\n\
APT::Install-Recommends "0";\n\
APT::Install-Suggests "0";\n' > /etc/apt/apt.conf.d/docker-skip-recommends-suggests

RUN apt-get update && \
    apt-get -y upgrade && \
    apt-get clean

RUN (cd /lib/systemd/system/sysinit.target.wants/; \
    for i in *; do [ "$i" = systemd-tmpfiles-setup.service ] || rm -f "$i"; done); \
    rm -f /lib/systemd/system/multi-user.target.wants/*; \
    rm -f /etc/systemd/system/*.wants/*; \
    rm -f /lib/systemd/system/local-fs.target.wants/*; \
    rm -f /lib/systemd/system/sockets.target.wants/*udev*; \
    rm -f /lib/systemd/system/sockets.target.wants/*initctl*; \
    rm -f /usr/lib/tmpfiles.d/systemd-nologin.conf

RUN systemctl set-default multi-user.target

VOLUME [ "/sys/fs/cgroup" ]

ENTRYPOINT ["/lib/systemd/systemd"]
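
The long RUN step in the Dockerfile above removes every entry from sysinit.target.wants except systemd-tmpfiles-setup.service. The sketch below does the same "keep one, delete the rest" loop as a standalone function, so it can be tried on a scratch directory first. The function name prune_wants is made up for this example.

```shell
# prune_wants DIR KEEP: delete every entry in DIR except the one named KEEP.
prune_wants() {
    dir=$1
    keep=$2
    for path in "$dir"/*; do
        # Skip the literal pattern when the directory is empty; -L also
        # catches dangling symlinks, which -e alone would miss.
        [ -e "$path" ] || [ -L "$path" ] || continue
        name=${path##*/}
        [ "$name" = "$keep" ] || rm -f "$path"
    done
}

# Equivalent of the first part of the RUN step:
#   prune_wants /lib/systemd/system/sysinit.target.wants systemd-tmpfiles-setup.service
```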

This and only this command works for me:

sudo docker run --rm --cap-add SYS_ADMIN -it -v /run -v /run/lock -v /sys/fs/cgroup:/sys/fs/cgroup:ro local/d8-systemd
systemd 215 running in system mode. (+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR)
Detected virtualization 'other'.
Detected architecture 'x86-64'.

Welcome to Debian GNU/Linux 8 (jessie)!

Set hostname to <7d2cfc392db8>.
Failed to install release agent, ignoring: File exists
[  OK  ] Reached target Paths.
[  OK  ] Created slice Root Slice.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
[  OK  ] Listening on Delayed Shutdown Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Created slice System Slice.
[  OK  ] Reached target Slices.
[  OK  ] Reached target Swap.
[  OK  ] Reached target Local File Systems.
         Starting Create Volatile Files and Directories...
         Starting Journal Service...
[  OK  ] Started Journal Service.
[  OK  ] Started Create Volatile Files and Directories.
[ INFO ] Update UTMP about System Boot/Shutdown is not active.
[DEPEND] Dependency failed for Update UTMP about System Runlevel Changes.
[  OK  ] Reached target System Initialization.
[  OK  ] Reached target Timers.
[  OK  ] Reached target Basic System.
         Starting /etc/rc.local Compatibility...
         Starting Cleanup of Temporary Directories...
[  OK  ] Started Cleanup of Temporary Directories.
[  OK  ] Started /etc/rc.local Compatibility.
[  OK  ] Reached target Multi-User System.

Success!

@zart, thanks. That’s what I wrote in my comment too.

Please note that with the SYS_ADMIN capability you do not need to mount /run and /tmp.

Hi, has there been any progress with integrating the patches?

Just an aside: tested with Docker 1.10.3, for Debian 8.3 (jessie) with systemd 215-17+deb8u4, the following options are needed.

--cap-add SYS_ADMIN --cap-add SYS_RESOURCE -v /sys/fs/cgroup:/sys/fs/cgroup:ro

Without SYS_RESOURCE D-Bus doesn’t seem to start.

With Debian stretch/testing, systemd 229-3, the following works.

--cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro
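
To keep the two reported option sets in one place, they can be wrapped in a small lookup function. This is only a convenience sketch: it records what was observed in this comment (Docker 1.10.3, the systemd versions named above) and nothing more. The function name and the `<image>` placeholder are made up.

```shell
# systemd_run_opts RELEASE: print the `docker run` options reported above
# for the given Debian release; fail for releases nobody has reported on.
systemd_run_opts() {
    case $1 in
        jessie)
            printf '%s\n' '--cap-add SYS_ADMIN --cap-add SYS_RESOURCE -v /sys/fs/cgroup:/sys/fs/cgroup:ro' ;;
        stretch)
            printf '%s\n' '--cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro' ;;
        *)
            printf 'no report for %s\n' "$1" >&2
            return 1 ;;
    esac
}

# Usage (the substitution is left unquoted on purpose so that it splits
# into separate arguments):
#   docker run -it $(systemd_run_opts stretch) <image> /sbin/init
```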

In #docker@irc.freenode.net I was told that there is an environment variable named container.

The following is supposed to work with CentOS.

-e container=docker -v /run -v /tmp -v /sys/fs/cgroup:/sys/fs/cgroup:ro

On Debian it fails with the error below.

Failed to mount tmpfs at /run/lock: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.
Freezing execution.