moby: ISSUE: Can't get systemd to run with 1.11

I’ve been running a few hundred containers with systemd in them since 1.7. The flags required have changed a little over time. In 1.10 I was adding --cap-add=SYS_ADMIN, --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro, and --security-opt=seccomp:unconfined.

With the same flags, it doesn’t work in 1.11.

With --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro --privileged it works.

With --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro --security-opt=seccomp:unconfined it does not work.
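Condensed into copy-pastable form (the image name is a placeholder), the two variants above are:

```shell
# Worked on 1.10, no longer works on 1.11:
docker run -d \
  --cap-add=SYS_ADMIN \
  --security-opt=seccomp:unconfined \
  --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro \
  my-systemd-image

# Works on 1.11, but grants far broader privileges than SYS_ADMIN alone:
docker run -d \
  --privileged \
  --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro \
  my-systemd-image
```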

Here’s a dump of the system:

root@Ubuntu-1510-wily-64-minimal ~ # docker info
Containers: 102
 Running: 75
 Paused: 0
 Stopped: 27
Images: 59
Server Version: 1.11.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 352
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 4.2.0-35-generic
Operating System: Ubuntu 15.10
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 125.9 GiB
Name: Ubuntu-1510-wily-64-minimal
ID: L6PF:6LTG:FHIZ:NBPC:CJSO:XXQ3:7KIV:ZVQF:C7LA:3NNG:XU3C:O6OT
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): true
 File Descriptors: 243
 Goroutines: 431
 System Time: 2016-04-25T04:35:08.881928707+02:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
root@Ubuntu-1510-wily-64-minimal ~ # docker version
Client:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 18:38:59 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 18:38:59 2016
 OS/Arch:      linux/amd64
root@Ubuntu-1510-wily-64-minimal ~ # uname -a
Linux Ubuntu-1510-wily-64-minimal 4.2.0-35-generic #40-Ubuntu SMP Tue Mar 15 22:15:45 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Thanks for the help!

FYI: This is the only thing I could find about the issue while Googling, and it suggests something indeed did change in 1.11: https://trello.com/c/RFUcI1eV/158-3-make-docker-systemd-cgroups-driver-work-in-1-11

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 26 (15 by maintainers)

Most upvoted comments

We’re seeing this as well on 1.11: --security-opt=seccomp:unconfined --cap-add=SYS_ADMIN works under 1.10, but not on 1.11. I suspect this might indeed be AppArmor-related, as it seems to work on Fedora 23, but not in the Docker for Mac beta, which I think uses an Ubuntu bhyve guest?
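One way to test the AppArmor hypothesis is to also disable AppArmor confinement for the container (flag syntax as of 1.11; the image name is a placeholder):

```shell
# If this works where seccomp:unconfined alone does not, the AppArmor
# docker-default profile is what's blocking systemd:
docker run -d \
  --cap-add=SYS_ADMIN \
  --security-opt=seccomp:unconfined \
  --security-opt=apparmor:unconfined \
  --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro \
  my-systemd-image
```

This would also explain why Fedora (no AppArmor) behaves differently from Ubuntu hosts.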

@thaJeztah it’s been made abundantly clear what Docker’s perspective is on multi-process containers, but I’d like to share our use case, in the interest of providing at least a single data point on the kinds of reasons users may be interested in running systemd in Docker containers. Hopefully it helps get past the “you’re doing it wrong” attitude I see so often in Docker bug reports 😄

We (Treehouse) offer a feature to our students called “Workspaces”, which is an online code editor and terminal that our students use to work on projects associated with their courses. Each Workspace is spun up as an on-demand Docker container running the backing services that the frontend code editor talks to, with persistence handled by bind-mounting Gluster volumes into the container. The services that make up an active Workspace include things like:

  • POSIX file API web service
  • Web-based terminal for interacting with the CLI (compiling CSS from Sass, bundling gems, compiling C# & Java, etc.)
  • Apache server for previewing static files and PHP
  • Postfix for sending mail to local user accounts (outbound mail is restricted, of course…)

We use Docker’s dynamic port mapping to expose these services (and other common dev-preview ports, e.g. for Flask) on the host, and inject the routes into Redis for our load balancer.

Because these are Docker containers, we’re able to run anywhere from 100-200 Workspaces on a given host, which is awesome. Doing this in actual VMs would be cost-prohibitive, so Docker has worked really well for us in that regard.

With experience, we’ve found that treating each active Workspace as a single container is optimal for several reasons:

  • Far fewer API calls, which increases the load we can put on a given Docker daemon instance, since we have seen lock contention make the API unresponsive under high-frequency API calls. This is far less of a problem with 100-200 containers than with the 1000+ that a multi-container (container-per-backend-process) model would require.
  • Conceptual and operational simplicity: 1 container == 1 Workspace, and if there’s a problem, you just nuke the one container. It also avoids the need for complicated linking strategies.
  • Related to the above, there are no “orphan” containers. Running each backing service in its own container means some services may stay running long after the rest have been shut down; due to Docker API latency, some shutdown requests may not complete, so “container leakage” is a real problem. With a Workspace == container model, docker ps tells us definitively whether a Workspace is active or not, so shutdowns can simply be retried for expired Workspaces if they fail at first.
  • Zombie processes are still a real problem, so you need a PID 1 in the container to reap them anyway. We initially used start.sh entrypoints and found systemd to be much more effective; it also handles service restarts as needed, so we can delegate to systemd to ensure all of a Workspace’s backing services are restarted if they crash.
  • We occasionally get bitten by https://github.com/docker/docker/issues/17691, and the failure case in the single-container model is a lot better, since it means the entire Workspace won’t launch at all, as opposed to launching in a degraded state due to a conflict with just one of the backing services.
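The “docker ps as source of truth” retry loop described above can be sketched roughly as follows. This is a hypothetical illustration, not Treehouse’s actual tooling; the workspace- naming convention and file names are made up:

```shell
#!/bin/sh
# Given a file listing currently running workspace containers and a file
# listing workspaces that should still be active, print the names of
# containers that are expired and should be (re)issued a shutdown.
containers_to_stop() {
  # $1: file of running workspace container names, one per line
  # $2: file of still-active workspace names, one per line
  # Whole-line fixed-string match; -v keeps the non-matching (expired) ones.
  grep -F -x -v -f "$2" "$1" || true
}

# In production this would be fed from the Docker CLI on each sweep, e.g.:
#   docker ps --format '{{.Names}}' | grep '^workspace-' > running.txt
#   containers_to_stop running.txt active.txt | xargs -r docker stop
# Because docker ps is re-queried every sweep, a shutdown that was lost to
# API latency is simply retried on the next pass.
```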

There are probably other things I’m forgetting, but those are the big ones for us at present. Ultimately, we hope Docker can see that there are some legitimate use cases for multi-process containers, though it’s clear that best practice for most use cases is still the single-process container model.

Thanks for reading, and thanks for a great product! We love Docker, and hope we can keep using it well into the future!

@justincormack

# ls -la /sys/fs/cgroup/systemd/
total 0
dr-xr-xr-x   5 root root   0 May  7 17:53 .
drwxr-xr-x  12 root root 320 Apr 23 20:57 ..
-rw-r--r--   1 root root   0 May  2 11:37 cgroup.clone_children
-rw-r--r--   1 root root   0 May  2 11:37 cgroup.procs
-r--r--r--   1 root root   0 May  2 11:37 cgroup.sane_behavior
drwxr-xr-x 111 root root   0 Apr 23 20:59 docker
-rw-r--r--   1 root root   0 May  2 11:37 notify_on_release
-rw-r--r--   1 root root   0 May  2 11:37 release_agent
drwxr-xr-x  54 root root   0 Apr 23 20:57 system.slice
-rw-r--r--   1 root root   0 May  2 11:37 tasks
drwxr-xr-x   3 root root   0 Apr 23 20:57 user.slice

So, it exists. Still:

root@Ubuntu-1510-wily-64-minimal ~/tmp # cat Dockerfile
FROM ubuntu:16.04

RUN apt-get update

RUN apt-get install openssh-server -y
RUN systemctl enable ssh

ENTRYPOINT ["/lib/systemd/systemd"]
root@Ubuntu-1510-wily-64-minimal ~/tmp # docker build --tag=test .; docker run -d --security-opt=seccomp:unconfined --cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro test;
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ubuntu:16.04
 ---> c5f1cf30c96b
Step 2 : RUN apt-get update
 ---> Using cache
 ---> 8cc8fa33e927
Step 3 : RUN apt-get install openssh-server -y
 ---> Using cache
 ---> ccff4008d4bd
Step 4 : RUN systemctl enable ssh
 ---> Using cache
 ---> c0c577808e65
Step 5 : ENTRYPOINT /lib/systemd/systemd
 ---> Using cache
 ---> ae50e3d5066a
Successfully built ae50e3d5066a
e18080496140313189463d96e5c6bd3ba32c42e4cd5bc30eb415f51cb9c99774
root@Ubuntu-1510-wily-64-minimal ~/tmp # docker exec -it e18080496140313189463d96e5c6bd3ba32c42e4cd5bc30eb415f51cb9c99774 /bin/bash -c "ps aux"
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  5.5  0.0  36536  2088 ?        Ss   15:54   0:00 /lib/systemd/systemd
root         7  0.0  0.0  34424  2944 ?        Rs+  15:54   0:00 ps aux

Testing your suggested approach (note the rw):

root@Ubuntu-1510-wily-64-minimal ~/tmp # docker build --tag=test .; docker run -d --security-opt=seccomp:unconfined --cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:rw test;
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ubuntu:16.04
 ---> c5f1cf30c96b
Step 2 : RUN apt-get update
 ---> Using cache
 ---> 8cc8fa33e927
Step 3 : RUN apt-get install openssh-server -y
 ---> Using cache
 ---> ccff4008d4bd
Step 4 : RUN systemctl enable ssh
 ---> Using cache
 ---> c0c577808e65
Step 5 : ENTRYPOINT /lib/systemd/systemd
 ---> Using cache
 ---> ae50e3d5066a
Successfully built ae50e3d5066a
237502db4d53863551b10ea7c6676940e47ec958a522c566b18ec0977099cf05
root@Ubuntu-1510-wily-64-minimal ~/tmp # docker exec -it 237502db4d53863551b10ea7c6676940e47ec958a522c566b18ec0977099cf05 /bin/bash -c "ps aux"
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  5.2  0.0  36536  2124 ?        Ss   15:55   0:00 /lib/systemd/sy
root         7  0.0  0.0  34424  2936 ?        Rs+  15:55   0:00 ps aux

That said, I think there is something funky with the host system. On one of my machines it actually works:

root@m3182:~/tmp# docker build --tag=test .; docker run -d --privileged -v /sys/fs/cgroup:/sys/fs/cgroup:ro test;
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ubuntu:16.04
 ---> c5f1cf30c96b
Step 2 : RUN apt-get update
 ---> Using cache
 ---> dbe2316d1c8b
Step 3 : RUN apt-get install openssh-server -y
 ---> Using cache
 ---> 3a36e4f78434
Step 4 : RUN systemctl enable ssh
 ---> Using cache
 ---> 139635dffc97
Step 5 : ENTRYPOINT /lib/systemd/systemd
 ---> Using cache
 ---> a15fb9bfe596
Successfully built a15fb9bfe596
50ee3ce8fef0cb3078c5c4229ee67718714e61baaba126036d74bf5508465d6d
root@m3182:~/tmp# docker exec -it 50ee3ce8fef0cb3078c5c4229ee67718714e61baaba126036d74bf5508465d6d /bin/bash -c "ps aux"
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  1.3  0.0  36992  4964 ?        Ss   15:57   0:00 /lib/systemd/sy
root        19  0.6  0.0  35276  7632 ?        Ss   15:57   0:00 /lib/systemd/sy
systemd+    30  0.0  0.0 100324  2564 ?        Ssl  15:57   0:00 /lib/systemd/sy
root        36  0.1  0.0  65612  6312 ?        Ss   15:57   0:00 /usr/sbin/sshd
root        43  0.0  0.0  13028  1840 tty2     Ss+  15:57   0:00 /sbin/agetty --
root        46  0.8  0.0   4508  1728 ?        S    15:57   0:00 /bin/sh /etc/in
root        47  0.0  0.0  13028  1840 tty3     Ss+  15:57   0:00 /sbin/agetty --
root        48  0.0  0.0  13028  1788 tty4     Ss+  15:57   0:00 /sbin/agetty --
root        49  0.0  0.0  13028  1816 tty5     Ss+  15:57   0:00 /sbin/agetty --
root        50  0.0  0.0  13028  1840 tty6     Ss+  15:57   0:00 /sbin/agetty --
root        62  0.0  0.0   4380   800 ?        S    15:57   0:00 sleep 60
root        70  0.0  0.0  34424  2796 ?        Rs+  15:57   0:00 ps aux
root@m3182:~/tmp# docker info
Containers: 57
 Running: 42
 Paused: 0
 Stopped: 15
Images: 23
Server Version: 1.11.1
Storage Driver: devicemapper
 Pool Name: docker-8:2-9699477-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 32.21 GB
 Backing Filesystem: ext4
 Data file: /dev/loop2
 Metadata file: /dev/loop3
 Data Space Used: 49.62 GB
 Data Space Total: 429.5 GB
 Data Space Available: 379.9 GB
 Metadata Space Used: 34.17 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.113 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.99 (2015-06-20)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host bridge null
Kernel Version: 3.19.8-031908-generic
Operating System: Ubuntu 15.10
OSType: linux
Architecture: x86_64
CPUs: 24
Total Memory: 125.8 GiB
Name: m3182.contabo.host
ID: PZ4G:F6LY:5J7E:7QRJ:CJ7A:5U6D:W6P2:NSDC:INKM:DMJI:UEUL:PF44
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): true
 File Descriptors: 146
 Goroutines: 270
 System Time: 2016-05-07T17:57:50.872828164+02:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
root@m3182:~/tmp# docker version
Client:
 Version:      1.11.1
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   5604cbe
 Built:        Tue Apr 26 23:38:55 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.1
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   5604cbe
 Built:        Tue Apr 26 23:38:55 2016
 OS/Arch:      linux/amd64

The issue is easy for me to reproduce. Just let me know what info you need about the two different host environments. Here are some basics:

Broken host environment:

root@Ubuntu-1510-wily-64-minimal ~/tmp # lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
Stepping:              4
CPU MHz:               3599.941
CPU max MHz:           3900.0000
CPU min MHz:           1200.0000
BogoMIPS:              7000.57
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0-11
root@Ubuntu-1510-wily-64-minimal ~/tmp # uname -a
Linux Ubuntu-1510-wily-64-minimal 4.2.0-35-generic #40-Ubuntu SMP Tue Mar 15 22:15:45 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@Ubuntu-1510-wily-64-minimal ~/tmp # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 15.10
Release:        15.10
Codename:       wily

Working host environment:

root@m3182:~/tmp# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               1213.781
CPU max MHz:           3200.0000
CPU min MHz:           1200.0000
BogoMIPS:              4801.52
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-5,12-17
NUMA node1 CPU(s):     6-11,18-23
root@m3182:~/tmp# uname -a
Linux m3182.contabo.host 3.19.8-031908-generic #201505110938 SMP Mon May 11 13:39:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
root@m3182:~/tmp# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 15.10
Release:        15.10
Codename:       wily

Kernel issue?

/beetree