moby: Unattended upgrades that reload apparmor profiles break Engine 1.13, requiring a restart

(If you’re in a hurry, scroll down to “Steps to reproduce the issue”.)

Symptoms

This has happened to me twice already: I spin up a 5-node Ubuntu 16.04 cluster on EC2 and do some work on it; then the day after, I log in again, and the cluster seems functional, except that I can’t start new containers. Existing containers seem fine, but any attempt to start a new container results in:

$ docker run -ti alpine sh
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
Digest: sha256:1354db23ff5478120c980eca1611a51c9f2b88b61f24283ee8200bf9a54f2e5c
Status: Downloaded newer image for alpine:latest
docker: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"apparmor failed to apply profile: no such file or directory\"".

The “interesting” part is that this behavior is consistent across the whole cluster; i.e., all my nodes appear to be botched.

Running a privileged container, however, works fine. What could it be?!? 🔎 🪲

Restarting the Engine (or, more drastically, rebooting the node) fixes the issue.
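
(For the record, the quick fix I use; a minimal sketch assuming the stock systemd unit on Ubuntu 16.04. Note that since live-restore is disabled here, this also takes down running containers:)

$ # Restarting the daemon regenerates and reloads the docker-default profile
$ sudo systemctl restart docker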

Analysis

I dynamically enabled debug mode (through the config file + SIGHUP) and looked at the Engine’s logs while trying to start a container, but nothing special showed up. (If I compare a “sane” Engine with a broken one, the first differing line is the one mentioning the apparmor error.)
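
(For reference, “config file + SIGHUP” means something like the sketch below; it assumes the default /etc/docker/daemon.json location, and it overwrites any existing daemon.json, so merge by hand if you already have one:)

$ # Enable debug logging without restarting the daemon
$ echo '{"debug": true}' | sudo tee /etc/docker/daemon.json
$ sudo kill -HUP $(pidof dockerd)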

The kernel logs indicate that the docker-default profile doesn’t exist anymore:

[96551.680251] audit: type=1400 audit(1480734916.438:79): apparmor="DENIED" operation="change_onexec" info="label not found" error=-2 profile="unconfined" name="docker-default" pid=9524 comm="runc:[2:INIT]"
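
You can double-check that the profile is really gone by asking the kernel directly (aa-status ships with Ubuntu’s apparmor package):

$ sudo aa-status | grep docker-default
$ # Equivalent check, without the apparmor userland tools:
$ sudo grep docker-default /sys/kernel/security/apparmor/profiles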

This prompted me to try to start a container with --security-opt apparmor=unconfined, and that worked! Same if I specify another “OK-ish” AppArmor profile, e.g. --security-opt apparmor=lxc-container-default.
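
In other words, this is a usable stopgap on a broken node (at the cost of running the container without AppArmor confinement):

$ docker run -ti --security-opt apparmor=unconfined alpine sh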

We’re getting closer.

Looking at the kernel logs, I saw a blast of mysterious (i.e. unsolicited) messages like the following:

Dec  2 07:41:17 ubuntu kernel: [26112.997698] audit: type=1400 audit(1480664477.757:17): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/sbin/dhclient" pid=15189 comm="apparmor_parser"
Dec  2 07:41:17 ubuntu kernel: [26112.998057] audit: type=1400 audit(1480664477.757:18): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=15189 comm="apparmor_parser"
Dec  2 07:41:17 ubuntu kernel: [26112.998338] audit: type=1400 audit(1480664477.757:19): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=15189 comm="apparmor_parser"
...

What could be happening at 7:41 to cause a merry-go-round of apparmor profiles?

I looked at FBI_Surveillance_Van data to figure out if I had been sleepwalking to the console and issuing random commands to my nodes, but found nothing.

However, according to the logs, dpkg had been busy at that time:

dpkg.log:2016-12-02 07:41:16 status installed vim-tiny:amd64 2:7.4.1689-3ubuntu1.2  
dpkg.log:2016-12-02 07:41:17 configure linux-headers-4.4.0-51:all 4.4.0-51.72 <none>  
dpkg.log:2016-12-02 07:41:17 status unpacked linux-headers-4.4.0-51:all 4.4.0-51.72  
dpkg.log:2016-12-02 07:41:17 status half-configured linux-headers-4.4.0-51:all 4.4.0-51.72  
dpkg.log:2016-12-02 07:41:17 status installed linux-headers-4.4.0-51:all 4.4.0-51.72  
dpkg.log:2016-12-02 07:41:17 configure linux-headers-4.4.0-51-generic:amd64 4.4.0-51.72 <none>  
dpkg.log:2016-12-02 07:41:17 status unpacked linux-headers-4.4.0-51-generic:amd64 4.4.0-51.72  
dpkg.log:2016-12-02 07:41:17 status half-configured linux-headers-4.4.0-51-generic:amd64 4.4.0-51.72  
dpkg.log:2016-12-02 07:41:17 status installed linux-headers-4.4.0-51-generic:amd64 4.4.0-51.72  
dpkg.log:2016-12-02 07:41:17 configure linux-image-virtual:amd64 4.4.0.51.54 <none>  
dpkg.log:2016-12-02 07:41:17 status unpacked linux-image-virtual:amd64 4.4.0.51.54  
dpkg.log:2016-12-02 07:41:17 status half-configured linux-image-virtual:amd64 4.4.0.51.54  
dpkg.log:2016-12-02 07:41:17 status installed linux-image-virtual:amd64 4.4.0.51.54  
dpkg.log:2016-12-02 07:41:17 configure linux-headers-generic:amd64 4.4.0.51.54 <none>  
dpkg.log:2016-12-02 07:41:17 status unpacked linux-headers-generic:amd64 4.4.0.51.54  
dpkg.log:2016-12-02 07:41:17 status half-configured linux-headers-generic:amd64 4.4.0.51.54  
dpkg.log:2016-12-02 07:41:17 status installed linux-headers-generic:amd64 4.4.0.51.54  
dpkg.log:2016-12-02 07:41:17 configure linux-headers-virtual:amd64 4.4.0.51.54 <none>  
dpkg.log:2016-12-02 07:41:17 status unpacked linux-headers-virtual:amd64 4.4.0.51.54  
dpkg.log:2016-12-02 07:41:17 status half-configured linux-headers-virtual:amd64 4.4.0.51.54  
dpkg.log:2016-12-02 07:41:17 status installed linux-headers-virtual:amd64 4.4.0.51.54  
dpkg.log:2016-12-02 07:41:17 configure linux-virtual:amd64 4.4.0.51.54 <none>  
dpkg.log:2016-12-02 07:41:17 status unpacked linux-virtual:amd64 4.4.0.51.54  
dpkg.log:2016-12-02 07:41:17 status half-configured linux-virtual:amd64 4.4.0.51.54  
dpkg.log:2016-12-02 07:41:17 status installed linux-virtual:amd64 4.4.0.51.54  
dpkg.log:2016-12-02 07:41:17 configure python3-cryptography:amd64 1.2.3-1ubuntu0.1 <none>  
dpkg.log:2016-12-02 07:41:17 status unpacked python3-cryptography:amd64 1.2.3-1ubuntu0.1  
dpkg.log:2016-12-02 07:41:17 status half-configured python3-cryptography:amd64 1.2.3-1ubuntu0.1  
dpkg.log:2016-12-02 07:41:17 status installed python3-cryptography:amd64 1.2.3-1ubuntu0.1  
dpkg.log:2016-12-02 07:41:17 configure liblxc1:amd64 2.0.5-0ubuntu1~ubuntu16.04.3 <none>  
dpkg.log:2016-12-02 07:41:17 status unpacked liblxc1:amd64 2.0.5-0ubuntu1~ubuntu16.04.3  
dpkg.log:2016-12-02 07:41:17 status half-configured liblxc1:amd64 2.0.5-0ubuntu1~ubuntu16.04.3  
dpkg.log:2016-12-02 07:41:17 status installed liblxc1:amd64 2.0.5-0ubuntu1~ubuntu16.04.3  
dpkg.log:2016-12-02 07:41:17 configure lxc-common:amd64 2.0.5-0ubuntu1~ubuntu16.04.3 <none>  
dpkg.log:2016-12-02 07:41:17 status unpacked lxc-common:amd64 2.0.5-0ubuntu1~ubuntu16.04.3  
dpkg.log:2016-12-02 07:41:17 status unpacked lxc-common:amd64 2.0.5-0ubuntu1~ubuntu16.04.3  
dpkg.log:2016-12-02 07:41:17 status unpacked lxc-common:amd64 2.0.5-0ubuntu1~ubuntu16.04.3  
dpkg.log:2016-12-02 07:41:17 status unpacked lxc-common:amd64 2.0.5-0ubuntu1~ubuntu16.04.3  
dpkg.log:2016-12-02 07:41:17 status unpacked lxc-common:amd64 2.0.5-0ubuntu1~ubuntu16.04.3  
dpkg.log:2016-12-02 07:41:17 status unpacked lxc-common:amd64 2.0.5-0ubuntu1~ubuntu16.04.3  
dpkg.log:2016-12-02 07:41:17 status unpacked lxc-common:amd64 2.0.5-0ubuntu1~ubuntu16.04.3  
dpkg.log:2016-12-02 07:41:17 status unpacked lxc-common:amd64 2.0.5-0ubuntu1~ubuntu16.04.3  
dpkg.log:2016-12-02 07:41:17 status unpacked lxc-common:amd64 2.0.5-0ubuntu1~ubuntu16.04.3  
dpkg.log:2016-12-02 07:41:17 status half-configured lxc-common:amd64 2.0.5-0ubuntu1~ubuntu16.04.3  
dpkg.log:2016-12-02 07:41:17 status installed lxc-common:amd64 2.0.5-0ubuntu1~ubuntu16.04.3  
dpkg.log:2016-12-02 07:41:17 trigproc libc-bin:amd64 2.23-0ubuntu4 <none>  
dpkg.log:2016-12-02 07:41:17 status half-configured libc-bin:amd64 2.23-0ubuntu4  
dpkg.log:2016-12-02 07:41:17 status installed libc-bin:amd64 2.23-0ubuntu4  
dpkg.log:2016-12-02 07:41:18 startup packages configure  
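
(Judging by the dpkg.log: prefixes, these lines are the output of something like the sketch below; adjust the timestamp to your own incident:)

$ # Look for package activity around the incident, including rotated logs
$ cd /var/log && grep '2016-12-02 07:41' dpkg.log*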

I suspect that unattended upgrades pulled in some package shipping AppArmor profiles, and that as a result, all AppArmor profiles were flushed and reloaded, discarding the docker-default profile in the process.

Specifically, I suspect ~colonel Mustard, in the INT gate, with the task_struct~ lxc-common. (But really, I think anything pulling AppArmor profiles will do!)
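
If you want to check the hypothesis on a test node, you can watch the profile vanish (a minimal sketch):

$ sudo aa-status | grep docker-default   # present on a sane node
$ sudo dpkg-reconfigure lxc-common       # flushes and reloads AppArmor profiles
$ sudo aa-status | grep docker-default   # if the hypothesis holds: no output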

Steps to reproduce the issue:

This gives us the following steps to reproduce (on Ubuntu 16.04 with 1.13.0-rc2); a copy-paste version follows the list:

  1. Install Docker on Ubuntu 16.04
  2. Run your favorite container (e.g. docker run alpine echo ♥)
  3. Move fast and break all the things with sudo dpkg-reconfigure lxc-common
  4. Run your favorite container again
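
The same steps, in copy-paste form (assuming Docker is already installed):

$ docker run alpine echo ♥          # works, prints one ♥
$ sudo dpkg-reconfigure lxc-common
$ docker run alpine echo ♥          # fails with the apparmor error below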

Describe the results you received:

I get one ♥, and then the following error message:

container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"apparmor failed to apply profile: no such file or directory\""
docker: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"apparmor failed to apply profile: no such file or directory\"".

Describe the results you expected:

I expected two ♥.

Additional information you deem important (e.g. issue happens only occasionally):

The issue will randomly break your Engine, subrepticiously¹, through undercover, randomly-timed, unattended upgrades.

¹Did I just make up that word? 📖 👀

Output of docker version:

Client:
 Version:      1.13.0-rc2
 API version:  1.25
 Go version:   go1.7.3
 Git commit:   1f9b3ef
 Built:        Wed Nov 23 06:32:39 2016
 OS/Arch:      linux/amd64

Server:
 Version:             1.13.0-rc2
 API version:         1.25
 Minimum API version: 1.12
 Go version:          go1.7.3
 Git commit:          1f9b3ef
 Built:               Wed Nov 23 06:32:39 2016
 OS/Arch:             linux/amd64
 Experimental:        false

Output of docker info:

Containers: 13
 Running: 0
 Paused: 0
 Stopped: 13
Images: 3
Server Version: 1.13.0-rc2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 45
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: b0vdva852jomk6kbvlklh23qc
 Is Manager: true
 ClusterID: bianshirja3n5k5qrt4jmvtng
 Managers: 3
 Nodes: 5
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 172.31.42.62
 Manager Addresses:
  0.0.0.0:2377
  172.31.42.58:2377
  172.31.42.60:2377
  172.31.42.62:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e
runc version: 51371867a01c467f08af739783b8beafc154c4d7
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-47-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.858 GiB
Name: node2
ID: EFCV:WAAA:U4AO:7Q5A:NLIS:EXZ5:LHSB:G5FB:37ZK:ELVE:7MQ2:NUQW
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Labels:
 provider=generic
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

I can provide unlimited supplies of broken instances on demand.

I can confirm that the problem only shows up on 1.13; not on 1.12.

I can’t confirm or deny whether this is related to #26823.

I can also provide 🍪 or suitably adequate bribes if this gets fixed before my workshop next week (or even if it doesn’t, because one should never pass on an opportunity to show 💖 to our benevolent maintainers, may their path be strewn with one thousand rose petals).

Thank you ✨

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 16 (10 by maintainers)

Most upvoted comments

I had exactly the same issue a few days ago. docker version:

Client:
 Version:      1.13.1
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:50:14 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.1
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:50:14 2017
 OS/Arch:      linux/amd64
 Experimental: false

Running on

Ubuntu 16.04.2 LTS (Xenial Xerus)
Linux 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

It happened on all cluster nodes, coinciding with the same update of the lxc packages.

For those who encounter the same problem: no need to restart the containers; just run a new container (any container) on each node, and it loads the apparmor profile.
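
(A minimal sketch of that cluster-wide recovery, assuming SSH access; the node names are hypothetical:)

$ for node in node1 node2 node3 node4 node5; do ssh $node docker run --rm alpine true; done  # any image will do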