moby: 19.03.0-beta1: could not get XDG_RUNTIME_DIR (dockerd fails to start)

I think this is related to #38050. There’s a minor regression in Docker 19.03.0-beta1 where dockerd sometimes fails to start (for example, in an environment with a crappy init system such as boot2docker 😅).

The full dockerd daemon log is simply:

could not get XDG_RUNTIME_DIR

(No other context or decoration.)

What’s odd is that if I then do sudo dockerd (or sudo /etc/init.d/docker restart) from the running system (still no XDG_RUNTIME_DIR), the daemon does start up successfully, and I get the following log:

WARN[2019-04-05T23:43:08.090959172Z] Error while setting daemon root propagation, this is not generally critical but may cause some functionality to not work or fallback to less desirable behavior  dir=/mnt/sda1/var/lib/docker error="error writing file to signal mount cleanup on shutdown: open /var/run/docker/unmount-on-shutdown: no such file or directory"
INFO[2019-04-05T23:43:08.095138264Z] libcontainerd: started new containerd process  pid=2022
INFO[2019-04-05T23:43:08.096216254Z] parsed scheme: "unix"                         module=grpc
INFO[2019-04-05T23:43:08.097255716Z] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2019-04-05T23:43:08.099090111Z] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}]  module=grpc
INFO[2019-04-05T23:43:08.102403587Z] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2019-04-05T23:43:08.104440977Z] pickfirstBalancer: HandleSubConnStateChange: 0xc000907180, CONNECTING  module=grpc
INFO[2019-04-05T23:43:08.112139160Z] starting containerd                           revision=bb71b10fd8f58240ca47fbb579b9d1028eea7c84 version=v1.2.5
INFO[2019-04-05T23:43:08.113987956Z] loading plugin "io.containerd.content.v1.content"...  type=io.containerd.content.v1
INFO[2019-04-05T23:43:08.115299816Z] loading plugin "io.containerd.snapshotter.v1.btrfs"...  type=io.containerd.snapshotter.v1
WARN[2019-04-05T23:43:08.116726388Z] failed to load plugin io.containerd.snapshotter.v1.btrfs  error="path /mnt/sda1/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
INFO[2019-04-05T23:43:08.119075104Z] loading plugin "io.containerd.snapshotter.v1.aufs"...  type=io.containerd.snapshotter.v1
WARN[2019-04-05T23:43:08.121777265Z] failed to load plugin io.containerd.snapshotter.v1.aufs  error="modprobe aufs failed: "modprobe: module aufs not found in modules.dep\n": exit status 1"
INFO[2019-04-05T23:43:08.123618491Z] loading plugin "io.containerd.snapshotter.v1.native"...  type=io.containerd.snapshotter.v1
INFO[2019-04-05T23:43:08.124999971Z] loading plugin "io.containerd.snapshotter.v1.overlayfs"...  type=io.containerd.snapshotter.v1
INFO[2019-04-05T23:43:08.126379107Z] loading plugin "io.containerd.snapshotter.v1.zfs"...  type=io.containerd.snapshotter.v1
WARN[2019-04-05T23:43:08.127694182Z] failed to load plugin io.containerd.snapshotter.v1.zfs  error="path /mnt/sda1/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
INFO[2019-04-05T23:43:08.130024085Z] loading plugin "io.containerd.metadata.v1.bolt"...  type=io.containerd.metadata.v1
WARN[2019-04-05T23:43:08.131322823Z] could not use snapshotter zfs in metadata plugin  error="path /mnt/sda1/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
WARN[2019-04-05T23:43:08.133547776Z] could not use snapshotter btrfs in metadata plugin  error="path /mnt/sda1/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
WARN[2019-04-05T23:43:08.135778359Z] could not use snapshotter aufs in metadata plugin  error="modprobe aufs failed: "modprobe: module aufs not found in modules.dep\n": exit status 1"
INFO[2019-04-05T23:43:08.144145908Z] loading plugin "io.containerd.differ.v1.walking"...  type=io.containerd.differ.v1
INFO[2019-04-05T23:43:08.145448785Z] loading plugin "io.containerd.gc.v1.scheduler"...  type=io.containerd.gc.v1
INFO[2019-04-05T23:43:08.146558838Z] loading plugin "io.containerd.service.v1.containers-service"...  type=io.containerd.service.v1
INFO[2019-04-05T23:43:08.147918118Z] loading plugin "io.containerd.service.v1.content-service"...  type=io.containerd.service.v1
INFO[2019-04-05T23:43:08.149332607Z] loading plugin "io.containerd.service.v1.diff-service"...  type=io.containerd.service.v1
INFO[2019-04-05T23:43:08.150786735Z] loading plugin "io.containerd.service.v1.images-service"...  type=io.containerd.service.v1
INFO[2019-04-05T23:43:08.152199109Z] loading plugin "io.containerd.service.v1.leases-service"...  type=io.containerd.service.v1
INFO[2019-04-05T23:43:08.153600837Z] loading plugin "io.containerd.service.v1.namespaces-service"...  type=io.containerd.service.v1
INFO[2019-04-05T23:43:08.154949926Z] loading plugin "io.containerd.service.v1.snapshots-service"...  type=io.containerd.service.v1
INFO[2019-04-05T23:43:08.156298463Z] loading plugin "io.containerd.runtime.v1.linux"...  type=io.containerd.runtime.v1
INFO[2019-04-05T23:43:08.157625323Z] loading plugin "io.containerd.runtime.v2.task"...  type=io.containerd.runtime.v2
INFO[2019-04-05T23:43:08.158910070Z] loading plugin "io.containerd.monitor.v1.cgroups"...  type=io.containerd.monitor.v1
INFO[2019-04-05T23:43:08.160393344Z] loading plugin "io.containerd.service.v1.tasks-service"...  type=io.containerd.service.v1
INFO[2019-04-05T23:43:08.161765553Z] loading plugin "io.containerd.internal.v1.restart"...  type=io.containerd.internal.v1
INFO[2019-04-05T23:43:08.163081227Z] loading plugin "io.containerd.grpc.v1.containers"...  type=io.containerd.grpc.v1
INFO[2019-04-05T23:43:08.164348950Z] loading plugin "io.containerd.grpc.v1.content"...  type=io.containerd.grpc.v1
INFO[2019-04-05T23:43:08.165513999Z] loading plugin "io.containerd.grpc.v1.diff"...  type=io.containerd.grpc.v1
INFO[2019-04-05T23:43:08.166576714Z] loading plugin "io.containerd.grpc.v1.events"...  type=io.containerd.grpc.v1
INFO[2019-04-05T23:43:08.167807444Z] loading plugin "io.containerd.grpc.v1.healthcheck"...  type=io.containerd.grpc.v1
INFO[2019-04-05T23:43:08.169117759Z] loading plugin "io.containerd.grpc.v1.images"...  type=io.containerd.grpc.v1
INFO[2019-04-05T23:43:08.170392962Z] loading plugin "io.containerd.grpc.v1.leases"...  type=io.containerd.grpc.v1
INFO[2019-04-05T23:43:08.171652273Z] loading plugin "io.containerd.grpc.v1.namespaces"...  type=io.containerd.grpc.v1
INFO[2019-04-05T23:43:08.173037925Z] loading plugin "io.containerd.internal.v1.opt"...  type=io.containerd.internal.v1
INFO[2019-04-05T23:43:08.174412861Z] loading plugin "io.containerd.grpc.v1.snapshots"...  type=io.containerd.grpc.v1
INFO[2019-04-05T23:43:08.175730582Z] loading plugin "io.containerd.grpc.v1.tasks"...  type=io.containerd.grpc.v1
INFO[2019-04-05T23:43:08.177005749Z] loading plugin "io.containerd.grpc.v1.version"...  type=io.containerd.grpc.v1
INFO[2019-04-05T23:43:08.178286183Z] loading plugin "io.containerd.grpc.v1.introspection"...  type=io.containerd.grpc.v1
INFO[2019-04-05T23:43:08.179525192Z] serving...                                    address="/var/run/docker/containerd/containerd-debug.sock"
INFO[2019-04-05T23:43:08.180787557Z] serving...                                    address="/var/run/docker/containerd/containerd.sock"
INFO[2019-04-05T23:43:08.182128898Z] containerd successfully booted in 0.070259s  
INFO[2019-04-05T23:43:08.183101659Z] pickfirstBalancer: HandleSubConnStateChange: 0xc000907180, READY  module=grpc
INFO[2019-04-05T23:43:18.206935688Z] parsed scheme: "unix"                         module=grpc
INFO[2019-04-05T23:43:18.210937015Z] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2019-04-05T23:43:18.215314493Z] parsed scheme: "unix"                         module=grpc
INFO[2019-04-05T23:43:18.218095290Z] scheme "unix" not registered, fallback to default scheme  module=grpc
WARN[2019-04-05T23:43:18.244684001Z] Your kernel does not support cgroup blkio weight 
WARN[2019-04-05T23:43:18.246043866Z] Your kernel does not support cgroup blkio weight_device 
INFO[2019-04-05T23:43:18.247584742Z] Loading containers: start.                   
INFO[2019-04-05T23:43:18.293411069Z] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}]  module=grpc
INFO[2019-04-05T23:43:18.295132306Z] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2019-04-05T23:43:18.296176657Z] pickfirstBalancer: HandleSubConnStateChange: 0xc0000d2b00, CONNECTING  module=grpc
INFO[2019-04-05T23:43:18.297622715Z] pickfirstBalancer: HandleSubConnStateChange: 0xc0000d2b00, READY  module=grpc
INFO[2019-04-05T23:43:18.298885226Z] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}]  module=grpc
INFO[2019-04-05T23:43:18.300519142Z] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2019-04-05T23:43:18.301515716Z] pickfirstBalancer: HandleSubConnStateChange: 0xc0000d2da0, CONNECTING  module=grpc
INFO[2019-04-05T23:43:18.305061025Z] pickfirstBalancer: HandleSubConnStateChange: 0xc0000d2da0, READY  module=grpc
INFO[2019-04-05T23:43:18.325335224Z] Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address 
INFO[2019-04-05T23:43:18.352683593Z] Loading containers: done.                    
INFO[2019-04-05T23:43:18.361786810Z] Docker daemon                                 commit=62240a9 graphdriver(s)=overlay2 version=19.03.0-beta1
INFO[2019-04-05T23:43:18.363669299Z] Daemon has completed initialization          
INFO[2019-04-05T23:43:18.398293695Z] API listen on /var/run/docker.sock           

I think that first line is probably the most interesting/worrying, although I’m not clear on what it means or whether it’s even remotely related to the initial failure that’s got me filing an issue. (Honestly, if I hadn’t run into the XDG_RUNTIME_DIR failure I wouldn’t even be looking into this, let alone filing an issue 😄)

Error while setting daemon root propagation, this is not generally critical but may cause some functionality to not work or fallback to less desirable behavior dir=/mnt/sda1/var/lib/docker error="error writing file to signal mount cleanup on shutdown: open /var/run/docker/unmount-on-shutdown: no such file or directory"

I’m a little bit confused about why it would only look for (and error out on) XDG_RUNTIME_DIR sometimes, but also about why the error output is entirely undecorated.

To be clear, this is a fully clean environment: kernel version 4.14.111, no pre-existing /var/lib/docker contents. If I swap 18.09.4 into the ISO instead, it works flawlessly.

The init script runs dockerd --data-root /mnt/sda1/var/lib/docker -H unix:// --pidfile /var/run/docker.pid, which doesn’t seem particularly exotic (/mnt/sda1 is a freshly-formatted ext4 partition and /mnt/sda1/var/lib/docker is pre-created via mkdir -p), and does work if I simply run it again after it fails.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 19 (19 by maintainers)

Most upvoted comments

LOL, I can’t stop watching that Gif; it’s so perfect for your comment https://github.com/moby/moby/issues/39009#issuecomment-480459527

😂🤣😂🤣

Confirmed, rebuilt with unset USER right before running dockerd and it starts up without issue.

https://github.com/moby/moby/blob/0ac8cbf74765ca32e1b82df343bdf52ebb0fb6e2/rootless/rootless.go#L13-L26

It appears that “mostly used for configuring default paths” somehow evolved into “used for detecting whether we aren’t root”? 😅
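To make the mismatch concrete, here is a minimal Go sketch (not a copy of rootless.go; the helper names looksRootlessByUsername and isRootByEUID are hypothetical) contrasting a $USER-based heuristic with an effective-UID check. Presumably the failing environment is one where the init script runs as UID 0 while USER is still set to a non-root name, which would explain why unsetting USER makes the daemon start.

```go
package main

import (
	"fmt"
	"os"
)

// looksRootlessByUsername is a hypothetical helper mirroring a $USER-based
// heuristic: any non-empty username other than "root" counts as rootless,
// regardless of the actual UID/EUID.
func looksRootlessByUsername(user string) bool {
	return user != "" && user != "root"
}

// isRootByEUID is a hypothetical helper that reports whether the given
// effective UID is root.
func isRootByEUID(euid int) bool {
	return euid == 0
}

func main() {
	user := os.Getenv("USER")
	euid := os.Geteuid()

	fmt.Printf("USER=%q euid=%d\n", user, euid)
	fmt.Println("rootless by username:", looksRootlessByUsername(user))
	fmt.Println("root by euid:", isRootByEUID(euid))

	// The two answers disagree when a privileged process inherits a
	// non-root USER value, e.g. from a sloppy init environment.
	if isRootByEUID(euid) && looksRootlessByUsername(user) {
		fmt.Println("mismatch: running as root but USER suggests rootless")
	}
}
```

Running this as root with USER=docker set and again with USER unset shows the two checks disagreeing in the first case only.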

I think the only way (as you mentioned above, and as we discussed in the related, very similar runc issues) is to split up the semantics instead of having a single “is rootless” switch, because whether we should use XDG_RUNTIME_DIR and whether we need to work around some privilege issues are two different questions.

A simple check might be to see whether we have write access to the default root directory – otherwise we fall back to XDG_RUNTIME_DIR (since ultimately the only reason we use XDG_RUNTIME_DIR is that it’s a /run-like directory we can write to – and /run is not writeable by unprivileged users).

Right, sorry – that’s what I meant (root-in-userns). We could just replicate the LXC check (like we have in runc) which will detect whether we’re in a userns.
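For reference, the LXC-style check boils down to reading /proc/self/uid_map and seeing whether it is the identity mapping of the initial user namespace ("0 0 4294967295"). A minimal Go sketch of the idea follows; the function names uidMapInUserNS and runningInUserNS are hypothetical, and treating an empty or malformed map as "in a userns" is an assumption of this sketch, not necessarily what runc or LXC do.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// uidMapInUserNS (hypothetical helper) inspects the contents of a uid_map
// file. The initial user namespace shows a single line mapping
// 0 -> 0 for 4294967295 IDs; anything else is treated as a user namespace.
func uidMapInUserNS(uidMap string) bool {
	lines := strings.Split(strings.TrimSpace(uidMap), "\n")
	if len(lines) != 1 {
		return true // several mapping lines: not the initial namespace
	}
	fields := strings.Fields(lines[0])
	if len(fields) != 3 {
		return true // assumption: empty or odd map is not the identity map
	}
	return !(fields[0] == "0" && fields[1] == "0" && fields[2] == "4294967295")
}

// runningInUserNS (hypothetical helper) applies the check to the current
// process.
func runningInUserNS() bool {
	data, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		// No uid_map file: assume a kernel without user namespace support.
		return false
	}
	return uidMapInUserNS(string(data))
}

func main() {
	fmt.Println("running in user namespace:", runningInUserNS())
}
```

This answers only the "do we need to work around privilege issues" question; it says nothing about which runtime directory to use.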

I agree we shouldn’t rely on USER, but I could not come up with a better idea. RFC.