x11docker: Read-only file system (--init=systemd) [cgroupv2 not supported yet]
Hi,
I was trying to run Steam like so:
x11-docker steam --init=systemd --gpu --pulseaudio --home=/home/archbung/.local/share/Steam -V
where steam is a Docker image built using the following Dockerfile:
FROM ubuntu:20.10
ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/Berlin
# Update and install packages
RUN dpkg --add-architecture i386 \
    && apt-get update -y \
    && apt-get install -y gdebi \
        libc6:i386 \
        libgl1-mesa-dri:i386 \
        libgl1:i386 \
        pciutils \
        wget \
        xdg-desktop-portal \
        xdg-desktop-portal-gtk \
        xdg-utils \
        xterm
WORKDIR /tmp
RUN wget http://media.steampowered.com/client/installer/steam.deb && gdebi -n steam.deb
CMD ["steam"]
However, x11docker terminated with the following error:
Welcome to Ubuntu 20.10!
Set hostname to <ba7666b47c2c>.
Failed to create /init.scope control group: Read-only file system
Failed to allocate manager object: Read-only file system
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...
Could you give me some tips on troubleshooting this issue? The full x11docker.log can be found here.
Cheers,
About this issue
- State: closed
- Created 3 years ago
- Comments: 24 (14 by maintainers)
Commits related to this issue
- --init=systemd: show failure warning for cgroupv2 #349 — committed to mviereck/x11docker by mviereck 3 years ago
- --init=systemd: show workaround for cgroupv2 issue #349 — committed to mviereck/x11docker by mviereck 3 years ago
- --init=systemd --backend=podman: Use --systemd=always #349 — committed to mviereck/x11docker by mviereck 2 years ago
- --init=systemd: cgroupv2 message for podman+crun #349 — committed to mviereck/x11docker by mviereck 2 years ago
- --init=systemd: Support cgroupv2 in docker #349 — committed to mviereck/x11docker by mviereck 2 years ago
- --init=systemd: fix cgroup version check #349 — committed to mviereck/x11docker by mviereck 2 years ago
- --init=systemd: do not enter cgroup namespace/not supported with busybox. Another fix for cgroup version check. #349 — committed to mviereck/x11docker by mviereck 2 years ago
- --init=systemd: check cgroup version with statfs #349 — committed to mviereck/x11docker by mviereck 2 years ago
The recommended way for running systemd in docker seems to be [1][2]:
--privileged --cgroupns=host -v /sys/fs/cgroup:/sys/fs/cgroup:rw
Since I generally do not like to use =host options, I tried to replicate what podman does with the docker CLI. It is a bit hacky, but it seems to be working. Tested on a headless Debian 11 system with docker.io+runc (the container is Fedora httpd with systemd). Systemd correctly detects and uses cgroupv2 (default-hierarchy=unified). I did not have time to check how one would integrate this with x11docker.
Based on podman container_internal_linux.go.
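A minimal sketch of the nsenter approach described here, not the exact commands from this comment. It assumes docker and nsenter (util-linux) on the host, a mount binary inside the container image, and root rights via sudo. The function names remount_cmd_for_pid and remount_cgroup_rw are hypothetical.

```shell
#!/bin/sh
# Sketch only: remount /sys/fs/cgroup read-write inside a running container
# by entering its mount and PID namespaces from the host.
# Both function names are hypothetical and not part of x11docker or docker.

# Print the nsenter command line for a given container init PID (host view).
remount_cmd_for_pid() {
    printf 'nsenter -t %s -m -p mount -o remount,rw /sys/fs/cgroup\n' "$1"
}

# Look up the init PID of a running container and run the remount as root.
remount_cgroup_rw() {
    container="$1"
    pid="$(docker inspect --format '{{.State.Pid}}' "$container")" || return 1
    # docker reports PID 0 for a container that is not running.
    [ "$pid" -gt 0 ] || return 1
    # Unquoted on purpose: the helper prints a plain space-separated command.
    sudo $(remount_cmd_for_pid "$pid")
}
```

Calling remount_cgroup_rw with a container name would run the remount once; the container is assumed to have been started without --privileged and without sharing the host cgroup namespace.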
Initially, I tried using docker exec --privileged, but that does not work as one would think [3]. Creating a container with CAP_SYS_ADMIN, remounting rw, dropping the capability and then using exec is also a problem, since AppArmor blocks that. The nsenter method does not grant extra capabilities and also works without having to disable AppArmor. You can also wrap the nsenter command inside a docker run --privileged --pid=host [4].
I am not sure why your unprivileged setup fails. If you like, you could try to run your worker with x11docker and its option --init=systemd to check if a more secure setup would work.
Great that this helped you! However, I just want to note that this setup exposes your host to the container and is quite insecure. Don’t use it if there is any reason to distrust the container, because basically no isolation is left.
Thanks for the investigation! x11docker now uses stat for the cgroup version check.
Confusing: Contrary to what I assumed, my Debian bullseye installation seems to run cgroupv2 only by default (i.e. without kernel options). The nsenter setup succeeds.
If I set the kernel option systemd.unified_cgroup_hierarchy=0 to have cgroupv1 only, I seem to get a hybrid system according to a check of /sys/fs/cgroup/unified. But the nsenter setup fails in this case. The old setup with shared host cgroups is needed.
I don’t know an option to get a real hybrid setup.
So x11docker would need two checks:
- --init=systemd on a pure cgroupv1 system. (Not sure if any are out in the wild.)
- --init=systemd on a real hybrid system.
Currently x11docker is configured to use the nsenter setup only on a pure cgroupv2 system. For cgroupv1 and hybrid it falls back to the old behaviour sharing host cgroups.
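The distinction between pure cgroupv2, hybrid and pure cgroupv1 can be sketched with a filesystem-type check. This is an illustration under assumptions, not x11docker's actual code: it assumes GNU coreutils stat, where stat -f -c %T prints cgroup2fs for a cgroupv2 mount and tmpfs for the cgroupv1 root. The function names classify_cgroup and detect_cgroup_setup are hypothetical.

```shell
#!/bin/sh
# Sketch only: classify the host cgroup setup from filesystem types.
# classify_cgroup and detect_cgroup_setup are hypothetical names.

# $1: filesystem type of /sys/fs/cgroup
# $2: filesystem type of /sys/fs/cgroup/unified (empty if it does not exist)
classify_cgroup() {
    case "$1" in
        cgroup2fs)
            echo "v2" ;;
        tmpfs)
            # A cgroup2 mount below the v1 tmpfs root indicates a hybrid setup.
            if [ "$2" = "cgroup2fs" ]; then echo "hybrid"; else echo "v1"; fi ;;
        *)
            echo "unknown" ;;
    esac
}

# Query the real mount points; assumes GNU stat (-f: filesystem, %T: type name).
detect_cgroup_setup() {
    root_type="$(stat -f -c %T /sys/fs/cgroup 2>/dev/null)"
    unified_type="$(stat -f -c %T /sys/fs/cgroup/unified 2>/dev/null)"
    classify_cgroup "$root_type" "$unified_type"
}
```

Keeping the classification separate from the stat calls makes the three cases easy to check without access to a host of each kind.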
Good catch!
I’ve almost literally integrated your command in x11docker; it works like a charm now. I still have to add --cap-add=SYS_PTRACE; did you remove it intentionally?
--init=systemd now works out of the box on a hybrid system and on a cgroupv2-only system. It still fails if I set the (previously recommended) GRUB kernel option systemd.unified_cgroup_hierarchy=0. One has to set the x11docker option --sharecgroup to enable the old setup.
Currently I miss a way to detect if a system is set up with cgroupv1 only although the kernel supports cgroupv2. The check
grep -q cgroup2 /proc/filesystems && Cgroupversion="v2" || Cgroupversion="v1"
always results in “v2”.
Curious: Debian buster containers still report default-hierarchy=hybrid (but work nonetheless), while Debian bullseye containers report default-hierarchy=unified.
I have just reread the nsenter man page, and it might be good to also join the cgroup namespace (-C) in addition to the mount and PID namespaces. A remount seems to work without it since it is atomic, but if one were to umount and then mount instead, being in the same cgroup namespace would be required. At least that is how I understand it.
Additionally, instead of joining the host PID namespace, it is also possible to join the other container's PID namespace, so docker inspect is no longer needed.
EDIT: If I apply the same procedure to a rootless podman container that was created with --systemd=false, the remount fails with EPERM, but doing
after a
podman run ... nsenter -t 1 -m -p -C
still works. This should not really matter since podman has systemd support built in, but it is interesting to know.
The same does not work with a docker container where the daemon is running with --userns-remap. In this case /sys/fs/cgroup/ is already mounted rw but owned by the real root, and therefore appears to be owned by nobody from inside the container.
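The variant that joins the target container's PID namespace instead of the host's could look roughly like the following. This is an untested sketch under assumptions: the helper image must provide nsenter and mount, and the function name helper_remount_cmd is hypothetical. With --pid=container:NAME, PID 1 as seen by the helper is the target container's init, so no docker inspect lookup is needed.

```shell
#!/bin/sh
# Sketch only: build a docker command that starts a privileged helper
# container inside the PID namespace of the target container and remounts
# the target's cgroup tree read-write. helper_remount_cmd is a hypothetical name.

# $1: name of the target container
# $2: helper image that ships nsenter and mount (an assumption)
helper_remount_cmd() {
    printf 'docker run --rm --privileged --pid=container:%s %s nsenter -t 1 -m -p -C mount -o remount,rw /sys/fs/cgroup\n' "$1" "$2"
}
```

The function only prints the command so it can be reviewed before running it; the privileged helper exits right after the remount, while the target container itself stays unprivileged.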
The issue is still present on a minimal untweaked Arch Linux install with podman and crun (cgroupv2 only).
But it can be easily fixed without modifying the grub cmdline. Just passing --systemd=always to podman makes the issue disappear.
Blog post about --systemd=always and podman cgroupv2
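For reference, a plain podman invocation with this option might look like the sketch below. The image name and the /sbin/init command are placeholders chosen for illustration, and podman_systemd_cmd is a hypothetical helper; only the --systemd=always flag comes from the comment above.

```shell
#!/bin/sh
# Sketch only: print a podman command line that enables podman's systemd
# mode regardless of the container command. podman_systemd_cmd is a
# hypothetical name; /sbin/init is an assumed init path inside the image.

# $1: image to run
podman_systemd_cmd() {
    printf 'podman run --rm -it --systemd=always %s /sbin/init\n' "$1"
}
```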
Thanks, the change in /etc/default/grub fixed the issue.
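Whether the kernel option discussed in this thread, systemd.unified_cgroup_hierarchy=0, is active on a running system can be checked from the kernel command line. The following is a minimal sketch; the function name cmdline_forces_cgroupv1 is hypothetical, and it only matches the exact spelling with the value 0.

```shell
#!/bin/sh
# Sketch only: return 0 if the given kernel command line contains
# systemd.unified_cgroup_hierarchy=0, otherwise return 1.
# cmdline_forces_cgroupv1 is a hypothetical name.

# $1: kernel command line, e.g. the content of /proc/cmdline
cmdline_forces_cgroupv1() {
    # Unquoted on purpose: split the command line into words.
    for word in $1; do
        case "$word" in
            systemd.unified_cgroup_hierarchy=0) return 0 ;;
        esac
    done
    return 1
}

# Example use on a live system:
#   cmdline_forces_cgroupv1 "$(cat /proc/cmdline)" && echo "legacy cgroups requested"
```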