sysbox: Docker build inside sysbox container results in "lchown ... no such file or directory" errors
Hi there,
I am attempting to solve our CI/CD woes using sysbox and I was really excited to have it working, until it didn’t.
Running dotnet restore with the dotnetcore image inside a docker build fails with a very generic message:
Error processing tar file(exit status 1): lchown /tmp/clr-debug-pipe-202-24216845-in: no such file or directory
see here and here for more information
I am able to solve this by adding COMPlus_EnableDiagnostics=0 as an ENV in the Dockerfile, or by passing it from docker-compose and reading it with ARG in the Dockerfile. However, I really don't want to alter a ton of Dockerfiles for a bunch of microservices, and I don't want to disable debugging, which is what that flag does.
How to reproduce: create a Dockerfile based on the mcr.microsoft.com/dotnet/core/sdk:3.1-buster image that pulls a dotnet repo and runs dotnet restore.
Things I have tried:
- running on normal Docker (non-dind) = works as intended
- running dind with the privileged flag, mounting /var/lib/docker as a volume, and running nested = works
- running with sysbox as the runtime and:
  - adding cap_add - ALL to the first (outer) docker-compose = fails
  - adding cap_add - ALL to the inner docker-compose = fails
I ran strace on the docker daemon, both with standard docker and with dind under sysbox. Here are a few snippets:
standard:
-mknodat(AT_FDCWD, "/tmp/clr-debug-pipe-78-63381236-in", S_IFIFO|0700) = 0
-fchownat(AT_FDCWD, "/tmp/clr-debug-pipe-78-63381236-in", 0, 0, AT_SYMLINK_NOFOLLOW) = 0
-fchmodat(AT_FDCWD, "/tmp/clr-debug-pipe-78-63381236-in", 0700) = 0
-utimensat(AT_FDCWD, "/tmp/clr-debug-pipe-78-63381236-in", [{tv_sec=1610522665, tv_nsec=0} /* 2021-01-13T07:24:25+0000 */, {tv_sec=1610522665, tv_nsec=0} /* 2021-01-13T07:24:25+0000 */], 0) = 0
sysbox:
-newfstatat(AT_FDCWD, "/var/lib/docker/overlay2/ca60dd45565e9b2b10754f95f7058ff401485ccca62253f28a522f105018c9b2-init/merged/tmp/clr-debug-pipe-225-63286646-in", 0xc00192e6b8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
-newfstatat(AT_FDCWD, "/var/lib/docker/overlay2/ca60dd45565e9b2b10754f95f7058ff401485ccca62253f28a522f105018c9b2/merged/tmp/clr-debug-pipe-225-63286646-in", {st_mode=S_IFIFO|0700, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
-lgetxattr("/var/lib/docker/overlay2/ca60dd45565e9b2b10754f95f7058ff401485ccca62253f28a522f105018c9b2/merged/tmp/clr-debug-pipe-225-63286646-in", "security.capability", 0xc00192a700, 128) = -1 ENODATA (No data available)
-newfstatat(AT_FDCWD, "/var/lib/docker/overlay2/ca60dd45565e9b2b10754f95f7058ff401485ccca62253f28a522f105018c9b2-init/merged/tmp/clr-debug-pipe-225-63286646-out", 0xc00192e858, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
As you can see, it doesn't seem to be able to run syscalls such as mknodat or fchmodat, which is why I was hoping that adding cap_add would solve this. Both containers are running as root.
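The reported error can be reproduced in miniature outside Docker. The standard-Docker strace shows the FIFO being created (mknodat) before its ownership is set (fchownat); if the creation step never happens, the ownership step fails with an "lchown ...: no such file or directory" error. The Go sketch below assumes Linux; extractFIFO and runScenario are hypothetical helper names, syscall.Mkfifo plays the role of the daemon's mknodat call, and os.Lchown plays the role of fchownat. It uses the caller's own uid/gid instead of 0/0 so it also runs unprivileged.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"syscall"
)

// extractFIFO mimics, in miniature, the two steps seen in the strace for a
// FIFO entry from an image layer: create the node, then set its ownership.
// When createNode is false the first step is skipped, as if the daemon had
// decided not to create the node; lchown then fails with ENOENT.
func extractFIFO(path string, createNode bool, uid, gid int) error {
	if createNode {
		if err := syscall.Mkfifo(path, 0o700); err != nil {
			return fmt.Errorf("mkfifo %s: %w", path, err)
		}
	}
	// os.Lchown reports failures as *fs.PathError with Op "lchown".
	return os.Lchown(path, uid, gid)
}

// runScenario extracts one FIFO into a fresh temporary directory.
func runScenario(createNode bool) error {
	dir, err := os.MkdirTemp("", "fifo-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	// Hypothetical file name; the own uid/gid keeps the sketch unprivileged.
	path := filepath.Join(dir, "clr-debug-pipe-in")
	return extractFIFO(path, createNode, os.Getuid(), os.Getgid())
}

// isNotExist reports whether err wraps ENOENT.
func isNotExist(err error) bool { return errors.Is(err, fs.ErrNotExist) }

func main() {
	fmt.Println("node created, lchown error:", runScenario(true))
	err := runScenario(false)
	fmt.Println("node skipped, lchown error:", err, "is ENOENT:", isNotExist(err))
}
```

In the second scenario the printed error has the same "lchown <path>: no such file or directory" shape as the build failure, which is consistent with the FIFO simply never having been created rather than with the chown itself being denied.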
Any help on this would be much appreciated!
About this issue
- State: closed
- Created 3 years ago
- Comments: 34 (17 by maintainers)
Hi @nudgegoonies, had to dig a bit to get to the bottom of this one, but I think I’ve found the reason for the problem.
First, I reproduced the problem by launching a sysbox container (with the nestybox/ubuntu-focal-systemd-docker image), and inside of it launching the docker CLI and docker daemon containers as follows:

Then, from the inner Docker CLI container, I pulled the oracle database container image you mentioned above.
The pull failed with:
I then straced the docker pull operation and found that the failure occurs in the fchownat() syscall below:

Basically, it looks like this image requires /dev/initctl; as a result, Docker looks for /dev/initctl during the image extraction, but this device does not exist within the ephemeral docker container (spawned inside the dind container) where the extraction takes place.

I then repeated the experiment by running the same commands as above, but this time at host level (i.e., not inside the sysbox container). Interestingly, this time things worked. I straced the docker daemon and found the following:
Notice the difference: docker called mknod on /dev/initctl during the image extraction. As a result, the subsequent fchownat() worked fine.

So why did Docker not call mknod when the dind image ran inside the sysbox container, but did call it when the dind image ran on the host? Looking at the Docker code, it appears the answer is here:
Since Sysbox containers always use the Linux user-namespace (for strong isolation), the Docker daemon running inside the inner dind container refuses to use mknod to create the /dev/initctl device required by the Oracle image. As a result, the subsequent fchownat() fails.

This explains the failure. It's really caused by Docker's assumption that mknod is not allowed within a user-ns. This is generally true, but it does not take into account that container runtimes like Sysbox (or LXD, for example) can handle such operations properly by intercepting the mknod syscall, checking whether it's allowed, and if so performing it on behalf of the container. Thus, it would be better if Docker called mknod() and, only if that failed, optionally checked whether it's running in a userns.

As far as a solution, I don't have a good one right now. The only work-around I found was to use the docker:19.03.2-dind image instead of the docker:20.10.2-dind image (which suggests that the userns check in the Docker source code I copied above was added recently).

I'll think about whether there is some other solution to make this work with docker:20.10.2-dind.

Closing due to inactivity; please re-open if problem re-occurs.
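The maintainer's diagnosis contrasts two policies for creating special files during image extraction: skip mknod whenever running in a user namespace, or attempt mknod first and consult the user-namespace check only if it fails. The Go sketch below models both under stated assumptions; it is not Docker's code, and the names mknodFunc, skipInUserNS, tryThenCheckUserNS and fakeMknod are hypothetical. Injecting a fake mknod lets the policies be compared without privileges: one fake stands for a runtime that intercepts and performs mknod inside the userns (the Sysbox case), the other for a runtime that rejects it with EPERM.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// mknodFunc stands in for mknod(2). Hypothetical type, not Docker's API.
type mknodFunc func(path string, mode uint32, dev int) error

// errEPERM is what a runtime that rejects mknod inside a userns would return.
var errEPERM error = syscall.EPERM

// skipInUserNS decides from the namespace alone: in a user namespace it never
// attempts mknod, so the node is silently missing afterwards.
func skipInUserNS(path string, mode uint32, dev int, inUserNS bool, mknod mknodFunc) error {
	if inUserNS {
		return nil
	}
	return mknod(path, mode, dev)
}

// tryThenCheckUserNS attempts mknod first and tolerates a permission failure
// only when running in a user namespace; other failures are returned.
func tryThenCheckUserNS(path string, mode uint32, dev int, inUserNS bool, mknod mknodFunc) error {
	err := mknod(path, mode, dev)
	if err == nil {
		return nil
	}
	if inUserNS && (errors.Is(err, syscall.EPERM) || errors.Is(err, syscall.EACCES)) {
		return nil
	}
	return err
}

// fakeMknod returns a mknod stand-in that always returns result, plus a
// pointer to the number of times it has been called.
func fakeMknod(result error) (mknodFunc, *int) {
	calls := 0
	return func(string, uint32, int) error {
		calls++
		return result
	}, &calls
}

func main() {
	const fifo = syscall.S_IFIFO | 0o644

	// Runtime that intercepts and performs mknod inside the userns.
	allow, calls := fakeMknod(nil)
	err := skipInUserNS("/dev/initctl", fifo, 0, true, allow)
	fmt.Println("skip policy:", err, "mknod calls:", *calls)
	err = tryThenCheckUserNS("/dev/initctl", fifo, 0, true, allow)
	fmt.Println("try-first policy:", err, "mknod calls:", *calls)

	// Runtime that rejects mknod inside the userns.
	deny, _ := fakeMknod(errEPERM)
	fmt.Println("try-first, denied, in userns:", tryThenCheckUserNS("/dev/initctl", fifo, 0, true, deny))
	fmt.Println("try-first, denied, no userns:", tryThenCheckUserNS("/dev/initctl", fifo, 0, false, deny))
}
```

With the permissive runtime, the skip policy never calls mknod (so a later fchownat on the path would hit ENOENT), while the try-first policy creates the node. With the restrictive runtime, the try-first policy degrades to the same outcome as skipping, so it would lose nothing where mknod is genuinely forbidden.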
This image is also affected by just pulling within a dind: gitlab/gitlab-ee:13.12.6-ee.0
Hi @nudgegoonies ,
No, it won't, unfortunately. Sysbox always creates containers with the Linux user-namespace (for strong isolation), regardless of whether shiftfs is present or not. Thus, the inner Docker will refuse to mknod() and the docker pull of the oracle database container image will fail.

The presence of shiftfs in the kernel is complementary to user-ns: if present, it means Docker can continue to create the container's filesystem with host root:root ownership, yet the container will have access to it even though the container's root user is not the host's root user (by virtue of Sysbox using the user-namespace). Without shiftfs, Docker needs to create the container's filesystem with ownership that matches the container's root user (i.e., Docker must be configured in userns-remap mode).

Hope that helps.
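Whether a process runs in a user namespace, and which host user its root corresponds to, is visible in /proc/self/uid_map, where each line reads "inside-ID outside-ID length". The Go sketch below parses that format, applies a common heuristic for detecting a user namespace (the initial namespace has exactly one identity range, "0 0 4294967295"), and translates a container UID to its host UID. It is an illustration, not Docker's or Sysbox's code: parseIDMap, inUserNS and toHost are hypothetical names, and the "0 165536 65536" mapping is a made-up example of a remapped container. To inspect a live process, pass the contents of /proc/self/uid_map to parseIDMap.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// idRange is one line of a uid_map/gid_map file: Length IDs starting at Inside
// in the namespace map to IDs starting at Outside in the parent namespace.
type idRange struct {
	Inside, Outside, Length uint64
}

// parseIDMap parses uid_map contents such as "0 165536 65536\n".
func parseIDMap(contents string) ([]idRange, error) {
	var ranges []idRange
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		if len(fields) != 3 {
			return nil, fmt.Errorf("malformed id map line %q", sc.Text())
		}
		var v [3]uint64
		for i, f := range fields {
			n, err := strconv.ParseUint(f, 10, 32)
			if err != nil {
				return nil, err
			}
			v[i] = n
		}
		ranges = append(ranges, idRange{Inside: v[0], Outside: v[1], Length: v[2]})
	}
	return ranges, sc.Err()
}

// inUserNS treats the single full identity range as the initial namespace;
// any other mapping means the process runs in a child user namespace.
func inUserNS(ranges []idRange) bool {
	identity := idRange{Inside: 0, Outside: 0, Length: 4294967295}
	return !(len(ranges) == 1 && ranges[0] == identity)
}

// toHost translates an ID inside the namespace to the parent-namespace ID,
// reporting false if the ID is not mapped.
func toHost(ranges []idRange, id uint64) (uint64, bool) {
	for _, r := range ranges {
		if id >= r.Inside && id < r.Inside+r.Length {
			return r.Outside + (id - r.Inside), true
		}
	}
	return 0, false
}

func main() {
	host, _ := parseIDMap("         0          0 4294967295\n")
	ctr, _ := parseIDMap("0 165536 65536\n") // hypothetical remapped container
	fmt.Println(inUserNS(host), inUserNS(ctr)) // false true
	fmt.Println(toHost(ctr, 0))                 // 165536 true
}
```

In the remapped case the container's root (UID 0) is host UID 165536, which is why, without shiftfs, files extracted with host root:root ownership would not be owned by the container's root and Docker has to create them with remapped ownership instead.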