# kaniko: Can't use kaniko with alpine v3.12+ due to /var/run behavior
## Actual behavior

Tekton uses kaniko to build a Docker image from alpine, and recently the builds started failing.
## TL;DR

In the alpine:3.12 image, `/var/run` is a symlink to `/run`. When kaniko runs in a Kubernetes pod with a service account, the service account token is typically mounted under `/var/run`. Kaniko ignores the contents and state of `/var/run` in the base image (alpine:3.12), but parts of alpine depend on `/var/run` being a symlink to `/run`; since kaniko does not preserve the symlink, upgrading alpine packages fails.
## Details

We discovered this in https://github.com/tektoncd/pipeline/issues/2738.

The problem seems to be caused by recent versions of alpine-baselayout in alpine 3.12. When we build from alpine:3.12 and upgrade all alpine packages, the alpine-baselayout upgrade fails:

```
(1/1) Upgrading alpine-baselayout (3.2.0-r6 -> 3.2.0-r7)
Executing alpine-baselayout-3.2.0-r7.pre-upgrade
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/..data': Read-only file system
```
## Expected behavior

Kaniko should detect that `/var/run` is a symlink in the base image and preserve it. (I think! I'm not sure it's that simple.)
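A sketch in shell of the check being proposed. This is illustrative only, not kaniko's actual code; a temp directory stands in for the extracted base-image rootfs:

```shell
#!/bin/sh
# Illustrative sketch (not kaniko's code): check whether the base image ships
# /var/run as a symlink before replacing it with a plain directory.
root=$(mktemp -d)               # stand-in for the extracted base-image rootfs
mkdir -p "$root/run" "$root/var"
ln -s ../run "$root/var/run"    # reproduce the alpine:3.12 layout: /var/run -> /run

if [ -L "$root/var/run" ]; then
  target=$(readlink "$root/var/run")
  echo "base image /var/run is a symlink to $target: preserve it"
else
  echo "base image /var/run is a plain directory"
fi
```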
## To Reproduce

Using this Dockerfile and mounting a file into `/var/run`, I can build with docker but not with kaniko.

Trying to build with kaniko:

```
docker run -v `pwd`:/workspace/go/src/github.com/tektoncd/pipeline \
  -v `pwd`/SECRET.json:/var/run/secrets/SECRET.json:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/workspace/go/src/github.com/tektoncd/pipeline/SECRET.json \
  gcr.io/kaniko-project/executor:v0.17.1 \
  --dockerfile=/workspace/go/src/github.com/tektoncd/pipeline/images/Dockerfile \
  --destination=gcr.io/christiewilson-catfactory/pipeline-release-test \
  --context=/workspace/go/src/github.com/tektoncd/pipeline -v debug
```

This fails with:

```
(1/2) Upgrading alpine-baselayout (3.2.0-r6 -> 3.2.0-r7)
Executing alpine-baselayout-3.2.0-r7.pre-upgrade
rm: can't remove '/var/run/secrets/SECRET.json': Resource busy
ERROR: alpine-baselayout-3.2.0-r7: failed to rename var/.apk.f752bb51c942c7b3b4e0cf24875e21be9cdcd4595d8db384 to var/run.
```
The error above about not being able to remove the file comes from the alpine-baselayout pre-upgrade script (https://git.alpinelinux.org/aports/tree/main/alpine-baselayout/alpine-baselayout.pre-upgrade), which works just fine as long as `/var/run` is a symlink to `/run`. I confirmed this by doing the same thing with the alpine image directly, without kaniko:

```
docker run --entrypoint /bin/ash -v `pwd`/SECRET.json:/var/run/secrets/SECRET.json:ro \
  alpine:3.12 -c "apk update && apk upgrade alpine-baselayout"
```
That works just fine!
I also tried disabling the `/var/run` whitelist with `--whitelist-var-run=false`, and that didn't work either:

```
docker run -v `pwd`:/workspace/go/src/github.com/tektoncd/pipeline \
  -v `pwd`/SECRET.json:/var/run/secrets/SECRET.json:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/workspace/go/src/github.com/tektoncd/pipeline/SECRET.json \
  gcr.io/kaniko-project/executor:v0.17.1 \
  --dockerfile=/workspace/go/src/github.com/tektoncd/pipeline/images/Dockerfile \
  --destination=gcr.io/christiewilson-catfactory/pipeline-release-test \
  --context=/workspace/go/src/github.com/tektoncd/pipeline \
  --whitelist-var-run=false -v debug
```

This fails with:

```
error building image: error building stage: failed to get filesystem from image: error removing var/run to make way for new symlink: unlinkat /var/run/secrets/SECRET.json: device or resource busy
```
Finally, building the image with docker (from the pipelines repo checkout) worked just fine:

```
pipeline git:(pin_to_stable_alpine) ✗ pwd
/Users/christiewilson/Code/go/src/github.com/tektoncd/pipeline
pipeline git:(pin_to_stable_alpine) ✗ docker build -t poop -f ./images/Dockerfile .
Sending build context to Docker daemon  150.9MB
Step 1/2 : FROM alpine:3.12
 ---> a24bb4013296
Step 2/2 : RUN apk add --update git openssh-client && apk update && apk upgrade alpine-baselayout
 ---> Using cache
 ---> ff08e33b783d
Successfully built ff08e33b783d
Successfully tagged poop:latest
```
## Additional Information
- Dockerfile: https://github.com/tektoncd/pipeline/blob/717fbc51dcf70fb75528925a1031d94d5eb8bb2a/images/Dockerfile
- Build Context: n/a
- Kaniko Image (fully qualified with digest): gcr.io/kaniko-project/executor:v0.17.1 @ 970d32fa1eb2 (also reproduced with v0.23.0)
## Triage Notes for the Maintainers

| Description | Yes/No |
|---|---|
| Please check if this is a new feature you are proposing | No |
| Please check if the build works in docker but not in kaniko | Yes |
| Please check if this error is seen when you use `--cache` flag | |
| Please check if your dockerfile is a multistage dockerfile | |
## About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 33
- Comments: 41 (8 by maintainers)
## Commits related to this issue
- fix: alpine upgrade Due to this bug https://github.com/GoogleContainerTools/kaniko/issues/1297 — committed to coreweave/samba by ChandonPierre 2 years ago
- fix: disable alpine package upgrade Due to this bug https://github.com/GoogleContainerTools/kaniko/issues/1297 — committed to coreweave/samba by ChandonPierre 2 years ago
- fix: omit alpine update packages Due to this bug GoogleContainerTools/kaniko#1297 — committed to GlobalFishingWatch/frontend by rdgfuentes 2 years ago
- Kanikobuild: increase alpine compatibility alpine container build fails after apk upgrade, seems to be a problem with /var/run which is described in https://github.com/GoogleContainerTools/kaniko/iss... — committed to devfbe/gipgee by devfbe 2 years ago
- Trying something weird from Github https://github.com/GoogleContainerTools/kaniko/issues/1297#issuecomment-1149054291 — committed to alt4/docker-blrevive by alt4 2 years ago
- feat: Bump to Samba 4.16.4 (#6) * refactor(ci): Use Todie spec * feat: bump to 4.16.4 * fix: disable alpine package upgrade Due to this bug https://github.com/GoogleContainerTools/kaniko/iss... — committed to coreweave/samba by ChandonPierre 2 years ago
I believe there is an issue with the pre-upgrade script for the alpine-baselayout package: https://git.alpinelinux.org/aports/tree/main/alpine-baselayout/alpine-baselayout.pre-upgrade#n18

The script is erroneously detecting that `/var/run` is a directory when it is already a symlink. I have filed an issue with the Alpine project: https://gitlab.alpinelinux.org/alpine/aports/-/issues/13917
Update: I have filed a merge request to fix the pre-upgrade script: https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/35151
Update 2: The merge request I filed has been merged, but it did not solve the issue. It was just a coincidence that the alpine test appeared to be erroneously detecting `/var/run` as a directory when it was actually a symlink: kaniko replaces `/var/run` during the build, so at upgrade time it really is a directory.

Update 3: I have found a very dumb workaround... I just `mv /var` in my `.gitlab-ci.yml` file like this:
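The comment's actual snippet isn't included above; the following is a hypothetical reconstruction of what such a `.gitlab-ci.yml` workaround might look like (job name, image tag, paths, and flags are assumptions, not taken from the thread):

```yaml
build:
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    # assumed workaround shape: stash /var (and the secrets mounted inside it)
    # out of the way so kaniko can lay down the base image's /var/run symlink
    - mv /var /var-orig
    - /kaniko/executor --dockerfile Dockerfile --context "$CI_PROJECT_DIR" --no-push
```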
This wouldn't work on Kubernetes, which mounts the service account secret automatically under `/var/run`, right? I tried with `--no-scripts` or `--no-commit-hooks`, but that doesn't help either.

We've just hit this issue with alpine:latest, which is currently the same as alpine 3, 3.16, and 3.16.0.
Another solution is to not mount the service account token automatically: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#use-the-default-service-account-to-access-the-api-server (probably you don't need the token). GitLab has a feature request to add this as an option: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/4786. Without the mounted token there is no `/var/run/secrets/kubernetes.io/serviceaccount` directory, and therefore no problem.

One particularly quick fix is:
`apk upgrade --no-cache --ignore alpine-baselayout`. Be warned, though: apk explicitly says that partial upgrades aren't supported (but at least you can test).

Exactly.
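For reference, the suggestion above to not mount the token corresponds to the standard `automountServiceAccountToken` field in the pod spec. A minimal hypothetical pod running kaniko (the pod name, image tag, and args are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build               # hypothetical name
spec:
  automountServiceAccountToken: false   # nothing gets mounted under /var/run
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:v0.23.0
      args: ["--dockerfile=Dockerfile", "--context=dir:///workspace", "--no-push"]
```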
This would probably solve quite a bunch of `COPY` issues at once.

Wouldn't it be easier for kaniko to extract images and run commands in a separate root context? How difficult would that be to implement?
Bump
This is also causing a problem when building a custom image based on the nginx alpine image with kaniko.
In the nginx alpine image, nginx.conf uses the pid file location `/var/run/nginx.pid`. If I build a custom image off nginx alpine but want it to run as a non-root user, I need to create an empty file `/var/run/nginx.pid` in the image and set its ownership to the non-root user. This works fine when building with docker:

```
COPY --chown=nginx:nginx nginx.pid /var/run/nginx.pid
```

However, it doesn't work when using kaniko, because this mount issue causes any file I put in `/var/run` to be deleted.
The workaround is to change the pid path in nginx.conf.
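That nginx.conf change might look like this (the `/tmp` location is an assumption; any path writable by the non-root user and outside `/var/run` would do):

```nginx
# move the pid file out of /var/run so kaniko's handling of that
# directory no longer matters; /tmp is writable by the non-root user
pid /tmp/nginx.pid;
```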
Same here.
If we choose this approach for every command, i.e. map `/` to another directory in `/tmp`, then I see two issues. For one, the `RUN` command uses commands installed in paths relative to `/`; how would that work?

Another approach would be to map `/` to `/tmp/kanikoRootXXX` at the beginning of the build (which is probably what you are suggesting in the edit). I think that could work, but we would need to do something like this for all the metadata commands like `ENV` and `WORKDIR`. Also, for all the base images, we would need to map their `ImageConfig.Env` paths to be relative to this new chroot. I don't think it's infeasible or ugly; it could just be a little hard to wire up. I would not mind pursuing this direction.
Another hack would be to not fail when `apk upgrade` fails with the error "ERROR: alpine-baselayout-3.2.0-r7: failed to rename var/.apk…". Say you create an upgrade script, `apk-upgrade.sh`, that treats that specific error as non-fatal.

+1, upgrading in a GitLab pipeline using kaniko only works with `apk upgrade --ignore alpine-baselayout`.

I'm also affected by this issue while updating core packages of an alpine image 😞. Is there any update on this, or are there any known workarounds?
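The thread names an `apk-upgrade.sh` script but never shows its body; the following is a hedged sketch of what it might contain. The error-matching logic and function name are assumptions:

```shell
#!/bin/sh
# Hypothetical apk-upgrade.sh (body is an assumption, not from the thread):
# run a command and treat only the known alpine-baselayout rename failure
# as non-fatal; any other failure still propagates.
upgrade_tolerant() {
  out=$("$@" 2>&1)
  status=$?
  printf '%s\n' "$out"
  if [ "$status" -ne 0 ]; then
    case $out in
      *"failed to rename var/"*)
        echo "ignoring known alpine-baselayout failure"
        return 0
        ;;
      *)
        return "$status"
        ;;
    esac
  fi
  return 0
}

# Intended use in the Dockerfile / CI script:
#   upgrade_tolerant apk upgrade
```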
Also asking about the timeline for a fix. Any updates?