kaniko: Can't use kaniko with alpine v3.12+ due to /var/run behavior

Actual behavior

Tekton uses kaniko to build a Docker image from alpine, and recently the builds started failing.

TL;DR

The alpine:3.12 image has /var/run aliased to /run (it is a symlink). When running kaniko in a Kubernetes pod with service accounts, the service account secrets often end up mounted under /var/run.

Kaniko ignores the contents and state of /var/run in the base image (alpine:3.12), but parts of alpine depend on /var/run being a symlink to /run, so failing to preserve the symlink causes alpine package upgrades to fail.
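The distinction kaniko loses is whether /var/run is a symlink or a real directory. Here is a minimal local sketch of the two layouts and the check alpine-baselayout's pre-upgrade script effectively performs (the scratch paths are illustrative, no docker required):

```shell
#!/bin/sh
# Recreate both layouts in a scratch directory.
demo=$(mktemp -d)

mkdir -p "$demo/alpine/run"
ln -s run "$demo/alpine/var-run"    # alpine:3.12 ships /var/run as a symlink to /run
mkdir -p "$demo/kaniko/var-run"     # kaniko's snapshot recreates it as a plain directory

# The pre-upgrade script effectively branches on this symlink check:
for layout in alpine kaniko; do
  if [ -L "$demo/$layout/var-run" ]; then
    echo "$layout: symlink, nothing to migrate"
  else
    echo "$layout: directory, script tries to empty and rename it"
  fi
done

rm -rf "$demo"
```

In the kaniko case the script then tries to rm -rf the directory's contents, which fails on the read-only service account mount.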

Details

We discovered this in https://github.com/tektoncd/pipeline/issues/2738.

It seems the problem is caused by recent versions of alpine-baselayout in alpine 3.12. When we build from alpine 3.12 and upgrade all alpine packages, the alpine-baselayout upgrade fails:

(1/1) Upgrading alpine-baselayout (3.2.0-r6 -> 3.2.0-r7)
Executing alpine-baselayout-3.2.0-r7.pre-upgrade
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/..data': Read-only file system

Expected behavior

Kaniko should detect that /var/run is a symlink in the base image and preserve that. (I think! I’m not sure if it’s that simple.)

To Reproduce

Using this Dockerfile and mounting a file into /var/run, I can build with docker but not with kaniko.

Trying to build with kaniko:

docker run \
  -v `pwd`:/workspace/go/src/github.com/tektoncd/pipeline \
  -v `pwd`/SECRET.json:/var/run/secrets/SECRET.json:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/workspace/go/src/github.com/tektoncd/pipeline/SECRET.json \
  gcr.io/kaniko-project/executor:v0.17.1 \
  --dockerfile=/workspace/go/src/github.com/tektoncd/pipeline/images/Dockerfile \
  --destination=gcr.io/christiewilson-catfactory/pipeline-release-test \
  --context=/workspace/go/src/github.com/tektoncd/pipeline \
  -v debug
(1/2) Upgrading alpine-baselayout (3.2.0-r6 -> 3.2.0-r7)
Executing alpine-baselayout-3.2.0-r7.pre-upgrade
rm: can't remove '/var/run/secrets/SECRET.json': Resource busy
ERROR: alpine-baselayout-3.2.0-r7: failed to rename var/.apk.f752bb51c942c7b3b4e0cf24875e21be9cdcd4595d8db384 to var/run.

The "can't remove" error above comes from the alpine-baselayout pre-upgrade script (https://git.alpinelinux.org/aports/tree/main/alpine-baselayout/alpine-baselayout.pre-upgrade), which works just fine when /var/run is a symlink to /run. I confirmed this by running the same upgrade in the alpine image directly, without kaniko:

docker run --entrypoint /bin/ash \
  -v `pwd`/SECRET.json:/var/run/secrets/SECRET.json:ro \
  alpine:3.12 -c "apk update && apk upgrade alpine-baselayout"

That works just fine!

I also tried disabling the /var/run whitelist (--whitelist-var-run=false), but that didn’t work either:

docker run \
  -v `pwd`:/workspace/go/src/github.com/tektoncd/pipeline \
  -v `pwd`/SECRET.json:/var/run/secrets/SECRET.json:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/workspace/go/src/github.com/tektoncd/pipeline/SECRET.json \
  gcr.io/kaniko-project/executor:v0.17.1 \
  --dockerfile=/workspace/go/src/github.com/tektoncd/pipeline/images/Dockerfile \
  --destination=gcr.io/christiewilson-catfactory/pipeline-release-test \
  --context=/workspace/go/src/github.com/tektoncd/pipeline \
  --whitelist-var-run=false \
  -v debug
error building image: error building stage: failed to get filesystem from image: error removing var/run to make way for new symlink: unlinkat /var/run/secrets/SECRET.json: device or resource busy

Finally, using docker to build the image (from the pipelines repo checkout) worked just fine:

pipeline git:(pin_to_stable_alpine) ✗ pwd
/Users/christiewilson/Code/go/src/github.com/tektoncd/pipeline
pipeline git:(pin_to_stable_alpine) ✗ docker build -t poop -f ./images/Dockerfile  .
Sending build context to Docker daemon  150.9MB
Step 1/2 : FROM alpine:3.12
 ---> a24bb4013296
Step 2/2 : RUN apk add --update git openssh-client     && apk update     && apk upgrade alpine-baselayout
 ---> Using cache
 ---> ff08e33b783d
Successfully built ff08e33b783d
Successfully tagged poop:latest

Additional Information

Triage Notes for the Maintainers

Description (Yes/No)

  • - [ ] This is a new feature proposal
  • - [x] The build works in docker but not in kaniko
  • - [ ] The error is seen when using the --cache flag
  • - [ ] The dockerfile is a multistage dockerfile

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 33
  • Comments: 41 (8 by maintainers)

Most upvoted comments

I believe there is an issue with the pre upgrade script in for the alpine-baselayout package: https://git.alpinelinux.org/aports/tree/main/alpine-baselayout/alpine-baselayout.pre-upgrade#n18

The script is erroneously detecting that /var/run is a directory when it is already a symlink. I have filed an issue with the alpine project: https://gitlab.alpinelinux.org/alpine/aports/-/issues/13917

Update:

I have filed a merge request to fix the pre upgrade script: https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/35151

Update 2: The merge request I filed has been merged, but it did not solve the issue. It turns out the pre-upgrade script was not misdetecting anything after all: kaniko replaces /var/run with a real directory during the build, so at that point it actually is a directory.

INFO[0001] RUN ls -l /var                               
INFO[0001] cmd: /bin/sh                                 
INFO[0001] args: [-c ls -l /var]                        
INFO[0001] Running: [/bin/sh -c ls -l /var]             
total 0
drwxr-xr-x    4 root     root            29 Jun 15 00:23 cache
dr-xr-xr-x    2 root     root             6 Jun 15 00:23 empty
drwxr-xr-x    5 root     root            43 Jun 15 00:23 lib
drwxr-xr-x    2 root     root             6 Jun 15 00:23 local
drwxr-xr-x    3 root     root            20 Jun 15 00:23 lock
drwxr-xr-x    2 root     root             6 Jun 15 00:23 log
drwxr-xr-x    2 root     root             6 Jun 15 00:23 mail
drwxr-xr-x    2 root     root             6 Jun 15 00:23 opt
drwxr-xr-x    3 root     root            21 Jun 15 00:23 run <<<< this is a directory, even though it should be a symlink based on the alpine container version.
drwxr-xr-x    3 root     root            30 Jun 15 00:23 spool
drwxrwxrwt    2 root     root             6 Jun 15 00:23 tmp

Update 3: I have found a very dumb workaround… I just mv /var in my gitlab-ci.yml file like this:

build-containers:
  stage: releases
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - mv /var /var-orig
    - /kaniko/executor build commands

Can the read-only secret be mounted in another dir?

This wouldn’t work on Kubernetes, which mounts the service account secret automatically under /var/run, right? I tried with --no-scripts or --no-commit-hooks, but neither helps.

We’ve just hit this issue with alpine:latest, which is currently the same as alpine 3, 3.16, and 3.16.0:

(2/2) Upgrading alpine-baselayout (3.2.0-r20 -> 3.2.0-r21)
Executing alpine-baselayout-3.2.0-r21.pre-upgrade
rm: can't remove '/var/run/secrets/eks.amazonaws.com/serviceaccount/..data': Read-only file system

Another solution is to not mount the service account token automatically: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#use-the-default-service-account-to-access-the-api-server Probably you don’t need the token.

GitLab has a feature request to add this as an option: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/4786 And without the mounted token there is no /var/run/secrets/kubernetes.io/serviceaccount directory and therefore no problem.
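For reference, opting out of the automatic token mount is a single field on the pod (or service account) spec. A sketch of a pod-level opt-out, with illustrative names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build                      # hypothetical pod name
spec:
  automountServiceAccountToken: false     # no mount under /var/run/secrets/kubernetes.io/serviceaccount
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:debug
```

Setting `automountServiceAccountToken: false` on the service account instead applies it to every pod using that account.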

One particularly quick fix is: apk upgrade --no-cache --ignore alpine-baselayout. Though be warned, apk explicitly says that partial upgrades aren’t supported (but at least you can test).
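In a Dockerfile, that quick fix looks like this (a sketch, pinned to alpine:3.12 as in the original report):

```dockerfile
FROM alpine:3.12
# Partial upgrade: skip alpine-baselayout, whose pre-upgrade script
# chokes on kaniko's real-directory /var/run. Note apk warns that
# partial upgrades are unsupported.
RUN apk upgrade --no-cache --ignore alpine-baselayout
```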

Another approach would be to map “/” to “/tmp/kanikoRootXXX” at the beginning of the build. (which is probably what you are suggesting in the edit)

Exactly

I don’t think it’s infeasible or ugly. It could be a little hard to wire up. I would not mind pursuing this direction.

This would probably solve quite a bunch of COPY issues at once

Wouldn’t it be easier for kaniko to extract images and run commands in a separate root context? How difficult would it be to implement?

This is also causing a problem building a custom image based on nginx alpine image and building it with kaniko.

In the nginx alpine image, nginx.conf uses pid file location /var/run/nginx.pid

If I build a custom image off nginx alpine but want it to run as non-root user, I need to create an empty file /var/run/nginx.pid in the image and set the ownership of this file to the non-root user:

This works fine when building with docker: COPY --chown=nginx:nginx nginx.pid /var/run/nginx.pid

However, it doesn’t work when using kaniko because this mount issue causes any file I put in /var/run to be deleted.

Workaround is to change pid path in nginx.conf.

RUN sed -i 's,/var/run/nginx.pid,/tmp/nginx.pid,' /etc/nginx/nginx.conf
COPY --chown=nginx:nginx nginx.pid /tmp/nginx.pid

same here

If we choose this approach for every command, i.e. map “/” to another directory under “/tmp”, then I see 2 issues.

  1. When executing subsequent RUN commands, we need to find all files that were changed, modified, or deleted. With this approach, each RUN command would get its own root, making the commands independent of each other, which is not the case in a real build. To fix that, we would have to copy all changes back to “/” or into the next command’s chroot, which could introduce delays.
  2. If a RUN command uses binaries installed in paths relative to “/”, how would that work?

Another approach would be to map “/” to “/tmp/kanikoRootXXX” at the beginning of the build (which is probably what you are suggesting in the edit). I think that could work, but we would need to do something like this for all the metadata commands like “ENV” and “WORKDIR”. Also, for all the base images, we would need to map their ImageConfig.Env paths to be relative to this new chroot.

I don’t think it’s infeasible or ugly. It could be a little hard to wire up. I would not mind pursuing this direction.

Another hack would be to not fail when apk upgrade fails with the error “ERROR: alpine-baselayout-3.2.0-r7: failed to rename var/.apk…”

Say, you create an upgrade script apk-upgrade.sh

#!/bin/sh
# Note: alpine images ship busybox /bin/sh, not bash.

ERR=$( { apk add --update git openssh-client \
    && apk update \
    && apk upgrade alpine-baselayout; } 2>&1 )
EXIT_CODE=$?

# if the exit code is 0, the upgrade succeeded
[ "$EXIT_CODE" -eq 0 ] && exit 0

PERMISSIBLE_ERR="ERROR: alpine-baselayout-3.2.0-r7: failed to rename"
case "$ERR" in
  *"$PERMISSIBLE_ERR"*)
    # swallow the known /var/run rename error
    exit 0
    ;;
esac

# probably some other error
exit 1

+1 upgrading in Gitlab pipeline using kaniko only works with apk upgrade --ignore alpine-baselayout.

I’m also affected by this issue while updating core packages of an alpine image 😞 . Is there any update on this or any known workarounds?

@tejal29,

Do we have any timeline for fixing this issue for the latest alpine builds? We are essentially blocked from using kaniko to build alpine images.

We can’t simply run apk upgrade --no-cache --ignore alpine-baselayout, since alpine-baselayout contains core package updates we need.

Also asking about the timeline for a fix. Any updates?