pipeline: Pipeline nightly build is broken

Expected Behavior

The pipeline nightly build succeeds.

Actual Behavior

Pipeline nightly build is broken. Building the base image fails with:

fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/APKINDEX.tar.gz
v3.12.0-30-g01407813ee [http://dl-cdn.alpinelinux.org/alpine/v3.12/main]
v3.12.0-29-gb310a5f576 [http://dl-cdn.alpinelinux.org/alpine/v3.12/community]
OK: 12726 distinct packages available
(1/1) Upgrading alpine-baselayout (3.2.0-r6 -> 3.2.0-r7)
Executing alpine-baselayout-3.2.0-r7.pre-upgrade
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/..data': Read-only file system
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/token': Read-only file system
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/namespace': Read-only file system
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt': Read-only file system
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/..2020_06_03_02_13_55.407209058/namespace': Read-only file system
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/..2020_06_03_02_13_55.407209058/ca.crt': Read-only file system
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/..2020_06_03_02_13_55.407209058/token': Read-only file system
Executing alpine-baselayout-3.2.0-r7.post-upgrade
ERROR: alpine-baselayout-3.2.0-r7: failed to rename var/.apk.f752bb51c942c7b3b4e0cf24875e21be9cdcd4595d8db384 to var/run.
Executing busybox-1.31.1-r16.trigger
1 error; 27 MiB in 25 packages
error building image: error building stage: failed to execute command: waiting for process to exit: exit status 1

Steps to Reproduce the Problem

  1. https://dashboard.dogfooding.tekton.dev/#/namespaces/default/pipelineruns/pipeline-release-nightly-tqgdd

Additional Info

The image is based on alpine. It used to track latest; now it's pinned to 3.12, which is the version that was used in the last working run: https://dashboard.dogfooding.tekton.dev/#/namespaces/default/pipelineruns/pipeline-release-nightly-w5xcr

The only visible difference in the run log is the following. In the successful run:

fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/APKINDEX.tar.gz
v3.12.0-3-gc43b21255b [http://dl-cdn.alpinelinux.org/alpine/v3.12/main]
v3.12.0-1-g9465f17ea9 [http://dl-cdn.alpinelinux.org/alpine/v3.12/community]

while in the failing run:

fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/APKINDEX.tar.gz
v3.12.0-30-g01407813ee [http://dl-cdn.alpinelinux.org/alpine/v3.12/main]
v3.12.0-29-gb310a5f576 [http://dl-cdn.alpinelinux.org/alpine/v3.12/community]
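The index descriptions (for example v3.12.0-30-g01407813ee) appear to follow the git describe format: the nearest tag, the number of commits since that tag, and "g" followed by an abbreviated commit hash. Assuming that format, a small hypothetical helper, describe_parts, can split them so the two runs are easy to compare:

```shell
#!/bin/sh
# Hypothetical helper: split a git-describe-style index description such as
# "v3.12.0-30-g01407813ee" into: tag, commits since tag, abbreviated hash.
# Assumes the hash is introduced by the last "-g" in the string.
describe_parts() {
	desc=$1
	hash=${desc##*-g}   # strip everything up to and including the last "-g"
	rest=${desc%-g*}    # e.g. "v3.12.0-30"
	count=${rest##*-}   # commits since the tag, e.g. "30"
	tag=${rest%-*}      # the tag itself, e.g. "v3.12.0"
	printf '%s %s %s\n' "$tag" "$count" "$hash"
}
```

Applied to the logs above, the main index moves from 3 to 30 commits past v3.12.0 and the community index from 1 to 29, so the failing run pulled a newer snapshot of the v3.12 repositories, one that includes the alpine-baselayout 3.2.0-r6 -> 3.2.0-r7 upgrade seen in the error log.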

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 16 (3 by maintainers)


Most upvoted comments

https://git.alpinelinux.org/aports/tree/main/alpine-baselayout/alpine-baselayout.pre-upgrade

# migrate /var/run directory to /run
if [ -d /var/run ]; then
	cp -a /var/run/* /run 2>/dev/null
	rm -rf /var/run
	ln -s ../run /var/run
fi
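To see where this migration breaks, here is a minimal sketch of the same steps rewritten as a hypothetical function, migrate_var_run, that takes a root directory so it can be tried in a scratch directory instead of the real filesystem. It is not the upstream script. Three details are assumptions added for the sketch: it returns early when var/run is already a symlink, it creates run if missing, and it reports failure when rm -rf cannot remove var/run (which is what happens when something under it is a read-only mount, such as the Kubernetes service-account secret in the log above).

```shell
#!/bin/sh
# Hypothetical re-implementation of the alpine-baselayout /var/run -> /run
# migration, parameterised on a root directory for experimentation.
migrate_var_run() {
	root=$1
	# Assumption (not upstream): already migrated, nothing to do.
	if [ -L "$root/var/run" ]; then
		return 0
	fi
	if [ -d "$root/var/run" ]; then
		mkdir -p "$root/run"   # assumption: upstream expects /run to exist
		# Copy existing runtime files (dotfiles included, unlike the
		# upstream "/var/run/*" glob); errors are ignored as upstream does.
		cp -a "$root/var/run/." "$root/run/" 2>/dev/null
		# The step that fails when var/run contains a read-only mount.
		rm -rf "$root/var/run" || return 1
		ln -s ../run "$root/var/run"
	fi
}
```

In a writable scratch tree this leaves var/run as a symlink to ../run with the old files under run. With a read-only mount below var/run, rm -rf cannot finish, var/run is still a real directory afterwards, and apk later fails to put its var/run entry in place, which matches the "failed to rename ... to var/run" error.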

wut

ahh, makes sense @joshsleeper! thanks for explaining 😄 do you happen to know how one could track this kind of thing (e.g. are there release notes somewhere that mention this)? np if not, thanks anyway for the info


I think this is a bizarre collision of kaniko behaviour and alpine relying on /var/run being a symlink to /run, so I opened https://github.com/GoogleContainerTools/kaniko/issues/1297

I think our options are:

  1. keep the alpine image pinned (and hope this never starts being a problem for 3.11; I still don't understand why a script committed in 2017 is only causing this problem now)
  2. fix the problem in kaniko
  3. build with something other than kaniko

I think it’s just a perfect storm of conditions that could’ve happened in any prior alpine release, but by chance didn’t.

The base images for alpine 3.12 don't have the latest alpine-baselayout for their release yet, so anything that tries to build and upgrade from them with a read-only mount anywhere in /var/run/* (and I wager anywhere in /run/* too!) will throw its hands up.

As soon as the alpine base images include that package upgrade, this issue will mostly disappear until the next perfect storm. 😆
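One way to check whether a build environment is exposed to this is to look for read-only mounts under /var/run or /run. The sketch below is a hypothetical helper, ro_run_mounts, that reads a mount table in the Linux /proc/mounts format (device, mount point, filesystem type, comma-separated options, dump, pass) and prints the affected mount points. It only inspects the table; it does not know anything about kaniko or apk.

```shell
#!/bin/sh
# Hypothetical helper: print mount points at or below /run or /var/run whose
# options include "ro". Reads the given file, or /proc/mounts by default.
ro_run_mounts() {
	awk '
		$2 == "/run" || $2 == "/var/run" || $2 ~ /^\/run\// || $2 ~ /^\/var\/run\// {
			# Wrap the options in commas so "ro" only matches as a whole option.
			if (("," $4 ",") ~ /,ro,/) print $2
		}
	' "${1:-/proc/mounts}"
}
```

Run inside a build container, any path it prints (for example a service-account secret mounted read-only under /var/run/secrets) is a place where the alpine-baselayout pre-upgrade script's rm -rf /var/run cannot succeed.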

Just realized that to make 0.13 I'll need to fix this, and I'm build cop tomorrow anyway, so no time like the present 😄

"Read-only file system" makes me think it's either a node or an image problem 🤔