sysbox: Suddenly getting very slow builds on inner container with Sysbox v0.4.1
Our company runs our GitHub actions CI pipelines on a docker-in-docker setup using sysbox.
The setup is
- Ubuntu 20.04.3 LTS
- Docker 20.10.10
- sysbox 0.4.1 (latest)
The GitHub actions run inside a container running on the sysbox runtime.
All of a sudden on Monday, our docker builds, which usually take ~5 minutes, stopped completing even after ~2 hours.
If I exec into the action-runner container and run the builds manually, I can see that the transferring context step in particular is taking an eternity. Here’s a portion of the output:
#14 transferring context: 40.21MB 186.6s
#14 transferring context: 41.55MB 191.7s
#14 transferring context: 43.38MB 196.8s
#14 transferring context: 45.16MB 201.8s
#14 transferring context: 46.61MB 206.9s
#14 transferring context: 48.03MB 212.0s
#14 transferring context: 49.46MB 217.1s
#14 transferring context: 51.19MB 222.2s
#14 transferring context: 52.90MB 227.3s
#14 transferring context: 54.62MB 232.4s
#14 transferring context: 56.08MB 237.4s
#14 transferring context: 57.44MB 242.4s
#14 transferring context: 58.66MB 247.5s
#14 transferring context: 59.10MB 252.6s
#14 transferring context: 59.58MB 257.7s
#14 transferring context: 60.12MB 262.7s
#14 transferring context: 60.68MB 267.8s
#14 transferring context: 61.26MB 272.9s
The total build context is ~350M
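As a rough sanity check on the numbers above, the transfer rate can be estimated from the first and last progress lines in the excerpt. This is only a sketch: the regular expression assumes the exact `transferring context: <size>MB <time>s` format shown here, the function name `transfer_rate` is made up for illustration, and ~350M is treated as 350 MB.

```python
import re

# Matches progress lines such as "#14 transferring context: 40.21MB 186.6s".
PROGRESS = re.compile(r"transferring context: ([\d.]+)MB ([\d.]+)s")


def transfer_rate(lines):
    """Return the average MB/s between the first and last progress line."""
    samples = [
        (float(m.group(1)), float(m.group(2)))
        for m in map(PROGRESS.search, lines)
        if m
    ]
    if len(samples) < 2:
        raise ValueError("need at least two progress lines")
    (mb_start, t_start), (mb_end, t_end) = samples[0], samples[-1]
    return (mb_end - mb_start) / (t_end - t_start)


# First and last lines of the excerpt above.
log = [
    "#14 transferring context: 40.21MB 186.6s",
    "#14 transferring context: 61.26MB 272.9s",
]

rate = transfer_rate(log)
eta_minutes = 350 / rate / 60  # assumes the context is 350 MB
print(f"{rate:.2f} MB/s, ~{eta_minutes:.0f} min for 350 MB")
```

At roughly a quarter of a megabyte per second, the context transfer alone would take longer than the whole build normally does, and the step size in the excerpt shrinks towards the end, so the real figure may be worse.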
So far I have:
- Updated all system packages
- Purged (via apt purge & removing /var/lib/docker) & Reinstalled both docker & sysbox
- Checked docker builds on the host are unaffected (Even tried mapping the runner directory to the host, and running the app build on the host - takes ~5mins as expected)
- Downgraded to the previous kernel
- Just completely destroyed the original box & reinstalled everything on a new one
However, the problem persists.
The server does have unattended upgrades enabled, and between last week (when things were fine) and this week (when things went wrong), I can see the following packages were upgraded on the host:
xxd (2:8.1.2269-1ubuntu5.4) over (2:8.1.2269-1ubuntu5.3)
openssl 1.1.1f-1ubuntu2.9
libssl1.1:amd64 (1.1.1f-1ubuntu2.9) over (1.1.1f-1ubuntu2.8)
libssl1.1:amd64 (1.1.1f-1ubuntu2.9) over (1.1.1f-1ubuntu2.8)
libtdb1:amd64 (1.4.3-0ubuntu0.20.04.1) over (1.4.2-3build1)
ufw (0.36-6ubuntu1) over (0.36-6)
python3-software-properties (0.99.9.8) over (0.98.9.5)
The docker-compose file, used to start the runner, is
version: '3.9'
services:
  worker:
    image: our-org/gh-action-runner:latest
    env_file: .env
    runtime: sysbox-runc
    security_opt:
      - no-new-privileges:true
    restart: always
    tmpfs:
      - /tmp:size=512M
Any help debugging/fixing this greatly appreciated 😃
About this issue
- State: closed
- Created 3 years ago
- Comments: 29 (19 by maintainers)
Hi @mike-chen-samsung, apologies for the belated response. I’ve not yet had a chance to look into this, will allocate some cycles this coming week to get to the bottom of it. Thanks for your patience.
Hi @mike-chen-samsung, I was able to repro in a GKE pod, using a pod spec similar to the one you posted above. I suspect it’s something in the docker:dind image entrypoint. I’ll debug it and get back to you.
Just as a sanity check, I also ran a pod with this spec and it worked fine (i.e., docker used the overlay2 driver inside the pod).
Hi @mike-chen-samsung, yes we need to add a ConfigMap or similar to allow users to easily configure Sysbox flags via the sysbox-deploy-k8s daemonset. We’ve not yet had cycles to do this unfortunately.
Having said that, I think Sysbox should switch the default of --allow-trusted-xattr from true -> false, given that the base requirements (kernel >= 5.11 and Docker >= 20.10.9) are fairly common now. I’ll discuss it and possibly add it in the upcoming v0.6.2 release.
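For anyone wanting to check a host against the base requirements mentioned above (kernel >= 5.11 and Docker >= 20.10.9), here is a minimal sketch of a version comparison. It is not part of Sysbox: the helper names `version_tuple` and `meets_minimum` are made up for illustration, and the kernel strings in the example are hypothetical sample values. The Docker version 20.10.10 is the one listed in the original report.

```python
import re


def version_tuple(text):
    """Extract the leading dotted number, e.g. '5.11.0-43-generic' -> (5, 11, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(actual, minimum):
    """Return True if version string `actual` is at least `minimum`."""
    a, m = version_tuple(actual), version_tuple(minimum)
    # Pad the shorter tuple with zeros so that '5.11' compares equal to '5.11.0'.
    width = max(len(a), len(m))
    a += (0,) * (width - len(a))
    m += (0,) * (width - len(m))
    return a >= m


# Hypothetical kernel release strings, in the format printed by `uname -r`.
print(meets_minimum("5.4.0-91-generic", "5.11"))
print(meets_minimum("5.11.0-43-generic", "5.11"))
# Docker version from the original report.
print(meets_minimum("20.10.10", "20.10.9"))
```

Comparing integer tuples rather than raw strings matters here: as plain strings, "20.10.10" would sort before "20.10.9".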