docker-alpine: apk fetch hangs

The fetch of the apk index just hangs. I have now hit this on an Ubuntu server and on Docker for Windows.

Step 1/14 : FROM maven:3.3.9-jdk-8-alpine
 ---> dd9d4e1cd9db
Step 2/14 : RUN apk update && apk upgrade       && apk add --no-cache --update  ca-certificates         bash    wget    curl    tree    libxml2-utils   putty   git     && rm -rf /var/lib/apt/lists/*     && rm -rf /var/cache/apk/*
 ---> Running in 536cbd484c36
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz

Docker version 17.03.1-ce, build c6d412e

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 59
  • Comments: 39 (2 by maintainers)

Most upvoted comments

I added it to my docker client commands, e.g. ‘docker build --network host …’

I had a similar issue. We have a docker-in-docker build container within a Rancher 2 / Kubernetes environment. I had to decrease the MTU of the inner docker service by adding "mtu": 1200 to /etc/docker/daemon.json. The host server's MTU is 1500.

daemon.json

{
    "mtu": 1200
}
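
A minimal sketch of applying and checking the change (assumes a plain systemd host and containers on the default bridge; for docker-in-docker the inner daemon has to be restarted instead, and the alpine image is only used here to print the interface MTU):

sudo systemctl restart docker                      # daemon.json is only re-read on restart
docker run --rm alpine cat /sys/class/net/eth0/mtu # a fresh container should now print 1200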

I only see this with dind + Kubernetes. However, it doesn't happen if I use '--network host' or '--net host'. I am using the weave overlay network.

Seems like a DNS issue. Not sure why; I've set the correct DNS settings in %programdata%\docker\config\daemon.json

nslookup dl-cdn.alpinelinux.org
nslookup: can't resolve '(null)': Name does not resolve

Name:      dl-cdn.alpinelinux.org
Address 1: 151.101.48.249

Got around this by using https

RUN sed -i 's/http\:\/\/dl-cdn.alpinelinux.org/https\:\/\/alpine.global.ssl.fastly.net/g' /etc/apk/repositories

After 4 hours of debugging, I managed to solve this by changing this in the gitlab-ci file:

services:
  name: docker:dind

TO

services:
  - name: docker:dind
    command: ["--mtu=1300"]

source: https://github.com/docker-library/docker/issues/103#issuecomment-478619847

@evanrich My GitLab CI was using docker:dind as a service container, and my main build container had a docker client in it which I used to connect to the service container. My repo has a Dockerfile in it that needs to be built by the GitLab runner. My .gitlab-ci.yaml file contained the command

docker build .

This builds my docker image. One of my layers in the dockerfile runs apk update. This command hangs, causing the docker build command and the CI as a whole to fail. However, if I modify my .gitlab-ci.yaml file to have

docker build --network host .

docker will run the apk update command from my dockerfile without hanging.

Observation: networking is hard.

Also facing this from time to time. Here's a typical GitLab CI output when fetching fails:

(screenshot of the stuck GitLab CI job output, 2017-10-14)

Manually stopping and retrying the stuck CI job helps, but there's no guarantee it will work reliably.

If you come here from Drone CI and their docker plugin, set an MTU that fits your network in the plugin settings. It could save you some hours of debugging and desperate attempts:

kind: pipeline
type: kubernetes
name: default

steps:
  - name: dockerize
    image: plugins/docker
    settings:
      ...
      mtu: 1000

I believe the problem is that inside docker the MTU is lower than on the host. This is supposed to be handled via path MTU discovery, but fastly appears to block the PMTU ICMP packet (I guess it is part of their DDoS defence). The way to "fix" this properly is to enable MSS clamping on the host. https://blog.ipspace.net/2013/01/tcp-mss-clamping-what-is-it-and-why-do.html

The other alternative is to use a different mirror that does not block the PMTU traffic.

It seems like fastly is filtering ICMP fragmentation-needed packets, which means that PMTU discovery does not work. This can be a problem if your traffic goes via a network link that has an MTU lower than 1500 (typically tunnels/VPNs, PPPoE and similar). This can be worked around by enabling TCP MSS clamping in the network; a sketch follows below.
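
For reference, MSS clamping on a Linux box is usually a single iptables rule (only a sketch; whether it belongs on the host, the router, or the tunnel endpoint depends on where the low-MTU link sits):

# clamp the TCP MSS of forwarded connections to the discovered path MTU
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu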

Old problem, but it still happens!

For me, none of the options worked on their own! I will mention the steps that alleviated the problem and let me build the image, even if it took 2 or 3 attempts, which is already progress, since before I could not build the image at all.

1. Change the repository to any mirror, either in its own RUN line or joined to an existing RUN: echo "http://dl-4.alpinelinux.org/alpine/v3.12/main" > /etc/apk/repositories && apk update … The official mirror list is here: https://mirrors.alpinelinux.org/

2. The one that behaved best was changing the DNS inside the image, again in its own RUN line or joined to an existing RUN: RUN printf "nameserver 208.67.222.222\nnameserver 8.8.4.4\nnameserver 1.1.1.1\nnameserver 9.9.9.9\nnameserver 8.8.8.8" > /etc/resolv.conf && apk update && apk add ... *** This line must be included in every RUN that updates packages.

3. Change the Docker daemon's DNS: on Ubuntu, just edit /etc/default/docker (e.g. sudo gedit /etc/default/docker) and add the line: DOCKER_OPTS="--dns 208.67.222.222 --dns 8.8.8.8 --dns 1.1.1.1 --dns 8.8.4.4 --dns 208.67.220.220 --dns 9.9.9.9"

I was able to narrow down the issue and it is IPv6. If the docker host has IPv6 enabled you are pretty much f**** as apk fetch from inside the container will get stuck trying to fetch from dl-cdn.alpinelinux.org, which returns "dualstack" results, but we all know that IPv6 does not work in containers by default.

APK gets fully stuck without ever timing out or falling back to the IPv4 addresses, which would likely work.

That problem is a huge PITA, as normal debugging techniques will not give any usable results:

  • using --network host does not matter
  • using ping or nslookup on dl-cdn.alpinelinux.org from inside container works too
  • Even using wget works (curl is absent from base image)

UPDATE: we have a working hack.

I can confirm that the https://stackoverflow.com/a/41497555/99834 hack works on both docker and podman: mainly, adding --dns-opt='options single-request' --sysctl net.ipv6.conf.all.disable_ipv6=1 when running/building the containers.
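
A minimal sketch of what that looks like at run time (the alpine image and apk update are just stand-ins; flag support for build vs. run differs between docker and podman versions):

docker run --rm \
  --dns-opt='options single-request' \
  --sysctl net.ipv6.conf.all.disable_ipv6=1 \
  alpine sh -c 'apk update'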

Hi, thanks for this. This helped my build. I remember finding an article about MTU that may be useful for more information: https://medium.com/@liejuntao001/fix-docker-in-docker-network-issue-in-kubernetes-cc18c229d9e5

@andremarianiello Are you not using Auto DevOps? I haven't specified a .gitlab-ci.yml file yet. I seem to have worked around part of it by switching to alpine.global.ssl.fastly.net, but I get this

Status: Downloaded newer image for golang:alpine
 ---> 95ec94706ff6
Step 2/13 : RUN sed -i 's/http\:\/\/dl-cdn.alpinelinux.org/https\:\/\/alpine.global.ssl.fastly.net/g' /etc/apk/repositories
 ---> Running in a3de349b32f8
Removing intermediate container a3de349b32f8
 ---> 39505fc0c5f2
Step 3/13 : RUN apk update;     apk add git gcc build-base;     go get -v github.com/cloudflare/cloudflared/cmd/cloudflared
 ---> Running in 548789a2500b
fetch https://alpine.global.ssl.fastly.net/alpine/v3.8/main/x86_64/APKINDEX.tar.gz
fetch https://alpine.global.ssl.fastly.net/alpine/v3.8/community/x86_64/APKINDEX.tar.gz
v3.8.1-22-g24d67bab3a [https://alpine.global.ssl.fastly.net/alpine/v3.8/main]
v3.8.1-16-g96e1e57fed [https://alpine.global.ssl.fastly.net/alpine/v3.8/community]
OK: 9539 distinct packages available
(1/25) Installing binutils (2.30-r5)

and it just hangs at installing binutils every time. Found this: https://github.com/gliderlabs/docker-alpine/issues/279. Seems to be a widespread issue in k8s due to the lower MTU.

I was able to get slightly further by changing my mirror from a fastly mirror to mirror.clarkson.edu using RUN sed -i 's/http\:\/\/dl-cdn.alpinelinux.org/http\:\/\/mirror.clarkson.edu/g' /etc/apk/repositories

builds are running, will update when they finish.

Edit: Just finished successfully… build 174 (that's how many attempts it's taken to get this to work).

Removing intermediate container 5c42267a84e9
 ---> 339cedacd0cf
Step 12/13 : EXPOSE 54/udp
 ---> Running in 8308f4f1cb00
Removing intermediate container 8308f4f1cb00
 ---> b917125f9e41
Step 13/13 : EXPOSE 34411/tcp
 ---> Running in 5d3115c32a0f
Removing intermediate container 5d3115c32a0f
 ---> 33616623b643
Successfully built 33616623b643
Successfully tagged registry.evanrichardsonphotography.com/docker/cloudflared/master:a66a757bee6a6de2276ed4a8d3a8de121efc8705
Pushing to GitLab Container Registry...
The push refers to repository [registry.evanrichardsonphotography.com/docker/cloudflared/master]
75ddfc9ca656: Preparing
ff665015151e: Preparing
434f9e907dc9: Preparing
e834c1681702: Preparing
676adc5a23cc: Preparing
e834c1681702: Layer already exists
676adc5a23cc: Layer already exists
434f9e907dc9: Pushed
ff665015151e: Pushed
75ddfc9ca656: Pushed
a66a757bee6a6de2276ed4a8d3a8de121efc8705: digest: sha256:75efdf757e24da3a27a3674f49508e9f85d0d115e921231ae52835f56a28e1b7 size: 1368

Job succeeded

I've seen the k8s issue quite a bit. Wireshark shows fastly getting stuck sending oversized packets with the do-not-fragment flag set. I don't think this is OP's issue though, as it's Docker for Windows.

I recently started running into a similar issue as well, though. It's on Linux, but the behavior is the same: apk fails while fetching, mostly on the index. Again, I pulled up Wireshark and recreated the problem. I see things going smoothly, then the apk process seems to stop ACK'ing segments from the fastly server. Fastly starts throttling and resending segments, and it lags out.

I've never recreated this with curl, but it looks like apk uses a built-in BSD libfetch for its HTTP communications, so maybe there's a bug in there?

My understanding of network communication is just enough to get me this far, so here's a link to the Wireshark log of the communications. Hopefully an Alpine dev has a better understanding and can parse out a clue or find the problem.
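
For anyone who wants to capture an equivalent trace on the host, tcpdump against the docker bridge is one option (a sketch; docker0 as the interface name and the CDN hostname are assumptions about the setup):

# write the container <-> fastly traffic to a pcap that Wireshark can open
sudo tcpdump -i docker0 -w apk-hang.pcap host dl-cdn.alpinelinux.org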

We just got hit by this, running the Drone docker plugin in a Kubernetes cluster. Decreasing the MTU to the value used by the eth0 interface in the docker plugin container fixed the issue. Thank you so much for sharing this fix.

What I absolutely do not understand is how it worked for almost a year without this workaround. We didn’t change anything about our cluster or Drone setup, or the Alpine versions used in our pipelines. If someone has discovered more information about this, please do share.

@andremarianiello So you mean you set --network host for the dockerized docker daemon? Where did you set it?

Doesn't seem like a DNS issue, since the name does resolve. Unfortunately, for a while now I've been at the same point: names are resolved, but I can't connect to anything.

Running the official Drone helm chart on k3os (v0.11), I had to set the MTU to 1450 for my build to finish and not stall on fetching the APKINDEX.

- name: docker-build
  image: plugins/docker
  settings:
    mtu: 1450

@ncopa How can we check to see if our docker mtu is lower than our host mtu?
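
One way to compare the two (a sketch, assuming the default docker0 bridge; swap in your real uplink interface for eth0):

# MTU of the host's uplink interface
ip link show eth0 | grep -o 'mtu [0-9]*'

# MTU of docker's bridge device, and what the daemon has configured for the default network
ip link show docker0 | grep -o 'mtu [0-9]*'
docker network inspect bridge | grep -i mtu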

Yeah, I was treating this as a different issue because it has slightly different characteristics and is not the same as #279.

The Wireshark link in https://github.com/gliderlabs/docker-alpine/issues/307#issuecomment-387613710 shows a different traffic behavior. Instead of the traffic getting killed at the bridge, it is never ACK'd by libfetch, and Fastly's TCP session gets stuck trying to recover. I don't know if it's even fastly's fault, as on the surface it seems to be doing the right thing.

... misc traffic 
ack:                   container -> bridge -> fastly
Transmission:          container <- bridge <- fastly
Transmission:          container <- bridge <- fastly
ack:                   container -> bridge -> fastly
Transmission:          container <- bridge <- fastly
Transmission:          container <- bridge <- fastly
ack:                   container -> bridge -> fastly
Transmission:          container <- bridge <- fastly 1
Transmission:          container <- bridge <- fastly 2
Transmission:          container <- bridge <- fastly 3
Transmission:          container <- bridge <- fastly 4
Transmission:          container <- bridge <- fastly 5
... some number of other packets
Transmission:          container <- bridge <- fastly X
Transmission:          container <- bridge <- fastly 1
Transmission:          container <- bridge <- fastly 1
Transmission:          container <- bridge <- fastly 1
Transmission:          container <- bridge <- fastly 1
Transmission:          container <- bridge <- fastly 1
Transmission:          container <- bridge <- fastly 1
Transmission:          container <- bridge <- fastly 1
Transmission:          container <- bridge <- fastly 1
.... repeat