moby: manifest list (multiarch) picks wrong arch on ARMv7

Description

With the “new” manifest list for multi-arch support, there is a corner case on ARM where the selected architecture is not ideal. On an ARMv7 CPU running Docker (e.g. a Raspberry Pi 2/3), the native architecture is armhf (i.e. with hardware floating-point support). However, when pulling an image such as the official Debian one, the “armel” variant is pulled. This is compatible with ARMv7 but emulates floating-point operations instead of using the underlying hardware.

Steps to reproduce the issue:

  1. On an ARMv7 SoC (e.g. Raspberry Pi 2 or 3), install Linux and Docker CE.
  2. Run docker run --rm -it debian dpkg --print-architecture

Describe the results you received:

It displays armel, meaning it has pulled the suboptimal image.

Describe the results you expected:

It should display armhf, meaning it has pulled the appropriate image.

Additional information you deem important (e.g. issue happens only occasionally):

See @tianon’s tweet (and the rest of the thread for more info): https://twitter.com/tianon/status/909084978515927040

Output of docker version:

Client:
 Version:      17.06.2-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   cec0b72
 Built:        Tue Sep  5 20:08:20 2017
 OS/Arch:      linux/arm

Server:
 Version:      17.06.2-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   cec0b72
 Built:        Tue Sep  5 20:01:53 2017
 OS/Arch:      linux/arm
 Experimental: false

Output of docker info:

Containers: 7
 Running: 5
 Paused: 0
 Stopped: 2
Images: 157
Server Version: 17.06.2-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.12.10-v7-lowlat-rtc1307+
Operating System: Raspbian GNU/Linux 9 (stretch)
OSType: linux
Architecture: armv7l
CPUs: 4
Total Memory: 968.7MiB
Name: pi2-01.lan.berthon.eu
ID: (redacted)
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

n/a

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 13
  • Comments: 21 (16 by maintainers)

Most upvoted comments

Wow, I recall annoying Phil and tianon about this at DockerCon in 2017, and today it hits me on a hobby project.

memories

It’s definitely not anywhere near ready for use, but here’s what I’ve got so far in my own hacking (which may or may not be useful for elsewhere, and needs to live elsewhere since so many other places need to use this same code/logic):

diff --git a/distribution/pull_v2.go b/distribution/pull_v2.go
index 39bf78249..72555fbdc 100644
--- a/distribution/pull_v2.go
+++ b/distribution/pull_v2.go
@@ -714,19 +714,60 @@ func (p *v2Puller) pullManifestList(ctx context.Context, ref reference.Named, mf
 	if system.LCOWSupported() {
 		lookingForOS = "linux"
 	}
-	for _, manifestDescriptor := range mfstList.Manifests {
-		// TODO(aaronl): The manifest list spec supports optional
-		// "features" and "variant" fields. These are not yet used.
-		// Once they are, their values should be interpreted here.
-		if manifestDescriptor.Platform.Architecture == runtime.GOARCH && manifestDescriptor.Platform.OS == lookingForOS {
-			manifestDigest = manifestDescriptor.Digest
-			logrus.Debugf("found match for %s/%s with media type %s, digest %s", runtime.GOOS, runtime.GOARCH, manifestDescriptor.MediaType, manifestDigest.String())
-			break
+	platformPreferences := []manifestlist.PlatformSpec{}
+	switch runtime.GOARCH {
+	case "amd64":
+		// TODO handle windows "OSVersion" as well (similar to "arm" Variants)
+		platformPreferences = []manifestlist.PlatformSpec{
+			{
+				OS:           lookingForOS,
+				Architecture: runtime.GOARCH,
+			},
+			// "amd64" can fall back to running "386" images if necessary
+			{
+				OS:           lookingForOS,
+				Architecture: "386",
+			},
+		}
+	// TODO case "arm64": determine whether the current CPU can run in 32bit mode and add "arm" as an additional fallback
+	case "arm":
+		/*
+		for v := runtime.GOARM; v >= 5; v-- {
+			platformPreferences = append(platformPreferences, manifestlist.PlatformSpec{
+				OS:           lookingForOS,
+				Architecture: runtime.GOARCH,
+				Variant:      fmt.Sprintf("v%d", v),
+			})
+		}
+		*/
+		platformPreferences = append(platformPreferences, manifestlist.PlatformSpec{
+			OS:           lookingForOS,
+			Architecture: runtime.GOARCH,
+			// if all else fails, fall back to no-Variant "arm"
+		})
+	default: // "386", "ppc64le", "s390x", etc
+		platformPreferences = []manifestlist.PlatformSpec{{
+			OS:           lookingForOS,
+			Architecture: runtime.GOARCH,
+		}}
+	}
+PlatformChoice:
+	for _, platformPreference := range platformPreferences {
+		for _, manifestDescriptor := range mfstList.Manifests {
+			if platformPreference.Variant != "" && manifestDescriptor.Platform.Variant != platformPreference.Variant {
+				continue
+			}
+			// TODO handle OSVersion similar to Variant
+			if manifestDescriptor.Platform.OS == platformPreference.OS && manifestDescriptor.Platform.Architecture == platformPreference.Architecture {
+				manifestDigest = manifestDescriptor.Digest
+				logrus.Debugf("found match for %s/%s with media type %s, digest %s", platformPreference.OS, platformPreference.Architecture, manifestDescriptor.MediaType, manifestDigest.String())
+				break PlatformChoice
+			}
 		}
 	}
 
 	if manifestDigest == "" {
-		errMsg := fmt.Sprintf("no matching manifest for %s/%s in the manifest list entries", runtime.GOOS, runtime.GOARCH)
+		errMsg := fmt.Sprintf("no matching manifest for %s/%s in the manifest list entries", platformPreferences[0].OS, platformPreferences[0].Architecture)
 		logrus.Debugf(errMsg)
 		return "", "", errors.New(errMsg)
 	}

I’ve successfully verified that I can pull i386/hello-world:latest on amd64 with this approach (which currently fails due to it being a 386-only manifest list), although I get Error response from daemon: oci runtime error: target os mismatch with current os linux. when I try to run it. 😅

Also, I’m really disappointed to note that runtime.GOARM doesn’t exist – runtime.goarm does, but would require some hanky cgo in order to access due to it being private, so we’re likely going to be better off doing runtime detection in the case of GOARCH of arm or arm64.

I think https://github.com/golang/go/blob/718d9de60fd4337d9044cdc2c685177dd2177ef6/src/runtime/os_linux_arm.go is probably a useful reference, if we can successfully recreate the value of hwcap (which appears to come from /proc/self/auxv) to test against it. Basically, if hwcap&_HWCAP_VFP == 0, we have no floating point unit and should prefer v5 only, else if hwcap&_HWCAP_VFPv3 == 0, we prefer v6 followed by v5, and otherwise, we prefer v7 then v6 then v5.
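The hwcap rule described above could be sketched roughly as follows. This is only an illustration, not code from Docker or containerd: the helper names hwcapFromAuxv and armVariantPreference are hypothetical, the constants are the Linux ARM values for AT_HWCAP, HWCAP_VFP and HWCAP_VFPv3, and the parser assumes the 32-bit little-endian auxv layout of a 32-bit ARM Linux process (on any other platform the result is meaningless).

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

// Linux ARM values; the same bits are tested in Go's runtime/os_linux_arm.go.
const (
	atHWCAP    = 16      // AT_HWCAP key in the auxiliary vector
	hwcapVFP   = 1 << 6  // a hardware floating point unit is present
	hwcapVFPv3 = 1 << 13 // VFPv3, used above as the marker for v7
)

// hwcapFromAuxv scans a raw auxiliary vector made of 32-bit little-endian
// (key, value) pairs and returns the value stored under AT_HWCAP.
// Assumption: the layout of a 32-bit ARM Linux process.
func hwcapFromAuxv(auxv []byte) (uint32, bool) {
	for i := 0; i+8 <= len(auxv); i += 8 {
		key := binary.LittleEndian.Uint32(auxv[i:])
		val := binary.LittleEndian.Uint32(auxv[i+4:])
		if key == atHWCAP {
			return val, true
		}
	}
	return 0, false
}

// armVariantPreference (hypothetical helper) turns hwcap bits into an
// ordered list of variants, most preferred first, following the rule
// described in the comment above.
func armVariantPreference(hwcap uint32) []string {
	switch {
	case hwcap&hwcapVFP == 0:
		return []string{"v5"} // no floating point unit
	case hwcap&hwcapVFPv3 == 0:
		return []string{"v6", "v5"}
	default:
		return []string{"v7", "v6", "v5"}
	}
}

func main() {
	auxv, err := os.ReadFile("/proc/self/auxv")
	if err != nil {
		fmt.Println("cannot read auxv:", err)
		return
	}
	hwcap, ok := hwcapFromAuxv(auxv)
	if !ok {
		fmt.Println("AT_HWCAP not found in auxv")
		return
	}
	fmt.Println("preferred variants:", armVariantPreference(hwcap))
}
```

Keeping the bit test separate from the auxv parsing means the preference rule can be checked with hand-written hwcap values without needing ARM hardware.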

@yosifkit - I ran a few tests of your ‘hack’ on a Raspberry Pi 1 (not a Zero) with Docker 19.03.8, and I think it “looks to work” only if you amend both the v6 and v7 variants with the additional ‘l’ (as you did on https://github.com/KEINOS/Dockerfile_of_Alpine/issues/3). Unfortunately, I have the feeling that it only “looks to work” and actually does not. You can get an idea of the tests I did by inspecting the public manifests “biarms/mysql:test-moby-issue-34875-tc1” to “biarms/mysql:test-moby-issue-34875-tc4” (with the command line DOCKER_CLI_EXPERIMENTAL=enabled docker manifest inspect biarms/mysql:test-moby-issue-34875-tc1) and testing them with commands like docker run --rm mplatform/mquery biarms/mysql:test-moby-issue-34875-tc1 and docker run -it --rm biarms/mysql:test-moby-issue-34875-tc3 --version. FYI, to perform my tests, I also set the log level to trace for my Docker service with:

sudo mkdir -p /etc/systemd/system/docker.service.d/
sudo bash -c "cat > /etc/systemd/system/docker.service.d/debug.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --log-level trace -H fd:// --containerd=/run/containerd/containerd.sock
EOF
"
sudo systemctl daemon-reload
sudo systemctl restart docker.service

And here are my conclusions:

  1. When the manifest is set ‘as it should be’ (biarms/mysql:test-moby-issue-34875-tc1), it doesn’t work. In my case, I think the main reason is that my rpi1 (exact model: “Raspberry Pi Model B Plus Rev 1.2”) considers itself an ARMv7 device, while it is not! I have created https://github.com/moby/moby/issues/41017 for this issue. I can’t be sure it is the same problem for you, because I don’t have an rpi0 to test with.
  2. When the manifest is ‘hacked’ with the additional ‘l’ only for the v6 variant, as you suggest (biarms/mysql:test-moby-issue-34875-tc2), it doesn’t work on my rpi1, because my rpi1 (which always wrongly considers itself an ARMv7 device) is happy to find an image with “an exact v7 match” and gets the v7 image, which will fail.
  3. When both the v6 and v7 manifests are ‘hacked’ with the additional ‘l’ (biarms/mysql:test-moby-issue-34875-tc3), my rpi1 doesn’t find an exact arm match and searches for the ‘best arm image’. I suppose it actually takes the first in the list, which is v6. So it ‘works’ on the rpi1, but it does not ‘work as expected’.
  4. I did an additional test by changing the v6 and v7 variants to ‘armv6a’ and ‘armv7a’ respectively (that’s biarms/mysql:test-moby-issue-34875-tc4), and I got the same result, which seems to confirm my previous guess.

The problem with your hack is that a real ARMv7 device (in my case, an Odroid also running Docker 19.03.8) will also get the v6 image. So yes, it will work, as ARMv7 is backward compatible with ARMv6, but it will not work as expected. If the goal is only to have something that runs, then publish only the ARMv6 image and never the ARMv7 one; that also solves the problem (without the v6l hack).
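To make the guess from conclusion 3 concrete, here is a small Go model of “exact variant match, otherwise the first entry with the same architecture”. It is a sketch of my hypothesis only, not the actual Docker matching code: the entry type, the pick function, and the two-entry lists are hypothetical stand-ins for the real four-entry manifests, and it assumes the v6 entry is listed first.

```go
package main

import "fmt"

// entry is a simplified, hypothetical stand-in for one manifest list entry.
type entry struct{ Arch, Variant, Image string }

// pick returns the entry whose architecture and variant both match. If no
// variant matches, it falls back to the first entry with the right
// architecture (the behaviour guessed at in conclusion 3).
func pick(list []entry, arch, variant string) (entry, string) {
	var partial *entry
	for i := range list {
		e := &list[i]
		if e.Arch != arch {
			continue
		}
		if e.Variant == variant {
			return *e, "exact"
		}
		if partial == nil {
			partial = e // remember the first same-architecture entry
		}
	}
	if partial != nil {
		return *partial, "partial"
	}
	return entry{}, "none"
}

func main() {
	cases := map[string][]entry{
		"tc1": {{"arm", "v6", "armv6 image"}, {"arm", "v7", "armv7 image"}},
		"tc2": {{"arm", "v6l", "armv6 image"}, {"arm", "v7", "armv7 image"}},
		"tc3": {{"arm", "v6l", "armv6 image"}, {"arm", "v7l", "armv7 image"}},
	}
	// Both the rpi1 (wrongly) and the Odroid (rightly) ask for arm/v7.
	for _, name := range []string{"tc1", "tc2", "tc3"} {
		e, how := pick(cases[name], "arm", "v7")
		fmt.Printf("%s: %s (%s match)\n", name, e.Image, how)
	}
}
```

In this model tc1 and tc2 give the armv7 image through an exact match, while tc3 gives the armv6 image through the fallback on every device that asks for arm/v7, which is consistent with the image lists in my tests.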

My tests on a rpi1:

$ docker pull biarms/mysql:test-moby-issue-34875-tc1
# produce this log
#   May 17 11:16:27 white dockerd[15672]: time="2020-05-17T11:16:27.587327515+02:00" level=debug msg="Pulling ref from V2 registry: biarms/mysql:test-moby-issue-34875-tc1"
#   May 17 11:16:27 white dockerd[15672]: time="2020-05-17T11:16:27.597654464+02:00" level=debug msg="docker.io/biarms/mysql:test-moby-issue-34875-tc1 resolved to a manifestList object with 4 entries; looking for a unknown/arm match"
#   May 17 11:16:27 white dockerd[15672]: time="2020-05-17T11:16:27.614210382+02:00" level=debug msg="found match for linux/arm/v7 with media type application/vnd.docker.distribution.manifest.v2+json, digest sha256:6ea997c38528728924f497faa8dc35709e305abb822742aed77ddf53a25870a8"
# which contains: "found match for linux/arm/v7"

$ docker pull biarms/mysql:test-moby-issue-34875-tc2
# will produce identical logs

# But
$ docker pull biarms/mysql:test-moby-issue-34875-tc3
# as well as
$ docker pull biarms/mysql:test-moby-issue-34875-tc4
# will produce a log similar to  
#   May 17 11:24:25 white dockerd[15672]: time="2020-05-17T11:24:25.635767756+02:00" level=debug msg="docker.io/biarms/mysql:test-moby-issue-34875-tc3 resolved to a manifestList object with 4 entries; looking for a unknown/arm match"
#   May 17 11:24:25 white dockerd[15672]: time="2020-05-17T11:24:25.662697624+02:00" level=debug msg="found deprecated partial match for linux/arm/v7 with media type application/vnd.docker.distribution.manifest.v2+json, digest sha256:db501b5f0a30fcd848f3b174c2f149f156f867fef7fd7aa1ea606eb88326afb3"
#   May 17 11:24:25 white dockerd[15672]: time="2020-05-17T11:24:25.667139602+02:00" level=debug msg="found deprecated partial match for linux/arm/v7 with media type application/vnd.docker.distribution.manifest.v2+json, digest sha256:6ea997c38528728924f497faa8dc35709e305abb822742aed77ddf53a25870a8"
#   May 17 11:24:25 white dockerd[15672]: time="2020-05-17T11:24:25.671611580+02:00" level=debug msg="found multiple matches in manifest list, choosing best match sha256:db501b5f0a30fcd848f3b174c2f149f156f867fef7fd7aa1ea606eb88326afb3"
#

# On my rpi1:
$ uname -m && docker images | grep issue-34875
armv6l
biarms/mysql          test-moby-issue-34875-tc1                             dc18f9b45d8a        6 days ago          235MB # That's armv7 image: NOK
biarms/mysql          test-moby-issue-34875-tc2                             dc18f9b45d8a        6 days ago          235MB # That's armv7 image: NOK
biarms/mysql          test-moby-issue-34875-tc3                             44603b31e106        6 days ago          217MB # That's armv6 image: OK
biarms/mysql          test-moby-issue-34875-tc4                             44603b31e106        6 days ago          217MB # That's armv6 image: OK
biarms/mysql          test-moby-issue-34875-tc5                             44603b31e106        6 days ago          217MB # That's armv6 image: OK


# On my Odroid (I expect the same result on an rpi3 with a 32-bit OS)
$ uname -m && docker images | grep issue-34875
armv7l
biarms/mysql          test-moby-issue-34875-tc1                             dc18f9b45d8a        6 days ago          235MB # That's armv7 image: OK
biarms/mysql          test-moby-issue-34875-tc2                             dc18f9b45d8a        6 days ago          235MB # That's armv7 image: OK
biarms/mysql          test-moby-issue-34875-tc3                             44603b31e106        6 days ago          217MB # That's armv6 image: works, but not as expected
biarms/mysql          test-moby-issue-34875-tc4                             44603b31e106        6 days ago          217MB # That's armv6 image: works, but not as expected
biarms/mysql          test-moby-issue-34875-tc5                             44603b31e106        6 days ago          217MB # That's armv6 image: works, but not as expected

By the way, on aarch64 there is no problem:

$ uname -m && docker images | grep issue-34875
aarch64
biarms/mysql                                          test-moby-issue-34875-tc1                             6069f6980ac5        6 days ago          273MB
biarms/mysql                                          test-moby-issue-34875-tc2                             6069f6980ac5        6 days ago          273MB
biarms/mysql                                          test-moby-issue-34875-tc3                             6069f6980ac5        6 days ago          273MB
biarms/mysql                                          test-moby-issue-34875-tc4                             6069f6980ac5        6 days ago          273MB
biarms/mysql                                          test-moby-issue-34875-tc5                             6069f6980ac5        6 days ago          273MB

I realize that ARM hardware “variant inspection” is the hard problem being danced around (and @stevvooe and I just had a talk with the ARM, Ltd. team lead at the Moby Summit last week on this topic), but that is the correct solution, as it corrects the TODO around our code completely ignoring variant at this point. This TODO has been sitting in the code since Docker 1.10 or so now: https://github.com/moby/moby/blob/master/distribution/pull_v2.go#L718-L726

Ordering is, as noted above, only a workaround and can only be “depended” on for tools that we control (like the Docker engine or containerd). Given that the spec says nothing about ordering, it is not a good way to ensure that others will follow some general ordering for “best case” scenarios.

@tianon might be able to fix this by changing the order in which images are listed in the manifest. For the debian image, we have the following:

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:2335c729b8a6764c52a3cbfe43d1450d5e782638c986d237ffc30ca33881c3e3",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:956077792b12d730494cd54eacb497b86ef732434dda6cb67d300c283c19322b",
         "platform": {
            "architecture": "arm",
            "os": "linux",
            "variant": "v5"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:acc143d95320fe0b572ca8677e18aea3e9306b3b9f94a26fc1e084ec4361f104",
         "platform": {
            "architecture": "arm",
            "os": "linux",
            "variant": "v7"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:10656f9d3452a3825f879d52b7ab6f997eddd071bb08c79b353189655cbb8dbd",
         "platform": {
            "architecture": "arm64",
            "os": "linux",
            "variant": "v8"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:4599b85efe839220e3c00b5d380910fbe968ffe933a9155a6f013fb416ffa1f1",
         "platform": {
            "architecture": "386",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:34b94575d7b39cbbdc2facecd0e8fe87b203179fe0221811fb13a1d911311756",
         "platform": {
            "architecture": "ppc64le",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:b01d35a1891549568b1f5fb66b329dded1e9cd45d6cb74f0c02aeb4c72a1417f",
         "platform": {
            "architecture": "s390x",
            "os": "linux"
         }
      }
   ]
}

Note that v5 is listed before v7. Based on the current matching, it will favor the v5 image because it doesn’t consider the variant.
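A minimal Go sketch of why the order matters, under stated assumptions: the platform and descriptor types and the firstMatch, preferredMatch, and sampleList functions are hypothetical simplifications, not the types from the distribution package, and the digests are abbreviated from the manifest list above. firstMatch mirrors matching on OS and architecture only; preferredMatch shows what walking a variant preference list would change.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for manifest list entries.
type platform struct{ OS, Arch, Variant string }

type descriptor struct {
	Digest   string
	Platform platform
}

// firstMatch ignores the variant: the first entry with the right OS and
// architecture wins, so list order decides between v5 and v7.
func firstMatch(list []descriptor, os, arch string) (descriptor, bool) {
	for _, d := range list {
		if d.Platform.OS == os && d.Platform.Arch == arch {
			return d, true
		}
	}
	return descriptor{}, false
}

// preferredMatch tries each variant in preference order and only falls
// back to the variant-blind match when none of them is present.
func preferredMatch(list []descriptor, os, arch string, variants []string) (descriptor, bool) {
	for _, v := range variants {
		for _, d := range list {
			p := d.Platform
			if p.OS == os && p.Arch == arch && p.Variant == v {
				return d, true
			}
		}
	}
	return firstMatch(list, os, arch)
}

// sampleList reproduces the first entries of the debian manifest list
// shown above, in the same order, with abbreviated digests.
func sampleList() []descriptor {
	return []descriptor{
		{"sha256:2335c729b8a6", platform{"linux", "amd64", ""}},
		{"sha256:956077792b12", platform{"linux", "arm", "v5"}},
		{"sha256:acc143d95320", platform{"linux", "arm", "v7"}},
		{"sha256:10656f9d3452", platform{"linux", "arm64", "v8"}},
	}
}

func main() {
	list := sampleList()
	if d, ok := firstMatch(list, "linux", "arm"); ok {
		fmt.Println("variant-blind:", d.Platform.Variant, d.Digest)
	}
	if d, ok := preferredMatch(list, "linux", "arm", []string{"v7", "v6", "v5"}); ok {
		fmt.Println("variant-aware:", d.Platform.Variant, d.Digest)
	}
}
```

With this list the variant-blind lookup returns the v5 entry and the variant-aware lookup returns the v7 entry. Reordering the list would only change the first result.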

@yosifkit

v6l is not part of the OCI image specification, so it should not be used.

Indeed, we should not. I agree.

It should work with both v6 and v7, and it hasn’t for almost 3 years. I understand that this is more of an ARM problem than a problem with Docker’s multi-arch detection functionality.

I just needed a stable(?) workaround to make a single Dockerfile for both architectures until this issue gets solved. But I should’ve mentioned that. Thanks, I will add a note to my post.

A variant value of v6l is not part of the OCI image specification, so it should not be used. v6 is the proper value for all Armv6 devices.

F.Y.I.

Hi, here’s a little hack/tip for those who want to create a single but multi-arch manifest list that is compatible with both ARMv6 and ARMv7 (such as RaspberryPi ZeroW and RPi3B+).

  • TL;DR: Rewrite the variant value of each manifest in the manifest list from v6 to v6l and from v7 to v7l.
  • TS;DR:

$ # Your base image name for multi-arch manifest list
$ name_base=keinos/alpine

$ # Name of the manifest list, the image name with "latest" tag
$ name_manifest_list="${name_base}:latest"

$ # List of images to include in "latest" manifest list
$ name_manifest_v6="${name_base}:armv6"
$ name_manifest_v7="${name_base}:armv7"

$ # Pull your built images for each architecture with its manifest
$ docker pull $name_manifest_v6
$ docker pull $name_manifest_v7

$ # Make a manifest list for "latest" tag and pray
$ docker manifest create $name_manifest_list \
     $name_manifest_v6 \
     $name_manifest_v7 \
     --amend

$ # Re-write variant value to "v6l"
$ docker manifest annotate $name_manifest_list $name_manifest_v6 --variant v6l

$ # Re-write variant value to "v7l"
$ docker manifest annotate $name_manifest_list $name_manifest_v7 --variant v7l

$ # Push the manifest list
$ docker manifest push $name_manifest_list --purge

This image should work on: RPi Zero(ARMv6), RPi 3B+(ARMv7), macOS(Intel), Win10(Intel), and Linux(Intel) machines.

docker pull keinos/alpine:latest
  • NOTE (2020/03/25): As @yosifkit commented below, v6l is not part of the OCI image specification. Since it aims to work with the v6 variant and to avoid confusion in the future, one should use v6 rather than v6l. So keep in mind that this is a temporary workaround until this issue gets solved.

From reading through the issue here, I am unsure whether anything has been done about it.

On my Raspberry Pi 3+ my Docker reports:

$ docker version
Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:21:09 2018
  OS/Arch:      linux/arm
  Experimental: false

For example I can run Alpine directly:

$ docker run -it --rm alpine:3.8
Unable to find image 'alpine:3.8' locally
3.8: Pulling from library/alpine
Digest: sha256:7043076348bf5040220df6ad703798fd8593a0918d06d3ce30c6c93be117e430
Status: Downloaded newer image for alpine:3.8
/ # 

However, Python within the same Alpine release:

$ docker run -it --rm python:3.7-alpine3.8
Unable to find image 'python:3.7-alpine3.8' locally
3.7-alpine3.8: Pulling from library/python
docker: no matching manifest for linux/arm in the manifest list entries.
See 'docker run --help'.

My guess is that the manifests of these library images are not equal. Is there any way I can force this locally?
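One way to check that guess is to compare the platforms each manifest list advertises. The Go sketch below decodes a manifest list saved to a file and prints one os/arch[/variant] line per entry. It is only an illustration: the manifestList struct covers just the fields shown in the JSON earlier in this thread, the platforms function is a hypothetical helper, and it assumes the JSON was saved beforehand, for example with the experimental docker manifest inspect command mentioned above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// manifestList decodes only the fields needed here; the field names follow
// the manifest list JSON quoted earlier in this thread.
type manifestList struct {
	Manifests []struct {
		Platform struct {
			Architecture string `json:"architecture"`
			OS           string `json:"os"`
			Variant      string `json:"variant"`
		} `json:"platform"`
	} `json:"manifests"`
}

// platforms (hypothetical helper) returns one "os/arch[/variant]" string
// per entry of the manifest list, in list order.
func platforms(raw []byte) ([]string, error) {
	var ml manifestList
	if err := json.Unmarshal(raw, &ml); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(ml.Manifests))
	for _, m := range ml.Manifests {
		p := m.Platform.OS + "/" + m.Platform.Architecture
		if m.Platform.Variant != "" {
			p += "/" + m.Platform.Variant
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: pass the path of a saved manifest list JSON file")
		return
	}
	raw, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("cannot read file:", err)
		return
	}
	list, err := platforms(raw)
	if err != nil {
		fmt.Println("cannot decode manifest list:", err)
		return
	}
	for _, p := range list {
		fmt.Println(p)
	}
}
```

Running it once on the saved manifest list of alpine:3.8 and once on that of python:3.7-alpine3.8 would show whether the second one has any linux/arm entry at all.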

Also, I’m really disappointed to note that runtime.GOARM doesn’t exist – runtime.goarm does, but would require some hanky cgo in order to access due to it being private

Looks like others are running into that too: https://gist.github.com/lucab/f7162ca2d95191c692edc12ea8ccaaef. I also found some discussion in https://github.com/golang/go/issues/9737 (and https://github.com/golang/go/commit/1b53f15ebb00dd158af674df410c7941abb2b933)

Should we open an issue with Golang to make it public?

@ijc The current convention and behavior of docker is to take the first match. I agree this needs to be more clever, but ordering can be used as a fallback when all else fails.

As of now, I have no clue how to detect variant on a given host. I am trying to get some ARM builders in containerd so we can get this right over there. Ideas are welcome.