balena-cli: balena build / deploy --build selects base image of wrong architecture (multiarch image manifest)

balena build and balena deploy --build may fail with “exec format error” or related mismatched architecture errors when the base image in a Dockerfile FROM line is a multiarch base image, where a single name:tag reference includes multiple images of different architectures. Example: FROM ubuntu where ubuntu is an image name that refers to multiple images of different architectures (ARM, ARM 64, 386, x86-64, PowerPC 64 LE, IBM Z, etc).

balena build and balena deploy --build have never supported multiarch base images. Balena’s original solution (which predates Docker’s multiarch solution) is Dockerfile templates, which work well with balenalib base images. While balenalib has alternatives to FROM ubuntu, e.g. FROM balenalib/raspberrypi3-ubuntu, there are no balenalib alternatives to other base images like FROM nginx or FROM telegraf, so the CLI needs to implement support for such multiarch base images.

Workaround

A workaround is to append the sha256 hash to the image name on the FROM line, as detailed in https://github.com/balena-io/balena-cli/issues/1508#issuecomment-801223357.

Edited from Matthew’s original report:

https://forums.balena.io/t/balena-cloud-cli-builder-is-still-not-able-to-interpret-image-manifests/43203/3 I’ve already typed a lot here, and the references are included.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 23 (10 by maintainers)

Most upvoted comments

Thanks for the clarifications.

To avoid leaving a question without an explanation, I am sure there are plenty of reasons for why someone would want the balenalib images to be multi arch, but mine include:

  1. Navigating them on Docker Hub is a nightmare. Hunting down the dates for fixed images is a real pain point. Partly due to Docker Hub not having a useful interface for searching, but also because of the sheer number of images.
  2. There are a whole bunch of images missing for different architectures, and I frequently run into issues: https://github.com/balena-io-library/base-images/issues/696 and https://forums.balena.io/t/missing-images-from-balenalib/319708/30
  3. In my development environments, I would like developers to work within the Balena images to ensure consistency between the dev and production environments. Without multiarch images, that means specifying a single architecture in the Dockerfiles when building a development environment, so all developers either have to use the same system or manually change the Dockerfile to suit their own (a change that then gets accidentally committed to GitHub). This was always an issue, but a lesser one when the vast majority of people used MacBooks and I could specify that image. With the introduction of the M1 MacBooks on a different architecture, people are now spread across two different architectures and the issue keeps growing. For now I have specified official images instead of Balena’s, which are multi arch and good enough, but not ideal.
  4. The obvious, cleaner Dockerfiles and smoother workflows.

The best-case scenario would be a shift towards multi-arch images instead of having to specify variables like %%BALENA_ARCH%%. That now seems doable, but it is obviously a big shift in workflow for many people, so understandably a more gradual process.

In the short term, it would be nice if balenalib could start including a multi-arch image alongside all the current ones. While that would not resolve some of the above issues, it seems like an additional build process that could be added without becoming a breaking change. It would allow those willing to gradually start moving over to that system, and would resolve the missing-images issue I keep coming across as well as the dev environment issue.

Some food for thought to throw into your product discussion hat.

Thanks again.

@toochevere FYI, I’ve added this to our tracking system: (restricted access) https://jel.ly.fish/pattern-user-balenalib-images-multiarch-ccc7d37

At long last, this issue was resolved in CLI v12.49.0. 🎉

What exactly has been implemented, and how far does it go? It now has full multi arch image support so we no longer need to specify the sha? Does this apply for cloud builds or just local pushes? Does this mean there will be a gradual transition away from using the %%BALENA_ARCH%% string and we can just specify the image name?

@maggie0002 See answers below:

What exactly has been implemented, and how far does it go?

These changes allow support for manifest lists when doing a CLI build where there is more than one architecture. What does this mean?

  • If you are targeting a base image that has only one possible architecture (application/vnd.docker.distribution.manifest.v1+prettyjws or application/vnd.docker.distribution.manifest.v2+json media type in the manifest) then things are basically the same. The architecture of the image is what it is.
  • If you are targeting a base image that has multiple possible architectures (application/vnd.docker.distribution.manifest.list.v2+json media type in the manifest), for example busybox, then the matching architecture is selected.
    Be careful about mixing manifest lists with single arch images. There are some edge cases where this could create a build that Docker does not know how to resolve correctly, and which ends up in an exec format error. At the moment, this case will produce a warning. In the future, we hope to address this case as well.
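To illustrate the selection step described above, here is a minimal Python sketch of picking the matching architecture out of a manifest list. It is not the CLI's actual code: the function name `select_manifest`, the default OS of `linux`, and the digests in `EXAMPLE_LIST` are made up for illustration. Only the media type string and the `manifests` / `platform` layout follow the Docker manifest list format.

```python
MANIFEST_LIST = "application/vnd.docker.distribution.manifest.list.v2+json"

# Hypothetical manifest list; the digests are placeholders, not real images.
EXAMPLE_LIST = {
    "schemaVersion": 2,
    "mediaType": MANIFEST_LIST,
    "manifests": [
        {"digest": "sha256:" + "a" * 64,
         "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:" + "b" * 64,
         "platform": {"architecture": "arm64", "os": "linux"}},
    ],
}


def select_manifest(manifest, arch, os_name="linux"):
    """Return the digest matching arch/os_name from a manifest list.

    Returns None for a single-arch manifest, where there is nothing to
    select: the architecture of the image is what it is.
    """
    if manifest.get("mediaType") != MANIFEST_LIST:
        return None
    for entry in manifest.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("architecture") == arch and platform.get("os") == os_name:
            return entry["digest"]
    raise LookupError(f"no image for {os_name}/{arch} in manifest list")


if __name__ == "__main__":
    print(select_manifest(EXAMPLE_LIST, "arm64"))
```

The selected digest plays the same role as the sha256 hash in the manual workaround: it identifies one concrete single-architecture image behind the shared name:tag reference.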

It now has full multi arch image support so we no longer need to specify the sha?

That’s the idea.

Does this mean there will be a gradual transition away from using the %%BALENA_ARCH%% string and we can just specify the image name?

  • The %%BALENA_ARCH%% variable is still supported. There are no plans to deprecate it.
  • At the moment Balena is not using manifest lists. That may be something added in the future. I will bring up this possibility in a Product discussion, but it would likely not be on the near-term roadmap.

NOTE: These changes affect the CLI, they are not yet implemented in the cloud builder. But that should follow within the next few weeks since the bulk of the important changes are in code that is shared between them.

I didn’t give very good descriptions there, apologies.

I choose not to use board-named images, such as raspberrypi3-debian, because they include a bunch of device-specific drivers that aren’t of use to me. I prefer smaller image sizes, so I opt for the arch-specific images. An example from my Dockerfile is:

balenalib/%%BALENA_ARCH%%-alpine-python:3.8.10-3.13-20210603

Which would resolve on a raspberry pi 4 to something like:

balenalib/aarch64-alpine-python:3.8.10-3.13-20210603
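The substitution above can be sketched in a few lines of Python. This is only an illustration of how a `%%NAME%%` placeholder resolves, not balena's implementation; the function name `render_template` is hypothetical, and the `aarch64` value is the one given in this comment for a Raspberry Pi 4.

```python
def render_template(line, variables):
    """Replace each %%NAME%% placeholder in line with its value."""
    for name, value in variables.items():
        line = line.replace(f"%%{name}%%", value)
    return line


if __name__ == "__main__":
    template = "balenalib/%%BALENA_ARCH%%-alpine-python:3.8.10-3.13-20210603"
    print(render_template(template, {"BALENA_ARCH": "aarch64"}))
```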

I then use that same image for all devices that are aarch64, whether they be raspberry pi or orange pi etc etc all attached to the same fleet.

With support for multi arch images in the cloud and now the CLI, an image with a name like balenalib/alpine-python:3.8.10-3.13-20210603 could resolve to aarch64 by itself, or to amd64 on my developers’ computers, or to whatever is needed. All in one tidy package.

Indeed, my thought was that it could run in parallel with the current images to avoid being a breaking change. That would initially balloon the number of images, but I imagine many people using BALENA_ARCH like me would gradually switch, and an option to reduce the number of images may present itself in the future.

Assuming my understanding of the images is correct, it’s a little surprising that images specified by board name rather than by arch are the default, considering the cost to image size and Balena’s goal of being as lean as possible.

Be careful about the case of mixing manifest lists with single arch images.

To clarify, this is the case of a multi-stage Dockerfile with multiple FROM lines, where some FROM lines use multiarch images and other FROM lines use single-arch images.
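As a rough illustration of that case, the Python sketch below collects the base images from the FROM lines of a Dockerfile and reports whether they mix multiarch and single-arch images. It is an assumption-laden sketch rather than the CLI's warning logic: the helper names are hypothetical, and `is_multiarch` is a hand-written lookup standing in for a registry query of each image's manifest media type.

```python
def base_images(dockerfile):
    """Return the base images named on FROM lines, skipping stage aliases."""
    images, stages = [], set()
    for raw in dockerfile.splitlines():
        parts = raw.strip().split()
        if not parts or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=... that may precede the image name.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        # "FROM <earlier stage>" and "FROM scratch" pull no registry image.
        if image not in stages and image != "scratch":
            images.append(image)
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])
    return images


def mixes_manifest_kinds(dockerfile, is_multiarch):
    """True if the FROM lines use both multiarch and single-arch images."""
    kinds = {is_multiarch[image] for image in base_images(dockerfile)}
    return len(kinds) > 1


if __name__ == "__main__":
    dockerfile = (
        "FROM busybox AS build\n"
        "FROM balenalib/aarch64-alpine-python:3.8.10-3.13-20210603\n"
        "COPY --from=build /bin/busybox /usr/local/bin/\n"
    )
    # Assumed lookup: busybox is a manifest list, the balenalib image is not.
    lookup = {
        "busybox": True,
        "balenalib/aarch64-alpine-python:3.8.10-3.13-20210603": False,
    }
    print(mixes_manifest_kinds(dockerfile, lookup))
```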

There is no problem with the case of multiple services (in a docker-compose.yml file) where some services use single-arch images and other services use multiarch images.

they are not yet implemented in the cloud builder

From balena CLI users’ point of view, I understand that balena push <myFleet> (push to the balenaCloud builder) has had support for multiarch images for a year or so. What was pending was adding similar support to balena build and balena deploy (this GitHub issue). The codebase unification between the balenaCloud builder and the balena CLI is just an implementation detail. 😃
