postgres-operator: Postgres Operator fails to start on Minikube 1.26.0 with qemu2 driver on ARM64

Please answer some short questions which should help us understand your problem / question better:

  • Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.8.2
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? Kubernetes / Minikube 1.26.0 with qemu2 driver / Apple M1 Ultra
  • Are you running Postgres Operator in production? Yes
  • Type of issue? Bug report

Some general remarks when posting a bug report:

I’m using Minikube 1.26.0 with the qemu2 driver on Apple M1 silicon and the operator fails with the following error:

postgres-operator exec /postgres-operator: exec format error

Using the latest PostgreSQL operator (1.8.2) works as expected on this same version of Minikube on Apple M1 silicon using the docker driver.

Unfortunately, the pod immediately terminates so I’ve been unable to gather any log files. Does the postgres-operator support ARM64?
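An `exec format error` generally means the kernel was asked to run a binary built for a different CPU architecture, so a first check is to compare the node's architecture with the platforms the image was published for. The following is a minimal sketch, not a definitive diagnostic. The helper name `to_docker_platform` is hypothetical, and the commented-out part assumes `kubectl` points at the Minikube cluster.

```shell
#!/bin/sh
# Hypothetical helper: translate an architecture name (as printed by
# `uname -m` or reported by Kubernetes) into a Docker platform string.
to_docker_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)             echo "unknown" ;;
  esac
}

# Linux platform string matching the CPU of the machine running this script.
to_docker_platform "$(uname -m)"

# Against a cluster (requires kubectl, so left commented out here):
# arch="$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.architecture}')"
# to_docker_platform "$arch"
```

If the node reports `linux/arm64` while the image is only published for `linux/amd64`, the error above is the expected result.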

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 23 (1 by maintainers)

Most upvoted comments

Quoted from release page:

We are excited to announce a new release of the Postgres Operator. A rather small one but bringing you ARM support for the operator (pooler, ui and logical backup will follow). Thanks to everyone who contributed with PRs, feedback, raising issues or providing ideas.

New features

Provide Postgres-Operator as multi-arch image that can run on arm (#2268, #2127)

@abangser glad you found the solution yourself. As you mentioned, you’ve used the wrong docker-context to build the image.

The script I’m using to build multi-arch images is the following (it is located in another directory):

cd "/tmp"
echo "[INFO] Building postgresql operator ..."
git clone git@github.com:mmoscher/postgres-operator.git && pushd "postgres-operator"
git checkout arm64

docker buildx build \
		--push \
		--platform=linux/amd64,linux/arm64 \
		-t <private-repo-and-image-tag> \
		-f docker/Dockerfile \
		.
popd
rm -rf postgres-operator

However, I’m not yet using the makefile. @abangser it would be awesome if you could file a PR with your change to my fork (https://github.com/mmoscher/postgres-operator/tree/arm64). Then we can work from there and file a PR to this repo soon.

FYI: running this script on my Mac M1, using colima as the Docker backend, takes roughly 5m for the multi-arch images to build (base images cached). However, 20m could be fine too (depending on your hardware).

Thanks for the update @jonizen! 🙇

It is interesting that it worked for you. I ended up tracking down an issue where the Dockerfile has a COPY . . and that is paired with a line in the makefile that runs the docker command from the DOCKERDIR. This means that the only files available in the image are the files in the DOCKERDIR, which is not enough because that directory doesn’t include the go.mod.
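This can be checked before starting a long build: the directory passed as the build context must contain go.mod. Below is a small sketch under stated assumptions. The function name `context_has_gomod` is made up for illustration, and the throwaway directory layout only mimics the repository (a `docker` subdirectory holding the Dockerfile, with go.mod at the top level).

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeeds only if the given build context
# directory contains go.mod, which `RUN go mod tidy` needs after `COPY . .`.
context_has_gomod() {
  [ -f "$1/go.mod" ]
}

# Throwaway layout that mimics the repository structure.
repo="$(mktemp -d)"
mkdir "$repo/docker"
touch "$repo/go.mod" "$repo/docker/Dockerfile"

# Compare the two candidate build contexts.
for ctx in "$repo/docker" "$repo"; do
  if context_has_gomod "$ctx"; then
    echo "go.mod found in context: $ctx"
  else
    echo "go.mod missing from context: $ctx"
  fi
done

rm -rf "$repo"
```

Running the build from the repository root and pointing `--file` at the Dockerfile keeps go.mod inside the context.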

I fixed this by using the following docker make target:

docker: ${DOCKERDIR}/${DOCKERFILE} docker-context
	echo `(env)`
	echo "Tag ${TAG}"
	echo "Version ${VERSION}"
	echo "CDP tag ${CDP_TAG}"
	echo "git describe $(shell git describe --tags --always --dirty)"
	if ! docker buildx ls | grep -q "zalando-builder"; then \
		docker buildx create --name zalando-builder; \
	fi;
	docker buildx build \
		--rm \
		--builder zalando-builder \
		--platform linux/arm64,linux/amd64 \
		--tag $(IMAGE):$(TAG)$(CDP_TAG)$(DEBUG_FRESH)$(DEBUG_POSTFIX) \
		--push \
		--file "${DOCKERDIR}/${DOCKERFILE}" \
		.

Which has resulted in this image (no guarantee of longevity of, or updates to, this image as we are currently only using it for a demo!).

While this worked for me, I have to say I am intrigued how you ended up getting yours building, as I might be doing something too heavy-handed. The image did take something like 20 minutes to build!

I tried it out and I could build and push the image without any problems. 😃 link to image

@joepa37 see if that works 😃

@mprimeaux spilo linux/arm64 support was merged yesterday https://github.com/zalando/spilo/pull/790 and will be available with the next spilo tag (postgresql version >= 14 support only).

Now we can continue with the operator itself to make it linux/arm64 compatible. However, its base image, registry.opensource.zalan.do/library/alpine-3.xx, is not yet available for the linux/arm64 architecture in the Zalando registry.

As mentioned in #2084, two options are feasible. The second option, e.g. hosting on ghcr.io, would be my favorite. Nevertheless, I haven’t had time to implement it yet. Maybe I’ll have some free time at the end of the week or on the weekend.

For now, you can build it yourself with some small changes: https://github.com/mmoscher/postgres-operator/pull/1/files

TL;DR: I’m still on it 😉

@mprimeaux building the operator on an aarch64 (linux/arm64) machine (Google Cloud Tau T2A GCE instance) worked for me, i.e. customizing the Makefile and Dockerfile and overriding the operator’s default image (Helm chart values). Additionally, one has to use the custom, arm64-compatible spilo image, which is already available in the Zalando registry. I will test this tomorrow or next week on an Apple Silicon M1 processor. If the PoC works well enough, I’ll file a pull request.

I think the latest release already includes these changes, but you have to specify the correct image.

Look at the latest release on the release page. I also think the other parts for pooling and backup are planned 😊

Thanks! I will test this out today and reply on the PR and here.

Absolutely, I appreciate that it would be helpful. As I mentioned in this PR, it seems to be working for me, and this is as far as I can take the commit at this time, as I am not aware of where else to go. Please feel free to merge or, of course, rewrite it if it isn’t quite right.

https://github.com/mmoscher/postgres-operator/pull/2#issuecomment-1544079462

Thanks

Yeah, I pushed the wrong image, but I got the arm64 one. So it works I believe 😊
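One way to confirm which architectures a pushed tag carries is to inspect its manifest list. This is a sketch only: the helper name `has_platform` is hypothetical, the sample text merely imitates the `Platform:` lines that `docker buildx imagetools inspect` prints for a multi-arch image, and `IMAGE` is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: read `docker buildx imagetools inspect` output on stdin
# and succeed if a Platform line names the requested platform
# (also accepting variants such as linux/arm64/v8).
has_platform() {
  grep -Eq "Platform:[[:space:]]*$1(/|\$)"
}

# Illustrative sample in the shape of the inspect output (digests omitted).
sample='Manifests:
  Platform:  linux/amd64
  Platform:  linux/arm64'

if printf '%s\n' "$sample" | has_platform linux/arm64; then
  echo "arm64 variant present"
else
  echo "arm64 variant missing"
fi

# Against a real registry (requires docker, so left commented out here):
# docker buildx imagetools inspect "$IMAGE" | has_platform linux/arm64
```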

Be aware that your output says “cached” for several steps. If a stale or broken step is served from the cache, the build can fail even when everything else is correct. Since this image builds fast, run the build with --no-cache to rule that out.
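A cache-free rebuild could look roughly like the following. This is a sketch that only prints the command for review. The helper name `no_cache_build_cmd` and the default tag are placeholders, while the platforms and the Dockerfile path follow the build script quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print a buildx invocation that bypasses the layer cache.
# $1 is the image tag to build and push.
no_cache_build_cmd() {
  printf '%s\n' "docker buildx build --no-cache --platform linux/arm64,linux/amd64 --tag $1 --file docker/Dockerfile --push ."
}

# Print the command; the default tag is a placeholder, not a real repository.
no_cache_build_cmd "${IMAGE:-example.invalid/postgres-operator:dev}"

# To execute it from the repository root instead of printing it:
# eval "$(no_cache_build_cmd "$IMAGE")"
```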

I will have a look at this probably today after work 😊

@joepa37 I had a similar problem building on my M1 and similarly ran into the situation where @jonizen’s image was still the AMD arch.

My fix was to make 2 small changes to the code from @mmoscher (thank you!) so that it runs a docker buildx command and builds both architectures.

  1. My new make target code was:

    docker: ${DOCKERDIR}/${DOCKERFILE} docker-context
        echo `(env)`
        echo "Tag ${TAG}"
        echo "Version ${VERSION}"
        echo "CDP tag ${CDP_TAG}"
        echo "git describe $(shell git describe --tags --always --dirty)"
        if ! docker buildx ls | grep -q "zalando-builder"; then \
            docker buildx create --name zalando-builder; \
        fi;
        cd "${DOCKERDIR}" && docker buildx build \
            --rm \
            --builder zalando-builder \
            --platform linux/arm64,linux/amd64 \
            --tag $(IMAGE):$(TAG)$(CDP_TAG)$(DEBUG_FRESH)$(DEBUG_POSTFIX) \
            --push \
            --file ${DOCKERFILE} \
            .
    
  2. I removed the hardcoded values of the two ARGs on lines 5 and 6 of the Dockerfile so that they can be passed in.

These changes allowed me to run the following command:

IMAGE=my-repo/zalan-do-acid-postgres-operator make docker

but still got an error 😢

cd "docker" && docker buildx build \
	--rm \
	--builder zalando-builder \
	--platform linux/arm64,linux/amd64 \
	--tag syntasso/zalan-do-acid-postgres-operator:2880a58-dirty \
	--push \
	--file Dockerfile \
	.
[+] Building 16.3s (23/33)                                                                                                                                    
 => [internal] load build definition from Dockerfile                                                                                                     0.1s
 => => transferring dockerfile: 993B                                                                                                                     0.0s
 => [internal] load .dockerignore                                                                                                                        0.1s
 => => transferring context: 2B                                                                                                                          0.0s
 => [linux/arm64 internal] load metadata for docker.io/library/alpine:3.15                                                                               4.3s
 => [linux/amd64 internal] load metadata for docker.io/library/golang:1.17-alpine3.15                                                                    4.3s
 => [linux/amd64 internal] load metadata for docker.io/library/alpine:3.15                                                                               4.3s
 => [auth] library/alpine:pull token for registry-1.docker.io                                                                                            0.0s
 => [auth] library/golang:pull token for registry-1.docker.io                                                                                            0.0s
 => [linux/amd64 go-builder 1/8] FROM docker.io/library/golang:1.17-alpine3.15@sha256:543b0922baa147b87a568968462a9586e94b588426f51396a2666590cfba327a   0.2s
 => => resolve docker.io/library/golang:1.17-alpine3.15@sha256:543b0922baa147b87a568968462a9586e94b588426f51396a2666590cfba327a                          0.1s
 => [linux/amd64 postgres-operator 1/6] FROM docker.io/library/alpine:3.15@sha256:cf34c62ee8eb3fe8aa24c1fab45d7e9d12768d945c3f5a6fd6a63d901e898479       0.2s
 => => resolve docker.io/library/alpine:3.15@sha256:cf34c62ee8eb3fe8aa24c1fab45d7e9d12768d945c3f5a6fd6a63d901e898479                                     0.2s
 => [internal] load build context                                                                                                                       10.6s
 => => transferring context: 61.29MB                                                                                                                    10.4s
 => [linux/arm64 postgres-operator 1/6] FROM docker.io/library/alpine:3.15@sha256:cf34c62ee8eb3fe8aa24c1fab45d7e9d12768d945c3f5a6fd6a63d901e898479       0.2s
 => => resolve docker.io/library/alpine:3.15@sha256:cf34c62ee8eb3fe8aa24c1fab45d7e9d12768d945c3f5a6fd6a63d901e898479                                     0.2s
 => CACHED [linux/arm64 postgres-operator 2/6] RUN apk --no-cache add curl                                                                               0.0s
 => CACHED [linux/arm64 postgres-operator 3/6] RUN apk --no-cache add ca-certificates                                                                    0.0s
 => CACHED [linux/amd64 postgres-operator 2/6] RUN apk --no-cache add curl                                                                               0.0s
 => CACHED [linux/amd64 postgres-operator 3/6] RUN apk --no-cache add ca-certificates                                                                    0.0s
 => CACHED [linux/amd64 go-builder 2/8] WORKDIR /src                                                                                                     0.0s
 => CACHED [linux/amd64 go-builder 3/8] COPY . .                                                                                                         0.0s
 => CACHED [linux/amd64->arm64 go-builder 4/8] RUN go get -d k8s.io/client-go@kubernetes-1.22.4                                                          0.0s
 => CACHED [linux/amd64->arm64 go-builder 5/8] RUN go install github.com/golang/mock/mockgen@v1.6.0                                                      0.0s
 => ERROR [linux/amd64->arm64 go-builder 6/8] RUN go mod tidy                                                                                            1.1s
 => CACHED [linux/amd64 go-builder 4/8] RUN go get -d k8s.io/client-go@kubernetes-1.22.4                                                                 0.0s
 => CACHED [linux/amd64 go-builder 5/8] RUN go install github.com/golang/mock/mockgen@v1.6.0                                                             0.0s
 => ERROR [linux/amd64 go-builder 6/8] RUN go mod tidy                                                                                                   1.1s
------                                                                                                                                                        
 > [linux/amd64->arm64 go-builder 6/8] RUN go mod tidy:                                                                                                       
#0 0.850 go: go.mod file not found in current directory or any parent directory; see 'go help modules'
------
------
 > [linux/amd64 go-builder 6/8] RUN go mod tidy:
#0 0.897 go: go.mod file not found in current directory or any parent directory; see 'go help modules'
------
Dockerfile:15
--------------------
  13 |     RUN go get -d k8s.io/client-go@kubernetes-1.22.4
  14 |     RUN go install github.com/golang/mock/mockgen@v1.6.0
  15 | >>> RUN go mod tidy
  16 |     RUN go mod vendor
  17 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c go mod tidy" did not complete successfully: exit code: 1
make: *** [Makefile:74: docker] Error 1

This one feels like something other than the architecture (edit: this fails the same way on the branch with the docker changes, but not on the master branch, when building on an M1 Mac), but I may be missing how my changes affected it. I will keep poking, but if anyone has an idea please let me know! Thanks 🙇

@mmoscher Any updates on the arm64 support? Please let me know how (or if) I can help. I’ll make time.