confluent-kafka-go: Build error with golang:1.20-alpine3.17 platform=linux/arm64 using confluent-kafka-go v2.1.0

Description

The ARM64 build using golang:1.20-alpine3.17 fails with confluent-kafka-go v2.1.0. The AMD64 build with v2.1.0 succeeds, and both ARM64 and AMD64 builds succeed with v2.0.2.

go mod tidy && go mod vendor
docker buildx build --build-arg TARGETARCH=arm64 .
[+] Building 164.6s (11/11) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                                                                                                   0.1s
 => => transferring dockerfile: 352B                                                                                                                                                                                                                   0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                                                                                                                        0.0s
 => [internal] load metadata for docker.io/library/golang:1.20-alpine3.17                                                                                                                                                                              0.9s
 => [auth] library/golang:pull token for registry-1.docker.io                                                                                                                                                                                          0.0s
 => [1/6] FROM docker.io/library/golang:1.20-alpine3.17@sha256:08e9c086194875334d606765bd60aa064abd3c215abfbcf5737619110d48d114                                                                                                                        0.0s
 => [internal] load build context                                                                                                                                                                                                                      0.4s
 => => transferring context: 104.94MB                                                                                                                                                                                                                  0.3s
 => CACHED [2/6] RUN echo arm64                                                                                                                                                                                                                        0.0s
 => [3/6] RUN apk add alpine-sdk ca-certificates                                                                                                                                                                                                      27.5s
 => [4/6] WORKDIR /code                                                                                                                                                                                                                                0.1s
 => [5/6] ADD . /code                                                                                                                                                                                                                                  0.3s
 => ERROR [6/6] RUN CGO_ENABLED=1 GO111MODULE=on GOOS=linux GOARCH=arm64 go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .                                                                                                      135.7s
------
 > [6/6] RUN CGO_ENABLED=1 GO111MODULE=on GOOS=linux GOARCH=arm64 go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .:
#0 135.6 # main
#0 135.6 /usr/local/go/pkg/tool/linux_arm64/link: running gcc failed: exit status 1
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: /code/vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a(rdkafka_sasl_cyrus.o): in function `rd_kafka_sasl_cyrus_close':
#0 135.6 (.text+0xb4): undefined reference to `sasl_dispose'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: /code/vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a(rdkafka_sasl_cyrus.o): in function `rd_kafka_sasl_cyrus_recv':
#0 135.6 (.text+0x1a0): undefined reference to `sasl_client_step'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x1c8): undefined reference to `sasl_errdetail'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x35c): undefined reference to `sasl_getprop'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x38c): undefined reference to `sasl_getprop'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x3ac): undefined reference to `sasl_getprop'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: /code/vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a(rdkafka_sasl_cyrus.o): in function `rd_kafka_sasl_cyrus_client_new':
#0 135.6 (.text+0xf74): undefined reference to `sasl_client_new'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0xfd4): undefined reference to `sasl_client_start'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0xff4): undefined reference to `sasl_errdetail'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x110c): undefined reference to `sasl_listmech'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x1180): undefined reference to `sasl_errstring'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: /code/vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a(rdkafka_sasl_cyrus.o): in function `rd_kafka_sasl_cyrus_global_init':
#0 135.6 (.text+0x16dc): undefined reference to `sasl_client_init'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x170c): undefined reference to `sasl_errstring'
#0 135.6 collect2: error: ld returned 1 exit status
#0 135.6
------
Dockerfile:12
--------------------
  10 |     ADD . "/code"
  11 |
  12 | >>> RUN CGO_ENABLED=1 GO111MODULE=on GOOS=linux GOARCH=$TARGETARCH go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .
  13 |
--------------------

How to reproduce

  1. Use consumer example https://github.com/confluentinc/confluent-kafka-go/tree/master/examples/consumer_example
  2. go.mod
module main

go 1.20

require github.com/confluentinc/confluent-kafka-go/v2 v2.1.0
  3. Dockerfile
FROM --platform=linux/$TARGETARCH  golang:1.20-alpine3.17 as builder

ARG TARGETARCH
RUN echo $TARGETARCH

RUN apk add alpine-sdk ca-certificates

WORKDIR "/code"
ADD . "/code"

RUN CGO_ENABLED=1 GO111MODULE=on GOOS=linux GOARCH=$TARGETARCH go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .
  4. Failed build
go mod tidy && go mod vendor
docker buildx build --build-arg TARGETARCH=arm64 .
  5. Successful build
go mod tidy && go mod vendor
docker buildx build --build-arg TARGETARCH=amd64 .
  6. Both arm64 and amd64 builds succeed after the go.mod dependency is downgraded
require github.com/confluentinc/confluent-kafka-go/v2 v2.0.2

Checklist

Please provide the following information:

  • confluent-kafka-go and librdkafka version (LibraryVersion()): confluent-kafka-go v2.1.0

About this issue

  • State: open
  • Created a year ago
  • Reactions: 5
  • Comments: 16 (5 by maintainers)

Most upvoted comments

The root cause appears to be that librdkafka now requires Cyrus SASL, but the confluent-kafka-go wrappers don’t spell out a link dependency to it.

All the workarounds above seem to sidestep this problem by installing a system librdkafka-dev instead, which requires -tags dynamic per https://github.com/confluentinc/confluent-kafka-go/#librdkafka. (I’m not sure why the earlier workaround examples work without that tag; we still saw linker errors.)

To fix what I understand to be the root cause, we can:

  • Ensure cyrus-sasl-dev (for Alpine, see librdkafka sasl docs for other platforms) is installed in the build and run environment
  • Tell cgo to explicitly link libsasl2.so

I adapted the repro case from the original report for go1.21 + alpine3.18 with the requisite flags:

FROM --platform=linux/$TARGETARCH golang:1.21.4-alpine3.18

ARG TARGETARCH
RUN echo $TARGETARCH

RUN apk update
RUN apk add \
    gcc \
    musl-dev \
    # explicitly install SASL package
    cyrus-sasl-dev

WORKDIR "/code"
ADD . "/code"

RUN CGO_ENABLED=1 \
    GO111MODULE=on \
    GOOS=linux \
    GOARCH=$TARGETARCH \
    # explicitly link to libsasl2 installed as part of cyrus-sasl-dev
    CGO_LDFLAGS="-lsasl2" \
    go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .

This works on my arm64/M1 Mac for TARGETARCH of both arm64 and amd64.

I raised this PR and confirmed that the produced binaries don’t include rdkafka_sasl_cyrus.o, except on darwin, where it is expected.
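
One way to repeat that kind of check locally is to list the members of the vendored static library, much like `ar t` does. The sketch below is only an illustration and assumes a GNU-style ar archive; the helper names arMembers and arEntry and the tiny in-memory archive are made up for the example and are not part of confluent-kafka-go.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

const arMagic = "!<arch>\n"

// arMembers lists the member names of a GNU-style ar archive held in memory.
func arMembers(data []byte) ([]string, error) {
	if !bytes.HasPrefix(data, []byte(arMagic)) {
		return nil, errors.New("not an ar archive")
	}
	var names []string
	var longNames []byte // contents of the "//" long-name table, if present
	pos := len(arMagic)
	for pos+60 <= len(data) {
		hdr := data[pos : pos+60]
		name := strings.TrimRight(string(hdr[:16]), " ")
		size, err := strconv.Atoi(strings.TrimSpace(string(hdr[48:58])))
		if err != nil || size < 0 || pos+60+size > len(data) {
			return nil, fmt.Errorf("bad member header at offset %d", pos)
		}
		body := data[pos+60 : pos+60+size]
		switch {
		case name == "/" || name == "/SYM64/":
			// Symbol index, not an object file.
		case name == "//":
			longNames = body
		case strings.HasPrefix(name, "/"):
			// "/<offset>" points into the long-name table; entries end in "/\n".
			off, err := strconv.Atoi(name[1:])
			if err != nil || off < 0 || off >= len(longNames) {
				return nil, fmt.Errorf("bad long-name reference %q", name)
			}
			end := bytes.Index(longNames[off:], []byte("/\n"))
			if end < 0 {
				return nil, fmt.Errorf("unterminated long name at offset %d", off)
			}
			names = append(names, string(longNames[off:off+end]))
		default:
			names = append(names, strings.TrimSuffix(name, "/"))
		}
		pos += 60 + size + size%2 // member data is padded to an even length
	}
	return names, nil
}

// arEntry builds one archive member (60-byte header plus padded body) for the demo.
func arEntry(name string, body []byte) []byte {
	hdr := fmt.Sprintf("%-16s%-12s%-6s%-6s%-8s%-10d`\n", name, "0", "0", "0", "644", len(body))
	out := append([]byte(hdr), body...)
	if len(body)%2 == 1 {
		out = append(out, '\n')
	}
	return out
}

func main() {
	// Hypothetical archive; a real check would load the vendored .a with os.ReadFile.
	var lib []byte
	lib = append(lib, arMagic...)
	lib = append(lib, arEntry("//", []byte("rdkafka_sasl_cyrus.o/\n"))...)
	lib = append(lib, arEntry("/0", []byte("obj"))...)
	lib = append(lib, arEntry("rdkafka.o/", []byte("obj2"))...)

	names, err := arMembers(lib)
	if err != nil {
		panic(err)
	}
	for _, n := range names {
		fmt.Println(n, "cyrus:", strings.Contains(n, "sasl_cyrus"))
	}
}
```

Object names longer than 15 characters, such as rdkafka_sasl_cyrus.o, are stored through the long-name table, which is why the sketch handles the "//" member.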

Just try making the following changes:

FROM --platform=linux/$TARGETARCH  golang:1.20-alpine3.17 as builder

ARG TARGETARCH
RUN echo $TARGETARCH

RUN apk update && apk add bash ca-certificates git gcc g++ libc-dev librdkafka-dev pkgconf

WORKDIR "/code"
ADD . "/code"

RUN go build -tags musl -o main .

Thank you all for raising awareness on this issue.

So I think what happened is that @emasab who built librdkafka for 2.3.0 happens to have Cyrus SASL/libsasl2 installed in their environment, and thereby confluent-kafka-go got an indirect dependency on the Cyrus SASL distribution.

That didn’t happen because we configure and build these static binaries in a Semaphore pipeline, not on our laptops. Then we import those binaries locally to push them to confluent-kafka-go.

I believe the issue is here in the release pipeline:

The check should be:

                        if attr in a.info and \
                           a.info[attr] == m.attributes[origattr]:

because it is meant to exclude the files that have the attribute extra=gssapi. Since it’s not excluding them, either the version with libsasl2 or the one without it could be taken, depending on the order.

That explains why the issue is present in 2.1.0 and 2.3.0 but not in 2.2.0 and 2.0.2. Going to create a PR to fix it before our upcoming 2.4.0 release.
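
The order dependence described above can be illustrated with a toy model. Everything in it is hypothetical: the artifact type, the file names, the assumed inverted comparison, and the "first acceptable artifact wins" rule are stand-ins for illustration, not the actual packaging code.

```go
package main

import "fmt"

// artifact is a hypothetical stand-in for a build artifact and its attributes.
type artifact struct {
	file string
	info map[string]string
}

// excluded mirrors the corrected check quoted above: reject the artifact
// when the attribute is present and equal to the unwanted value.
func excluded(a artifact, attr, value string) bool {
	v, ok := a.info[attr]
	return ok && v == value
}

// excludedInverted is an ASSUMED form of the faulty check: it rejects only
// artifacts whose attribute differs, so extra=gssapi artifacts slip through.
func excludedInverted(a artifact, attr, value string) bool {
	v, ok := a.info[attr]
	return ok && v != value
}

// pick returns the first artifact that the exclusion rule does not reject
// (assumption: the first acceptable artifact wins).
func pick(arts []artifact, isExcluded func(artifact, string, string) bool) string {
	for _, a := range arts {
		if !isExcluded(a, "extra", "gssapi") {
			return a.file
		}
	}
	return ""
}

func main() {
	gssapi := artifact{file: "librdkafka-gssapi.a", info: map[string]string{"extra": "gssapi"}}
	plain := artifact{file: "librdkafka-plain.a", info: map[string]string{}}

	for _, arts := range [][]artifact{{gssapi, plain}, {plain, gssapi}} {
		fmt.Println("corrected:", pick(arts, excluded), "| inverted:", pick(arts, excludedInverted))
	}
}
```

In this model the corrected check selects the plain artifact in both orders, while the inverted one simply returns whichever artifact comes first.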

Followup: we actually ran into a problem with the proposed workaround – CGO_LDFLAGS are injected before the cgo LDFLAGS, and gcc -l switches are sensitive to order (beautifully described here: https://eli.thegreenplace.net/2013/07/09/library-order-in-static-linking).

There’s a supremely hacky way to work around this too, using a dangling -Wl,--start-group before -lsasl2:

CGO_LDFLAGS="-Wl,--start-group -lsasl2"

The linker (ld, invoked by GCC) complains with

bin/ld: missing --end-group; added as last command line option

but essentially fixes the unclosed group for you.
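
The ordering problem and the effect of a group can be reproduced in a toy model of left-to-right symbol resolution. This is a simplified sketch, not how ld is implemented: each library is treated as a single unit, and the lib type, the link function, and the symbol lists are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// lib is a toy library: the symbols it defines and the symbols it needs.
type lib struct {
	name    string
	defines []string
	needs   []string
}

// definesAny reports whether l defines at least one currently undefined symbol.
func definesAny(l lib, undefined map[string]bool) bool {
	for _, d := range l.defines {
		if undefined[d] {
			return true
		}
	}
	return false
}

// link resolves appNeeds against libs in command-line order and returns the
// symbols that stay undefined. With group=false each library is visited once;
// with group=true the list is rescanned until nothing new is pulled in,
// loosely modelling --start-group/--end-group.
func link(appNeeds []string, libs []lib, group bool) []string {
	defined := map[string]bool{}
	undefined := map[string]bool{}
	for _, n := range appNeeds {
		undefined[n] = true
	}
	pulled := map[string]bool{}
	for {
		progress := false
		for _, l := range libs {
			// A library is only pulled in if it satisfies a pending reference.
			if pulled[l.name] || !definesAny(l, undefined) {
				continue
			}
			pulled[l.name] = true
			progress = true
			for _, d := range l.defines {
				defined[d] = true
				delete(undefined, d)
			}
			for _, n := range l.needs {
				if !defined[n] {
					undefined[n] = true
				}
			}
		}
		if !group || !progress {
			break
		}
	}
	out := []string{}
	for s := range undefined {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

func main() {
	sasl2 := lib{name: "sasl2", defines: []string{"sasl_client_init"}}
	rdkafka := lib{name: "rdkafka", defines: []string{"rd_kafka_new"}, needs: []string{"sasl_client_init"}}
	app := []string{"rd_kafka_new"}

	fmt.Println("sasl2 first:         ", link(app, []lib{sasl2, rdkafka}, false))
	fmt.Println("rdkafka first:       ", link(app, []lib{rdkafka, sasl2}, false))
	fmt.Println("sasl2 first, grouped:", link(app, []lib{sasl2, rdkafka}, true))
}
```

In this model, putting sasl2 first leaves sasl_client_init undefined, because nothing needs it yet when sasl2 is visited. Reordering or grouping resolves it, which matches the behaviour of the dangling --start-group workaround.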

As for fixing the root-cause bug: I’m not sure why there’s now a hard link dependency on libsasl2.so. But I see that the Darwin cgo LDFLAGS have -lsasl2 as part of the distribution: https://github.com/confluentinc/confluent-kafka-go/blob/master/kafka/build_darwin_arm64.go#L9. There are probably reasons why this can’t work on Linux in general, but it might be a thread to start pulling on.

this needs a bit more attention. wasted too much time on this. 🥲