thanos: nil pointer dereference in main-2022-10-02-d533abf8 forward

Thanos, Prometheus and Golang version used: main-2022-10-02-d533abf8

Object Storage Provider: AWS

What happened:

Query pods fail with the following logs:

level=info ts=2023-01-09T19:05:58.690951095Z caller=factory.go:43 msg="loading tracing configuration"
level=info ts=2023-01-09T19:05:58.696610373Z caller=options.go:26 protocol=gRPC msg="disabled TLS, key and cert must be set to enable"
level=info ts=2023-01-09T19:05:58.697727856Z caller=query.go:759 msg="starting query node"
level=info ts=2023-01-09T19:05:58.697991079Z caller=intrumentation.go:75 msg="changing probe status" status=healthy
level=info ts=2023-01-09T19:05:58.69803157Z caller=http.go:73 service=http/server component=query msg="listening for requests and metrics" address=0.0.0.0:9090
level=info ts=2023-01-09T19:05:58.698204012Z caller=tls_config.go:195 service=http/server component=query msg="TLS is disabled." http2=false
level=info ts=2023-01-09T19:05:58.698275323Z caller=intrumentation.go:56 msg="changing probe status" status=ready
level=info ts=2023-01-09T19:05:58.698316933Z caller=grpc.go:131 service=gRPC/server component=query msg="listening for serving gRPC" address=0.0.0.0:10901
level=error ts=2023-01-09T19:05:58.700889734Z caller=resolver.go:99 msg="failed to lookup SRV records" host=_grpc._tcp.thanos-ruler-operated err="no such host"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x13e0ee8]

goroutine 296 [running]:
go.opentelemetry.io/otel/sdk/trace.parentBased.ShouldSample({{0x0, 0x0}, {{0x2672118, 0x39346d0}, {0x26720f0, 0x39346d0}, {0x2672118, 0x39346d0}, {0x26720f0, 0x39346d0}}}, ...)
	/go/pkg/mod/go.opentelemetry.io/otel/sdk@v1.9.0/trace/sampling.go:280 +0x1c8
github.com/thanos-io/thanos/pkg/tracing/migration.samplerWithOverride.ShouldSample(...)
	/app/pkg/tracing/migration/sampler.go:42
go.opentelemetry.io/otel/sdk/trace.(*tracer).newSpan(0xc0006e1bc0, {0x267d0d0, 0xc0007ec120}, {0xc000778630, 0x16}, 0xc0007eb470)
	/go/pkg/mod/go.opentelemetry.io/otel/sdk@v1.9.0/trace/tracer.go:90 +0x456
go.opentelemetry.io/otel/sdk/trace.(*tracer).Start(0xc0006e1bc0, {0x267d0d0, 0xc0007ec120}, {0xc000778630, 0x16}, {0xc0007bdc00?, 0x4?, 0xc0007eb5a0?})
	/go/pkg/mod/go.opentelemetry.io/otel/sdk@v1.9.0/trace/tracer.go:47 +0x10e
go.opentelemetry.io/otel/bridge/opentracing.(*WrapperTracer).Start(0xc000782228, {0x267d0d0?, 0xc0007ec120?}, {0xc000778630?, 0x1e437a0?}, {0xc0007bdc00?, 0x7f52aecb03d8?, 0xc000a209c0?})
	/go/pkg/mod/go.opentelemetry.io/otel/bridge/opentracing@v1.10.0/wrapper.go:79 +0x4b
go.opentelemetry.io/otel/bridge/opentracing.(*BridgeTracer).StartSpan(0xc000770960, {0xc000778630, 0x16}, {0xc0007ec0c0, 0x3, 0x39346d0?})
	/go/pkg/mod/go.opentelemetry.io/otel/bridge/opentracing@v1.10.0/bridge.go:430 +0x3f4
github.com/thanos-io/thanos/pkg/tracing/migration.(*bridgeTracerWrapper).StartSpan(0x1f9a9a0?, {0xc000778630?, 0x39346d0?}, {0xc0007ec0c0?, 0x38c9480?, 0xc0007eb818?})
	/app/pkg/tracing/migration/bridge.go:89 +0x26
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/tracing.newClientSpanFromContext({0x267d098, 0xc0007b5b60}, {0x2678120, 0xc000125748}, {0xc000778630, 0x16})
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/tracing/client.go:92 +0x244
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/tracing.(*opentracingClientReportable).ClientReporter(0xc0007824e0, {0x267d098, 0xc0007b5b60}, {0x0?, 0x0?}, {0x2197e1a, 0x5}, {0x21bcafa, 0x10}, {0x21bcb0b, ...})
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/tracing/client.go:51 +0x127
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors.UnaryClientInterceptor.func1({0x267d098, 0xc0007b5b60}, {0x21bcaf9, 0x16}, {0x209f7e0, 0x39346d0}, {0x209f920, 0xc0007b5c80}, 0xc000100960?, 0x227e1b8, ...)
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/client.go:19 +0x195
github.com/grpc-ecosystem/go-grpc-middleware/v2.ChainUnaryClient.func1.1.1({0x267d098?, 0xc0007b5b60?}, {0x21bcaf9?, 0x38?}, {0x209f7e0?, 0x39346d0?}, {0x209f920?, 0xc0007b5c80?}, 0x0?, {0xc000a13e80, ...})
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/chain.go:74 +0x86
github.com/grpc-ecosystem/go-grpc-prometheus.(*ClientMetrics).UnaryClientInterceptor.func1({0x267d098, 0xc0007b5b60}, {0x21bcaf9, 0x16}, {0x209f7e0, 0x39346d0}, {0x209f920, 0xc0007b5c80}, 0x8?, 0xc00000d020, ...)
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/client_metrics.go:112 +0x117
github.com/grpc-ecosystem/go-grpc-middleware/v2.ChainUnaryClient.func1.1.1({0x267d098?, 0xc0007b5b60?}, {0x21bcaf9?, 0x203000?}, {0x209f7e0?, 0x39346d0?}, {0x209f920?, 0xc0007b5c80?}, 0x1?, {0xc000a13e80, ...})
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/chain.go:74 +0x86
github.com/grpc-ecosystem/go-grpc-middleware/v2.ChainUnaryClient.func1({0x267d098, 0xc0007b5b60}, {0x21bcaf9, 0x16}, {0x209f7e0, 0x39346d0}, {0x209f920, 0xc0007b5c80}, 0x0?, 0x227e1b8, ...)
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/chain.go:83 +0x157
google.golang.org/grpc.(*ClientConn).Invoke(0xc0005c3900?, {0x267d098?, 0xc0007b5b60?}, {0x21bcaf9?, 0x16?}, {0x209f7e0?, 0x39346d0?}, {0x209f920?, 0xc0007b5c80?}, {0xc00071b680, ...})
	/go/pkg/mod/google.golang.org/grpc@v1.45.0/call.go:35 +0x223
github.com/thanos-io/thanos/pkg/info/infopb.(*infoClient).Info(0xc000a04610, {0x267d098, 0xc0007b5b60}, 0x8?, {0xc00071b680, 0x1, 0x1})
	/app/pkg/info/infopb/rpc.pb.go:422 +0xc9
github.com/thanos-io/thanos/pkg/query.(*endpointRef).Metadata(0xc00016fa80, {0x267d098, 0xc0007b5b60}, {0x2663ca0, 0xc000a04610}, {0x267e368, 0xc000a04618})
	/app/pkg/query/endpointset.go:66 +0x11e
github.com/thanos-io/thanos/pkg/query.(*EndpointSet).updateEndpoint(0xc0000b4980, {0x267d098, 0xc0007b5b60}, 0xc00000ced0, 0xc00016fa80)
	/app/pkg/query/endpointset.go:414 +0x105
github.com/thanos-io/thanos/pkg/query.(*EndpointSet).Update.func2(0xc00000ced0)
	/app/pkg/query/endpointset.go:354 +0x2cb
created by github.com/thanos-io/thanos/pkg/query.(*EndpointSet).Update
	/app/pkg/query/endpointset.go:343 +0x60a

Ruler pods fail with the following logs:

level=info ts=2023-01-09T19:06:55.17554435Z caller=intrumentation.go:56 component=rules msg="changing probe status" status=ready
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x13e0ee8]

goroutine 1725 [running]:
go.opentelemetry.io/otel/sdk/trace.parentBased.ShouldSample({{0x0, 0x0}, {{0x2672118, 0x39346d0}, {0x26720f0, 0x39346d0}, {0x2672118, 0x39346d0}, {0x26720f0, 0x39346d0}}}, ...)
	/go/pkg/mod/go.opentelemetry.io/otel/sdk@v1.9.0/trace/sampling.go:280 +0x1c8
github.com/thanos-io/thanos/pkg/tracing/migration.samplerWithOverride.ShouldSample(...)
	/app/pkg/tracing/migration/sampler.go:42
go.opentelemetry.io/otel/sdk/trace.(*tracer).newSpan(0xc00084ebc0, {0x267d0d0, 0xc001b3e2d0}, {0x21a41c9, 0xc}, 0xc04dae9ab8)
	/go/pkg/mod/go.opentelemetry.io/otel/sdk@v1.9.0/trace/tracer.go:90 +0x456
go.opentelemetry.io/otel/sdk/trace.(*tracer).Start(0xc00084ebc0, {0x267d0d0, 0xc001b3e2d0}, {0x21a41c9, 0xc}, {0xc020e56a40?, 0x4?, 0xc04c3b4be8?})
	/go/pkg/mod/go.opentelemetry.io/otel/sdk@v1.9.0/trace/tracer.go:47 +0x10e
go.opentelemetry.io/otel/bridge/opentracing.(*WrapperTracer).Start(0xc000516870, {0x267d0d0?, 0xc001b3e2d0?}, {0x21a41c9?, 0x1e437a0?}, {0xc020e56a40?, 0x0?, 0xc045d54c00?})
	/go/pkg/mod/go.opentelemetry.io/otel/bridge/opentracing@v1.10.0/wrapper.go:79 +0x4b
go.opentelemetry.io/otel/bridge/opentracing.(*BridgeTracer).StartSpan(0xc00039ede0, {0x21a41c9, 0xc}, {0x0, 0x0, 0x33450?})
	/go/pkg/mod/go.opentelemetry.io/otel/bridge/opentracing@v1.10.0/bridge.go:430 +0x3f4
github.com/thanos-io/thanos/pkg/tracing/migration.(*bridgeTracerWrapper).StartSpan(0x267d0d0?, {0x21a41c9?, 0x10000000043b247?}, {0x0?, 0xc045c1c780?, 0xc045d54e90?})
	/app/pkg/tracing/migration/bridge.go:89 +0x26
github.com/thanos-io/thanos/pkg/tracing.StartSpan({0x267d0d0, 0xc03ace95c0}, {0x21a41c9, 0xc}, {0x0, 0x0, 0x0})
	/app/pkg/tracing/tracing.go:72 +0x18f
github.com/thanos-io/thanos/pkg/tracing.DoInSpan({0x267d0d0?, 0xc03ace95c0?}, {0x21a41c9?, 0x7a?}, 0xc04dae9f60, {0x0?, 0xc0469fa370?, 0xc0461c2840?})
	/app/pkg/tracing/tracing.go:93 +0x59
main.runRule.func7()
	/app/cmd/thanos/rule.go:532 +0x96
github.com/oklog/run.(*Group).Run.func1({0xc03ace95f0?, 0xc000293a40?})
	/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38 +0x2f
created by github.com/oklog/run.(*Group).Run
	/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:37 +0x22a
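
Both traces end in parentBased.ShouldSample, and the first argument printed there is {0x0, 0x0}, which is consistent with the root sampler being a nil interface value. The sketch below reproduces that failure mode in isolation. It is a minimal illustration, not the OpenTelemetry SDK or the Thanos code: the Sampler interface, the parentBased struct, and the newParentBased constructor are hypothetical stand-ins, and the fallback to an always-on sampler is only one possible guard, not necessarily the fix that was merged.

```go
package main

import "fmt"

// Sampler is a hypothetical stand-in for a tracing SDK's sampler interface.
type Sampler interface {
	ShouldSample(name string) bool
}

// alwaysOn samples every span.
type alwaysOn struct{}

func (alwaysOn) ShouldSample(string) bool { return true }

// parentBased mimics a sampler that delegates to a root sampler.
// Its zero value carries a nil root.
type parentBased struct {
	root Sampler
}

// ShouldSample delegates to root. With a nil root the method call
// panics with a nil pointer dereference.
func (p parentBased) ShouldSample(name string) bool {
	return p.root.ShouldSample(name)
}

// newParentBased is a hypothetical constructor that replaces a nil
// root with a default sampler so the delegate call cannot panic.
func newParentBased(root Sampler) parentBased {
	if root == nil {
		root = alwaysOn{}
	}
	return parentBased{root: root}
}

// panics reports whether calling f panics.
func panics(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	var unset parentBased // root was never set
	fmt.Println("zero value panics:", panics(func() { unset.ShouldSample("rpc") }))

	guarded := newParentBased(nil)
	fmt.Println("guarded sampler panics:", panics(func() { guarded.ShouldSample("rpc") }))
}
```

If this reading of the trace is right, the panic would fire on the first span started after startup, which matches the Query pod crashing during its first endpoint Info call and the Ruler pod crashing shortly after becoming ready.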

What you expected to happen:

Query and Ruler pods start and keep running without panicking.

How to reproduce it (as minimally and precisely as possible):

Deploy a query pod with the following STS configuration:

containers:
      - name: thanos-query
        args:
        - query
        - --grpc-address=[REDACTED]
        - --http-address=[REDACTED]
        - --log.level=info
        - --log.format=logfmt
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --query.replica-label=thanos_ruler_replica
        - --endpoint=dnssrv+_grpc._tcp.thanos-ruler-operated
        - --endpoint=dnssrv+_grpc._tcp.thanos-sidecar
        - --endpoint=dnssrv+_grpc._tcp.thanos-store
        - --query.auto-downsampling
        - --tracing.config=$(TRACING_CONFIG)

Deploy a ruler pod using the monitoring.coreos.com/v1 CRD.

Full logs to relevant components:

Anything else we need to know:

About this issue

  • State: closed
  • Created a year ago
  • Comments: 35 (34 by maintainers)

Most upvoted comments

@xBazilio just missing the changelog entry and then it’ll be perfect. Thanks for the contribution!

I created the PR https://github.com/thanos-io/thanos/pull/6066/files and it’s being reviewed.

Yeah, I think that’s the same issue. We just never released v0.29.1 and never merged those commits back to main.