opentelemetry-collector: Collector constantly breaking down
Describe the bug
After a while, the pod on which the collector runs crashes with the following panic:
panic: runtime error: slice bounds out of range [-2:]

goroutine 137 [running]:
go.opentelemetry.io/collector/pdata/internal/data/protogen/common/v1.(*AnyValue).MarshalToSizedBuffer(0xc002dfd810, {0xc001e84000, 0x22, 0x30c16})
	go.opentelemetry.io/collector/pdata@v0.62.1/internal/data/protogen/common/v1/common.pb.go:483 +0xd0
go.opentelemetry.io/collector/pdata/internal/data/protogen/common/v1.(*KeyValue).MarshalToSizedBuffer(0xc002dfd800, {0xc001e84000, 0x22, 0x30c16})
	go.opentelemetry.io/collector/pdata@v0.62.1/internal/data/protogen/common/v1/common.pb.go:700 +0x3a
go.opentelemetry.io/collector/pdata/internal/data/protogen/resource/v1.(*Resource).MarshalToSizedBuffer(0xc000753360, {0xc001e84000, 0x30984?, 0x30c16})
	go.opentelemetry.io/collector/pdata@v0.62.1/internal/data/protogen/resource/v1/resource.pb.go:146 +0xf0
go.opentelemetry.io/collector/pdata/internal/data/protogen/trace/v1.(*ResourceSpans).MarshalToSizedBuffer(0xc000753360, {0xc001e84000, 0x30984, 0x30c16})
	go.opentelemetry.io/collector/pdata@v0.62.1/internal/data/protogen/trace/v1/trace.pb.go:890 +0x105
go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/trace/v1.(*ExportTraceServiceRequest).MarshalToSizedBuffer(0xc003408198, {0xc001e84000, 0x30c16, 0x30c16})
	go.opentelemetry.io/collector/pdata@v0.62.1/internal/data/protogen/collector/trace/v1/trace_service.pb.go:351 +0xac
go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/trace/v1.(*ExportTraceServiceRequest).Marshal(0xc0cebc8861798197?)
	go.opentelemetry.io/collector/pdata@v0.62.1/internal/data/protogen/collector/trace/v1/trace_service.pb.go:331 +0x56
go.opentelemetry.io/collector/pdata/ptrace/ptraceotlp.Request.MarshalProto(...)
	go.opentelemetry.io/collector/pdata@v0.62.1/ptrace/ptraceotlp/traces.go:88
go.opentelemetry.io/collector/exporter/otlphttpexporter.(*exporter).pushTraces(0xc0001255f0, {0x7388850, 0xc003291470}, {0xc002756e80?})
	go.opentelemetry.io/collector@v0.62.1/exporter/otlphttpexporter/otlp.go:99 +0x32
go.opentelemetry.io/collector/exporter/exporterhelper.(*tracesRequest).Export(0x279293e?, {0x7388850?, 0xc003291470?})
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/traces.go:70 +0x34
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send(0xc000d34750, {0x73a7158, 0xc00340ac30})
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/common.go:203 +0x96
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send(0xc000125680, {0x73a7158, 0xc00340ac30})
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/queued_retry.go:388 +0x58d
go.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send(0xc000de0e88, {0x73a7158, 0xc00340ac30})
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/traces.go:134 +0x88
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1({0x73a7158, 0xc00340ac30})
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/queued_retry.go:206 +0x39
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1()
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/internal/bounded_memory_queue.go:61 +0xb6
created by go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers
	go.opentelemetry.io/collector@v0.62.1/exporter/exporterhelper/internal/bounded_memory_queue.go:56 +0x45
Steps to reproduce
Using the otel/opentelemetry-collector-contrib:0.62.1 Docker image with the following config:
apiVersion: v1
kind: ConfigMap
metadata:
  name: ecom-opentelemetry-collector
  labels:
    helm.sh/chart: opentelemetry-collector-0.30.0
    app.kubernetes.io/name: ecom-opentelemetry-collector
    app.kubernetes.io/instance: ecom-dev
    app.kubernetes.io/version: "0.59.0"
    app.kubernetes.io/managed-by: Helm
data:
  relay: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      otlp/spanmetrics:
        protocols:
          grpc:
            endpoint: localhost:12345
    processors:
      batch: {}
      spanmetrics:
        metrics_exporter: otlp/spanmetrics
        dimensions_cache_size: 5000
        latency_histogram_buckets:
          - 10ms
          - 100ms
          - 1s
          - 2s
          - 4s
          - 8s
          - 16s
          - 32s
        aggregation_temporality: AGGREGATION_TEMPORALITY_CUMULATIVE
        dimensions:
          - name: http.status_code
          - name: target_xpath
          - name: some_more_stuff
    exporters:
      logging:
        loglevel: debug
      otlphttp:
        endpoint: http://jaeger-collector.jaeger.svc:4318
        tls:
          insecure: true
        sending_queue:
          num_consumers: 4
          queue_size: 100
        retry_on_failure:
          enabled: true
      zipkin:
        endpoint: http://jaeger-collector.jaeger.svc:9411/api/v2/spans
        tls:
          insecure: true
        sending_queue:
          num_consumers: 4
          queue_size: 100
        retry_on_failure:
          enabled: true
      otlp/spanmetrics:
        endpoint: 127.0.0.1:4317
        tls:
          insecure: true
      prometheus:
        endpoint: 0.0.0.0:8889
        namespace: default
    service:
      extensions:
        - health_check
      telemetry:
        logs:
          level: debug
        metrics:
          level: detailed
          address: 0.0.0.0:8888
      pipelines:
        logs:
          receivers:
            - otlp
          processors:
            - batch
          exporters:
            - logging
        traces:
          receivers:
            - otlp
          processors:
            - spanmetrics
            - batch
          exporters:
            - otlphttp
            - logging
        metrics:
          receivers:
            - otlp
          processors:
            - batch
          exporters:
            - logging
            - prometheus
        metrics/spanmetrics:
          receivers:
            - otlp/spanmetrics
          exporters:
            - otlp/spanmetrics
    extensions:
      health_check: {}
Environment
OS: Linux (Docker on Kubernetes)
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 24 (11 by maintainers)
@ambition-consulting @andretong @Edition-X I found the bug and will submit a fix soon. In the meantime, if you want a quick fix, remove the logging exporter from the pipelines, or do not configure loglevel: debug; a sketch of that change is shown below.
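Applied to the configuration above, the workaround amounts to dropping logging from the traces pipeline's exporters. This is an illustrative sketch of the suggested workaround, not a verified fix; the rest of the config stays unchanged:

    service:
      pipelines:
        traces:
          receivers:
            - otlp
          processors:
            - spanmetrics
            - batch
          exporters:
            - otlphttp   # logging removed per the suggested workaround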
Hello everyone! I'm having the same issue with both versions, v0.63.0 and v0.62.1.
As an extra, I'm using the Attributes Span Processor feature (a generic configuration sketch for that processor follows this comment).
The only thing I could notice prior to the error is that the collector is trying to process a trace with at least 26 spans, and then it crashes. Here I attach the stack trace of the error.
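For reference, a span attributes processor of the kind mentioned above is configured roughly as follows; the keys, values, and actions are purely illustrative assumptions, not the commenter's actual setup:

    processors:
      attributes:
        actions:
          - key: environment    # hypothetical attribute to add
            value: dev
            action: insert
          - key: credit_card    # hypothetical attribute to remove
            action: delete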
Yes, with both - those are also the only versions I have tested.
Also, can you run the collector with this configuration for the pipelines to isolate the problem:
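A minimal sketch of what such an isolation run might look like, assuming the idea is to rule out the spanmetrics processor and the logging exporter by running a single bare traces pipeline; this is an illustrative guess, not the maintainer's actual suggestion:

    service:
      pipelines:
        traces:
          receivers:
            - otlp
          processors:
            - batch
          exporters:
            - otlphttp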