VictoriaMetrics: error while fetching data from remote storage: snappy: decoded block is too large
Describe the bug
vmctl exits with the following error message:
2023/04/24 20:46:22 remote read error: request failed for: error while fetching data from remote storage: error while sending request to http://localhost:10080/api/v1/read: Post "http://localhost:10080/api/v1/read": EOF; Data len 36(36)
when importing data using Prometheus remote-read to read from a Thanos Store Gateway.
This may not be a VM bug, but it is a blocker for migrating Thanos data into VM. Any help is welcome.
To Reproduce
- start thanos-remote-read pointing to a Thanos Store Gateway pod:
./bin/thanos-remote-read -store 10.68.6.4:10901 -log.level debug
- start vmctl to read from remote-read and send to a vmstorage pod:
./vmctl-prod remote-read --remote-read-src-addr=http://localhost:10080 --remote-read-filter-time-start=2023-04-16T00:00:00Z --remote-read-step-interval=hour --vm-addr=http://10.68.5.59:8482 --vm-concurrency=2 --remote-read-filter-time-end=2023-04-16T12:00:00Z --verbose
Some start/end time ranges go through, but others stop processing.
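Each of those hourly ranges becomes a single Prometheus remote-read request: a snappy block-compressed protobuf POST to /api/v1/read. The sketch below is a hand-rolled version of one such request in Go, not vmctl's actual client code; the endpoint, timestamps and the `__name__=~".*"` matcher are taken from the logs in this issue, and the exact headers vmctl sets may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"

	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

func main() {
	// One hourly slice matching every series; matcher and timestamps mirror the
	// thanos-remote-read debug log further down in this issue.
	rr := &prompb.ReadRequest{
		Queries: []*prompb.Query{{
			StartTimestampMs: 1681603200000,
			EndTimestampMs:   1681606799999,
			Matchers: []*prompb.LabelMatcher{{
				Type:  prompb.LabelMatcher_RE,
				Name:  "__name__",
				Value: ".*",
			}},
		}},
	}

	raw, err := rr.Marshal() // prompb messages are gogo-generated, so Marshal() is available
	if err != nil {
		panic(err)
	}

	// Remote-read bodies are snappy block-compressed protobuf.
	req, err := http.NewRequest(http.MethodPost,
		"http://localhost:10080/api/v1/read", bytes.NewReader(snappy.Encode(nil, raw)))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/x-protobuf")
	req.Header.Set("Content-Encoding", "snappy")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		// When the server dies mid-request, this is where a bare
		// `Post "http://localhost:10080/api/v1/read": EOF` error surfaces.
		fmt.Println("remote read failed:", err)
		return
	}
	defer resp.Body.Close()
	compressed, _ := io.ReadAll(resp.Body)
	fmt.Println("compressed response size:", len(compressed), "bytes")
}
```

The point is that the whole hour of every matching series has to come back as one snappy-encoded body, which is what gets thanos-remote-read into trouble in the logs below.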
Version
./vmctl-prod --version
vmctl version vmctl-20230407-010146-tags-v1.90.0-0-gb5d18c0d2
2023/04/24 20:57:46 Total time: 934.204_s
VmStorage is v1.90.0-cluster in Google GKE
Logs
thanos-remote-read:
./bin/thanos-remote-read -store 10.68.6.4:10901 -log.level debug
info: starting up thanos-remote-read...
ts=2023-04-24T20:44:26.867171185Z caller=main.go:278 level=info traceID=00000000000000000000000000000000 msg="thanos request" request="min_time:1681603200000 max_time:1681606799999 matchers:<type:RE name:\"__name__\" value:\".*\" > aggregates:RAW "
2023/04/24 20:46:22 http: panic serving 127.0.0.1:39718: snappy: decoded block is too large
goroutine 51 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1854 +0xbf
panic({0xaaf440, 0xc000088ed0})
/usr/local/go/src/runtime/panic.go:890 +0x263
github.com/golang/snappy.Encode({0x0?, 0xc2d9ed2240?, 0xb96dd0?}, {0xc417f00000?, 0xc0000342d0?, 0xc7a9e0?})
/go/pkg/mod/github.com/golang/snappy@v0.0.1/encode.go:20 +0x2ba
main.(*API).remoteRead(0xc00218d7b0?, {0xc81c60, 0xc000034280}, 0xc00012a500, {0xc7a9e0, 0xc0001a4180})
/go/pkg/mod/github.com/!g-!research/thanos-remote-read@v0.4.0/main.go:232 +0x626
main.setup.func2({0xc81c60?, 0xc000034280?}, 0x100?)
/go/pkg/mod/github.com/!g-!research/thanos-remote-read@v0.4.0/main.go:163 +0x30
main.errorWrap.func1({0xc81c60, 0xc000034280}, 0xc78601?)
/go/pkg/mod/github.com/!g-!research/thanos-remote-read@v0.4.0/main.go:169 +0x2b
net/http.HandlerFunc.ServeHTTP(0xc81fe0?, {0xc81c60?, 0xc000034280?}, 0xc786a8?)
/usr/local/go/src/net/http/server.go:2122 +0x2f
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Handler).ServeHTTP(0xc0001b8180, {0x7f18948766f8?, 0xc0000341e0}, 0xc00012a100)
/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.16.0/handler.go:179 +0x971
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0xc81300?, 0xc00013e000?}, 0xc00012a100)
/go/pkg/mod/github.com/prometheus/client_golang@v1.5.1/prometheus/promhttp/instrument_server.go:100 +0x94
net/http.HandlerFunc.ServeHTTP(0xc00013e000?, {0xc81300?, 0xc00013e000?}, 0xb94581?)
/usr/local/go/src/net/http/server.go:2122 +0x2f
net/http.(*ServeMux).ServeHTTP(0x0?, {0xc81300, 0xc00013e000}, 0xc00012a100)
/usr/local/go/src/net/http/server.go:2500 +0x149
net/http.serverHandler.ServeHTTP({0xc7e3a8?}, {0xc81300, 0xc00013e000}, 0xc00012a100)
/usr/local/go/src/net/http/server.go:2936 +0x316
net/http.(*conn).serve(0xc0002a4360, {0xc81fe0, 0xc0001931d0})
/usr/local/go/src/net/http/server.go:1995 +0x612
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:3089 +0x5ed
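The panic at the top of this trace is raised by golang/snappy's block-format Encode (encode.go:20 in v0.0.1): it calls MaxEncodedLen on the source and panics with ErrTooLarge ("snappy: decoded block is too large") when the marshaled payload is too big for a single snappy block, a limit on the order of 4 GiB. A minimal illustration of that guard, assuming a 64-bit platform for the oversized case:

```go
package main

import (
	"fmt"

	"github.com/golang/snappy"
)

func main() {
	// snappy's block format can only describe payloads whose worst-case encoded
	// length fits in 32 bits. MaxEncodedLen returns -1 above that, and Encode
	// panics with ErrTooLarge instead of returning an error.
	sizes := []int{64 << 20, 1 << 32} // 64 MiB vs 4 GiB (64-bit platform assumed)
	for _, size := range sizes {
		if snappy.MaxEncodedLen(size) < 0 {
			fmt.Printf("%11d bytes: snappy.Encode would panic with %v\n", size, snappy.ErrTooLarge)
		} else {
			fmt.Printf("%11d bytes: fits in a single snappy block\n", size)
		}
	}
}
```

The stack trace shows thanos-remote-read marshaling the whole ReadResponse for the hour into one buffer and handing it to snappy.Encode, so an hour of `.*` data that crosses this limit takes the handler down mid-request rather than returning an error.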
vmctl:
./vmctl-prod remote-read --remote-read-src-addr=http://localhost:10080 --remote-read-filter-time-start=2023-04-16T00:00:00Z --remote-read-step-interval=hour --vm-addr=http://10.68.5.59:8482 --vm-concurrency=2 --remote-read-filter-time-end=2023-04-16T12:00:00Z --verbose
Selected time range "2023-04-16 00:00:00 +0000 UTC" - "2023-04-16 12:00:00 +0000 UTC" will be split into 12 ranges according to "hour" step. Continue? [Y/n]
VM worker 0: ? p/s
VM worker 1: ? p/s
Processing ranges: 0 / 12 [____________________________________] 0.00%
2023/04/24 20:46:22 Import finished!
2023/04/24 20:46:22 VictoriaMetrics importer stats:
idle duration: 0s;
time spent while importing: 1m56.725018692s;
total samples: 0;
samples/s: 0.00;
total bytes: 0 B;
bytes/s: 0 B;
import requests: 0;
import requests retries: 0;
2023/04/24 20:46:22 remote read error: request failed for: error while fetching data from remote storage: error while sending request to http://localhost:10080/api/v1/read: Post "http://localhost:10080/api/v1/read": EOF; Data len 36(36)
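The vmctl side never sees that panic directly: when a handler panics, Go's net/http server recovers it (that is the `http: panic serving ...` line above) and closes the connection without writing a response, so vmctl's POST fails with a bare EOF. A self-contained demonstration of that coupling, using a stand-in handler rather than the real thanos-remote-read one:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

func main() {
	// Stand-in for thanos-remote-read's /api/v1/read handler: it consumes the
	// request and then panics before writing anything, roughly where the real
	// handler's snappy.Encode call blows up on an oversized response.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.Copy(io.Discard, r.Body)
		panic("snappy: decoded block is too large")
	}))
	defer srv.Close()

	// net/http recovers the panic (logging "http: panic serving ...") and closes
	// the connection, so the client gets no HTTP status at all, just an EOF.
	_, err := http.Post(srv.URL+"/api/v1/read", "application/x-protobuf",
		strings.NewReader("snappy-compressed ReadRequest would go here"))
	fmt.Println(err) // typically: Post "http://127.0.0.1:.../api/v1/read": EOF
}
```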
About this issue
- State: closed
- Created a year ago
- Comments: 17 (9 by maintainers)
I tried to understand what the issue is with snappy, but failed to identify a solution 😃 What I can tell is that the issue is 100% inside thanos-remote-read when encoding the file (even if the error is a decode error… which is why it's puzzling me).

For 1), I'm all-in. I looked at some code and it shouldn't be that hard.

I used https://github.com/sepich/thanos-kit to dump the Thanos blocks into Prometheus-style metrics, and added the External Labels from the meta.json file. It seems to be working this way, as long as you work on one block at a time…

So far I'm stopping work on the Thanos data migration, as we have way too much useless data, and I will look into a way to define what actually needs to be migrated (who needs 3 years of the up metric?)
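For anyone trying the same thanos-kit route: the external labels mentioned above live under the `thanos.labels` key of each block's meta.json. A small sketch of reading them in Go; the struct below only models the fields needed here and is an assumption about the layout, real meta.json files carry more:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// blockMeta models only the slice of a Thanos block's meta.json needed here:
// the block ULID and the external labels under the "thanos" section.
type blockMeta struct {
	ULID   string `json:"ulid"`
	Thanos struct {
		Labels map[string]string `json:"labels"`
	} `json:"thanos"`
}

func main() {
	raw, err := os.ReadFile("meta.json") // path inside the block directory
	if err != nil {
		panic(err)
	}
	var meta blockMeta
	if err := json.Unmarshal(raw, &meta); err != nil {
		panic(err)
	}
	// These are the labels to re-attach to every series dumped from the block.
	fmt.Printf("block %s external labels: %v\n", meta.ULID, meta.Thanos.Labels)
}
```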