thanos: Compactor: Does not exit on error
Thanos, Prometheus and Golang version used: 17.2
Object Storage Provider: s3
What happened: Compactor hit an error but was neither killed nor did it continue working
What you expected to happen: Compactor exits so it can be restarted or continues regardless
How to reproduce it (as minimally and precisely as possible): n/a
Full logs to relevant components:
Logs
level=info ts=2021-03-18T13:35:24.13386452Z caller=clean.go:33 msg="started cleaning of aborted partial uploads"
level=info ts=2021-03-18T13:35:24.133906785Z caller=clean.go:60 msg="cleaning of aborted partial uploads done"
level=info ts=2021-03-18T13:35:24.13391986Z caller=blocks_cleaner.go:43 msg="started cleaning of blocks marked for deletion"
level=info ts=2021-03-18T13:35:24.133930049Z caller=blocks_cleaner.go:57 msg="cleaning of blocks marked for deletion done"
level=info ts=2021-03-18T13:35:29.026173574Z caller=fetcher.go:458 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=3.499913081s cached=5974 returned=5974 partial=0
level=error ts=2021-03-18T13:36:28.794804266Z caller=runutil.go:99 msg="function failed. Retrying in next tick" err="BaseFetcher: iter bucket: Access Denied"
level=error ts=2021-03-18T13:37:28.70842268Z caller=runutil.go:99 msg="function failed. Retrying in next tick" err="BaseFetcher: iter bucket: Access Denied"
level=warn ts=2021-03-18T13:38:24.507549166Z caller=intrumentation.go:54 msg="changing probe status" status=not-ready reason="syncing metas: BaseFetcher: iter bucket: Access Denied"
level=info ts=2021-03-18T13:38:24.507583914Z caller=http.go:65 service=http/server component=compact msg="internal server is shutting down" err="syncing metas: BaseFetcher: iter bucket: Access Denied"
level=info ts=2021-03-18T13:38:25.007714323Z caller=http.go:84 service=http/server component=compact msg="internal server is shutdown gracefully" err="syncing metas: BaseFetcher: iter bucket: Access Denied"
level=info ts=2021-03-18T13:38:25.007758137Z caller=intrumentation.go:66 msg="changing probe status" status=not-healthy reason="syncing metas: BaseFetcher: iter bucket: Access Denied"
About this issue
- State: closed
- Created 3 years ago
- Comments: 20 (8 by maintainers)
By default, the compactor does not crash on halt errors. There is a hidden flag you can use to change this.
https://thanos.io/tip/components/compact.md/#halting
I also hit the same issue and am still investigating.
We’ve hit this with v0.21.1 running on Kubernetes against a locally hosted S3 (Ceph with radosgw):
After this, the process was not doing anything anymore (as described above) and did not respond to SIGTERM either.
Another thing is that we should only turn off the HTTP server at the end of everything to permit debugging via pprof in cases such as this.