thanos: Retry on network failures (e.g. uploads)
Not critical since compactor just restarted and continued just fine, but can be annoying.
level=error name=thanos-compactor ts=2018-04-28T10:32:12.73383864Z caller=main.go:147 msg="running command failed" err="first pass of downsampling failed: retrieve bucket block metas: get meta for block 01C6XZ1256S7VFNQP9D36XJ4F4: Get https://storage.googleapis.com/thanos-alpha/01C6XZ1256S7VFNQP9D36XJ4F4/meta.json: dial tcp [xxx]:443: connect: network is unreachable"
About this issue
- State: closed
- Created 6 years ago
- Reactions: 5
- Comments: 31 (24 by maintainers)
Commits related to this issue
- compact: add backoff to the retry to upload/download buckets Add backoff reply for a single object storage query request, except Range and Iter methods. Error handler splits errors on net/http and ot... — committed to xjewer/thanos by xjewer 6 years ago
- add backoff to the retry to upload/download buckets Add backoff reply for a single object storage request, except Range and Iter. Error handler splits errors on net/http and others, and replies the r... — committed to xjewer/thanos by xjewer 6 years ago
- add the retry to upload/download bucket operations Add backoff retry for a single object storage request, except Range and Iter. Error handler splits errors on net/http and others, and replies the re... — committed to xjewer/thanos by xjewer 6 years ago
@bwplotka this still seems to happen in v0.3.1. The behavior I see is that the timeout occurs; I'm not sure whether a retry is triggered within minio or not, but the compactor exits and restarts. I'd assume that on restart it cleans the compaction directory and effectively starts from 0 again.
I haven't experienced a single error in the last 4 days. @bwplotka I'd be happy to contribute a patch for the "timeout awaiting response headers" issue, but I'd like to ask what your preferred option would be: simply increase the timeout to another arbitrary value (e.g. 2 minutes), or add a configuration flag. The latter is more flexible, but it also adds complexity without (imho) adding much value. Pinging @alvaroaleman too as the creator of #323.
Oh, sorry, I hadn't seen this since it was closed. Could we rename the title to be a bit more generic, because this affects not only the compactor but the sidecar as well? 😛 Yes, I agree that this should be delegated to the underlying libraries that we use, but perhaps we could think of some kind of even smarter solution, like double-checking which files (if any) were uploaded to remote storage and retrying the upload only for the files that are missing there, provided they are still present on the disk.
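The "retry only what is missing" idea from the comment above could look something like the sketch below. It is an illustration under stated assumptions, not Thanos code: `pendingUploads` is a hypothetical helper, and the `onDisk` and `inBucket` callbacks stand in for whatever local and remote existence checks the caller has available.

```go
package main

// pendingUploads returns the names from files that are still present on local
// disk but not yet present in remote storage, i.e. the only ones worth
// re-uploading. Both existence checks are supplied by the caller.
func pendingUploads(files []string, onDisk, inBucket func(name string) (bool, error)) ([]string, error) {
	var pending []string
	for _, name := range files {
		local, err := onDisk(name)
		if err != nil {
			return nil, err
		}
		if !local {
			continue // Already cleaned up locally, nothing left to upload.
		}
		remote, err := inBucket(name)
		if err != nil {
			return nil, err
		}
		if !remote {
			pending = append(pending, name)
		}
	}
	return pending, nil
}
```

Note that a plain existence check would not catch a partially written object, so a real implementation would probably also need to compare sizes or checksums.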
@GiedriusS re: https://github.com/improbable-eng/thanos/issues/923#issue-420975085
See this issue here, but what's the point of retrying if the underlying client provider lib retries for us? Essentially:
The only problem is when the library we use has this logic broken; in that case I think we should propagate the issue to them. Double retrying is not a solution.