thanos: store: i/o timeouts with new memcached config
👋 Hello again, and thanks for the quick response in https://github.com/thanos-io/thanos/issues/1974!
Thanos version used: latest master with https://github.com/thanos-io/thanos/pull/1975
Object Storage Provider: AWS S3
What happened: I am testing the new memcached config for Thanos Store and I am seeing a lot of timeouts. The timeouts go away when I raise the timeout to 10s 🤔
The memcached servers look OK, and the number of connections doesn't exceed max_idle_connections.
Thanos Store process and Memcached servers are in the same AZ.
I tried the default config and also tweaked most of the parameters but to no avail.
Any recommendations?
Arguments:
```
thanos store \
  --index-cache.config-file=/app/cache-config.yaml \
  --chunk-pool-size=6GB \
  --objstore.config-file=/app/objstore-config.yaml
```
Config:
```yaml
type: MEMCACHED
config:
  addresses:
    - "****.0001.euw1.cache.amazonaws.com:11211"
    - "****.0002.euw1.cache.amazonaws.com:11211"
  timeout: 1s
  max_idle_connections: 500
  max_async_concurrency: 200
  max_async_buffer_size: 10000
  max_get_multi_concurrency: 200
  max_get_multi_batch_size: 0
  dns_provider_update_interval: 10s
```
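One thing worth noting about this config: `max_get_multi_batch_size: 0` disables batching, so a single large `GetMulti()` has to complete within the 1s timeout. As a sketch only (the batch size and timeout values below are illustrative, not documented recommendations), a more conservative tuning could look like:

```yaml
type: MEMCACHED
config:
  addresses:
    - "****.0001.euw1.cache.amazonaws.com:11211"
    - "****.0002.euw1.cache.amazonaws.com:11211"
  timeout: 2s                       # illustrative: more headroom than the 1s above
  max_idle_connections: 500
  max_async_concurrency: 200
  max_async_buffer_size: 10000
  max_get_multi_concurrency: 200
  max_get_multi_batch_size: 1000    # illustrative: split large multi-gets instead of 0 (no batching)
  dns_provider_update_interval: 10s
```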
Logs:
```
level=warn ts=2020-01-10T11:57:31.215656814Z caller=memcached_client.go:266 msg="failed to store item to memcached" key=P:01DY46YH1MSD2N9PZ7M839E0SS:DcQKMLmxz9NRLFUvIYH7xHEXCRQCie5wnCttjitdnqw err="write tcp 172.17.0.3:35492->10.42.8.140:11211: i/o timeout"
level=warn ts=2020-01-10T11:57:31.215238251Z caller=memcached_client.go:266 msg="failed to store item to memcached" key=P:01DY46YH1MSD2N9PZ7M839E0SS:DcQKMLmxz9NRLFUvIYH7xHEXCRQCie5wnCttjitdnqw err="write tcp 172.17.0.3:35490->10.42.8.140:11211: i/o timeout"
level=warn ts=2020-01-10T11:57:31.076957889Z caller=memcached_client.go:266 msg="failed to store item to memcached" key=P:01DY46FXKKFSNF3D9TVXB13G1H:DcQKMLmxz9NRLFUvIYH7xHEXCRQCie5wnCttjitdnqw err="write tcp 172.17.0.3:35496->10.42.8.140:11211: i/o timeout"
level=warn ts=2020-01-10T11:57:30.831121025Z caller=memcached_client.go:266 msg="failed to store item to memcached" key=P:01DXJS43Y6CEXAMVT0HZVC480E:DcQKMLmxz9NRLFUvIYH7xHEXCRQCie5wnCttjitdnqw err="write tcp 172.17.0.3:34270->10.42.8.43:11211: i/o timeout"
level=warn ts=2020-01-10T11:57:30.830995908Z caller=memcached_client.go:266 msg="failed to store item to memcached" key=P:01DXJS43Y6CEXAMVT0HZVC480E:DcQKMLmxz9NRLFUvIYH7xHEXCRQCie5wnCttjitdnqw err="write tcp 172.17.0.3:50162->10.42.8.43:11211: i/o timeout"
level=warn ts=2020-01-10T11:57:30.754847072Z caller=memcached_client.go:266 msg="failed to store item to memcached" key=P:01DXJSHHT4N25BRSMM27M3QH8K:DcQKMLmxz9NRLFUvIYH7xHEXCRQCie5wnCttjitdnqw err="write tcp 172.17.0.3:50160->10.42.8.43:11211: i/o timeout"
level=warn ts=2020-01-10T11:57:30.754715844Z caller=memcached_client.go:266 msg="failed to store item to memcached" key=P:01DXJSHHT4N25BRSMM27M3QH8K:DcQKMLmxz9NRLFUvIYH7xHEXCRQCie5wnCttjitdnqw err="write tcp 172.17.0.3:46124->10.42.8.43:11211: i/o timeout"
level=warn ts=2020-01-10T11:57:29.148112488Z caller=memcached_client.go:277 msg="failed to fetch items from memcached" err="read tcp 172.17.0.3:34404->10.42.8.43:11211: i/o timeout"
level=warn ts=2020-01-10T11:57:29.148818833Z caller=memcached_client.go:277 msg="failed to fetch items from memcached" err="read tcp 172.17.0.3:47196->10.42.8.140:11211: i/o timeout"
level=warn ts=2020-01-10T11:57:29.108056739Z caller=memcached_client.go:277 msg="failed to fetch items from memcached" err="read tcp 172.17.0.3:48012->10.42.8.140:11211: i/o timeout"
level=warn ts=2020-01-10T11:57:29.105953129Z caller=memcached_client.go:277 msg="failed to fetch items from memcached" err="read tcp 172.17.0.3:34276->10.42.8.43:11211: i/o timeout"
```
So, I have fixed this issue for myself: it turned out to be an error in my own setup xD
After investigating, since changing the config (per above) did not seem to make any difference, I noticed that it was NOT the memcached index cache but the bucket cache.
When I dug into this further, I noticed that I had made a mistake in its config. This was my config:
After fixing my bucket cache config file to this, the errors were gone:
To be honest, a few improvements could be made IMO:
- Update https://thanos.io/tip/components/store.md/#caching-bucket with an example of actually implementing the config for memcached.

I'm a bit tired, but I will make issues/PRs tomorrow or something 😄
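In the meantime, here is a minimal sketch of the shape a caching-bucket config can take, passed via the experimental `--store.caching-bucket.config-file` flag. The field names follow the linked docs, but double-check them against your Thanos version; the address and values below are illustrative placeholders, not my production config:

```yaml
# Sketch only: verify field names and defaults against the caching-bucket
# docs for your Thanos version. All values here are illustrative.
type: MEMCACHED
config:
  addresses:
    - "memcached.example.com:11211"  # placeholder endpoint
  timeout: 1s
  max_idle_connections: 100
chunk_subrange_size: 16000           # bytes per cached chunk subrange
max_chunks_get_range_requests: 3
chunk_object_attrs_ttl: 24h
chunk_subrange_ttl: 24h
```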
Hi, I am getting the same error.
- error log
- thanos and system version
- systemd
- index-cache-config
- store-cache-bucket
Thanks!
I talked offline to @kuberkaul and shared some tips. In parallel, I'm investigating a potential issue causing unfair prioritisation within the memcached client due to the usage of the gate. I will open a PR to fix it if confirmed (will work on it shortly).
Is there any update on this?
I am also getting the below error while trying to query historical data.
I have two queries:
My Thanos Store memcached configurations are given below:
- index caching config (memcached)
- bucket caching config (memcached)
I had set the value 0 for max_get_multi_batch_size, max_async_buffer_size, max_item_size, and max_get_multi_concurrency to allow unlimited capacity, so that these limits should not cause any issues. Thanks in advance!
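To make the above concrete, this is a sketch of that "everything unlimited" index-cache tuning in YAML (the address is a placeholder, and the 0-disables-the-limit semantics are as described in the comment above):

```yaml
type: MEMCACHED
config:
  addresses:
    - "memcached.example.com:11211"  # placeholder endpoint
  timeout: 1s                        # illustrative
  # Per the comment above, 0 is intended to disable each of these limits:
  max_get_multi_batch_size: 0
  max_async_buffer_size: 0
  max_item_size: 0
  max_get_multi_concurrency: 0
```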
I've run some benchmarks in our cluster, using two tools:
- memtier_benchmark by RedisLabs
- bench: a quick and dirty benchmark tool based on the Thanos MemcachedClient (code here)

Both benchmarks ran against the same memcached server (1 instance), running in the same Kubernetes cluster where the benchmarking tool runs.
The results
Comparison on GetMulti():

| Tool | ops/s | avg latency | max latency |
|---|---|---|---|
| memtier_benchmark | 77044 | 5.2ms | 56ms |
| Thanos MemcachedClient bench | 36058 | 5ms | 76ms |

The performance of the Thanos client doesn't look bad to me. It's about 50% of the raw performance (36058 / 77044 ≈ 47%), which looks "reasonable": it shows there's room for improvement, but it's not bad compared to a C++ tool designed to squeeze out performance with no business logic around it.
What's next
Other than the latest comments above: