thanos: [Thanos-compact] Some data is lost because data is compacted in fixed time periods.
Thanos version used: v0.28.0
What happened: We configured 5-minute downsampled data to be retained for 15 days and 1-hour downsampled data to be retained for 30 days. However, the data from the first 3 days after the environment was installed is lost 15 days later, and 1-hour downsampling is never performed on it. Code analysis shows that thanos-compact groups blocks into fixed time periods rather than periods that start when the environment is installed. The first 3 days of data fall into the same fixed period as the 11 days before them. Because there is no data for those 11 days, the period does not meet the requirement for 1-hour downsampling, so the data is deleted 15 days later.
What you expected to happen: The thanos-compact compaction period should be determined from the time the environment is installed, instead of using the same fixed period for every environment.
Full logs to relevant components: This is the data information for the first set of environments.
./01H2HNTNGGMWMMA8K5688WEKXJ "resolution": 0, "level": 3, 2023-06-08 08:00:00 2023-06-10 08:00:00
./01H2HNVAXP5AEY43V9Z8V7ZBFX "resolution": 300000, "level": 3, 2023-06-08 08:00:00 2023-06-10 08:00:00
./01H2PTM63CA2ZDZD253QDQFCG2 "resolution": 0, "level": 3, 2023-06-10 08:00:00 2023-06-12 08:00:00
./01H2PTN1YBN5A7ADWBXBH28CWS "resolution": 300000, "level": 3, 2023-06-10 08:00:00 2023-06-12 08:00:00
./01H2QP2A692YXAT5VBG5K5CHFY "resolution": 0, "level": 2, 2023-06-12 08:00:00 2023-06-12 16:00:00
./01H2RHHA6HVE6MWQH32R5F5NER "resolution": 0, "level": 2, 2023-06-12 16:00:00 2023-06-13 00:00:00
./01H2SB5761802RK3TZDSW14WCS "resolution": 0, "level": 1, 2023-06-13 08:00:00 2023-06-13 10:00:00
./01H2SD05HFB9QE5AZOK64A8XW1 "resolution": 0, "level": 2, 2023-06-13 00:00:00 2023-06-13 08:00:00
./01H2SJ0YDPF8XZ6CS63CB29XGS "resolution": 0, "level": 1, 2023-06-13 10:00:00 2023-06-13 12:00:00
./01H2SRWNNS3EQV2EQG5E99A88R "resolution": 0, "level": 1, 2023-06-13 12:00:00 2023-06-13 14:00:00
./01H2SZRCYKA2BKMAWWA6GJ3056 "resolution": 0, "level": 1, 2023-06-13 14:00:00 2023-06-13 16:00:00
./01H2T6M45W3GNT95GBJSHRZ3KF "resolution": 0, "level": 2, 2023-06-13 16:00:00 2023-06-13 18:00:00
This is the data information for another set of environments.
./01H2PTKN74GDGMJGA5WZG4W6WF "resolution": 300000, "level": 3, 2023-06-10 08:00:00 2023-06-12 08:00:00
./01H2W66VV78A3ZYYHAJ8DTM8ZT "resolution": 300000, "level": 3, 2023-06-12 14:00:00 2023-06-14 08:00:00
./01H31448841863EDZRFDXM86DP "resolution": 0, "level": 3, 2023-06-14 08:00:00 2023-06-16 08:00:00
./01H3144SDHJSKJ9ACCDXSHM42Q "resolution": 300000, "level": 3, 2023-06-14 08:00:00 2023-06-16 08:00:00
./01H368XM3QW3F672GYAK6ROYHY "resolution": 0, "level": 3, 2023-06-16 08:00:00 2023-06-18 08:00:00
./01H368Y47P3SJWJR4DV3VMY8CJ "resolution": 300000, "level": , 2023-06-16 08:00:00 2023-06-18 08:00:00
./01H374C5NYQJ2VR7HDDPVT4YHN "resolution": 0, "level": 2, 2023-06-18 08:00:00 2023-06-18 16:00:00
./01H37ZV2R4P4VA4HE7RGHMV378 "resolution": 0, "level": 2, 2023-06-18 16:00:00 2023-06-19 00:00:00
./01H38SHH694DSMGDC94Y200403 "resolution": 0, "level": 1, 2023-06-19 08:00:00 2023-06-19 10:00:00
./01H38VA01RCBSV120XR6WGMBC3 "resolution": 0, "level": 2, 2023-06-19 00:00:00 2023-06-19 08:00:00
./01H390D8EBEEAONNHC95ZCVSWA "resolution": 0, "level": 1, 2023-06-19 10:00:00 2023-06-19 12:00:00
./01H3978ZP6VE7RBQPQE7D8WV4 "resolution": 0, "level": 1, 2023-06-19 12:00:00 2023-06-19 14:00:00
./01H39E4PY8WYZ03QXCRGN5EASX "resolution": 0, "level": 1, 2023-06-19 14:00:00 2023-06-19 16:00:00
./01H39N0E68BJEFWSXGG5EYG590 "resolution": 0, "level": 1, 2023-06-19 16:00:00 2023-06-19 18:00:00
It can be seen that the 5-minute downsampled blocks are produced on fixed time boundaries.
Anything else we need to know: The code that groups blocks into fixed time periods is in the splitByRange function in planner.go:
if m.MinTime >= 0 {
	t0 = tr * (m.MinTime / tr)
} else {
	t0 = tr * ((m.MinTime - tr + 1) / tr)
}
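Because this is integer division, t0 = tr * (m.MinTime / tr) rounds m.MinTime down to the nearest multiple of tr. Different m.MinTime values inside the same period therefore produce the same fixed t0, no matter when the environment was installed.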
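The rounding can be reproduced in a small standalone sketch. The alignedStart helper, the 14-day range (suggested by the 3 + 11 days described above) and the two example start times are assumptions chosen for illustration, not code copied from Thanos:

```go
package main

import (
	"fmt"
	"time"
)

// alignedStart repeats the arithmetic quoted from splitByRange: it rounds
// minTime down to a multiple of tr, counted from the Unix epoch.
// Both values are in milliseconds. The function name is hypothetical.
func alignedStart(minTime, tr int64) int64 {
	if minTime >= 0 {
		return tr * (minTime / tr)
	}
	// Go integer division truncates toward zero, so negative timestamps
	// need this adjustment to round down instead of up.
	return tr * ((minTime - tr + 1) / tr)
}

func main() {
	// Assumed range for illustration: 14 days, in milliseconds.
	tr := (14 * 24 * time.Hour).Milliseconds()

	// Two arbitrary block start times, two days apart.
	for _, s := range []string{"2023-06-10T00:00:00Z", "2023-06-12T00:00:00Z"} {
		start, err := time.Parse(time.RFC3339, s)
		if err != nil {
			panic(err)
		}
		t0 := alignedStart(start.UnixMilli(), tr)
		fmt.Printf("block start %s -> t0 %s\n",
			s, time.UnixMilli(t0).UTC().Format(time.RFC3339))
	}
}
```

With a 14-day range, both example start times round down to the same t0, which depends only on the Unix epoch and tr. A deployment that begins late in such a period therefore contributes only a few days of data to it.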
About this issue
- Original URL
- State: open
- Created 9 months ago
- Comments: 22 (15 by maintainers)
I think this is a recurring issue for our users. Perhaps worth erroring out if retention is set to a small period with downsampling enabled? Because in such cases downsampling will never happen.