guava: Long overflow in cache statistics
We use Guava caches inside a large enterprise application.
After a few days of runtime calling the stats()
method on heavily-used caches throws an IllegalArgumentException
at the equivalent line to CacheStats.java:87, presumably because the total load time has overflowed a long and looped back round to a negative value.
It looks like this is down to the load time stopwatch being started at the point that an entry is submitted for asynchronous refresh, i.e. added to the refresh queue. Our largest queues are drained in batches and end-up with usual sizes in the tens of thousands — so the nanosecond-precision load time increases very quickly under these circumstances.
I am aware that it is a design decision for the cache statistics to be monotonically increasing, so a reasonably sane solution might be to simply drop the non-negative validation as part of the CacheStats constructor, or try to somehow wrap back to zero instead of Long.MIN_VALUE
. Metrics systems using similar to Graphite’s nonNegativeDerivative
should handle that gracefully anyway.
NB: I think this issue also affects Caffeine caches in the same way.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 25 (23 by maintainers)
Commits related to this issue
- Use LongMath.saturatedAdd/Subtract in CacheStats. Fixes https://github.com/google/guava/issues/3503 RELNOTES=avoid overflows/underflows in CacheStats ------------- Created by MOE: https://github.co... — committed to google/guava by kluever 5 years ago
- Use LongMath.saturatedAdd/Subtract in CacheStats. Fixes https://github.com/google/guava/issues/3503 RELNOTES=avoid overflows/underflows in CacheStats ------------- Created by MOE: https://github.co... — committed to google/guava by kluever 5 years ago
- Use LongMath.saturatedAdd/Subtract in CacheStats. Fixes https://github.com/google/guava/issues/3503 RELNOTES=avoid overflows/underflows in CacheStats ------------- Created by MOE: https://github.co... — committed to google/guava by kluever 5 years ago
- Guava cache sync (improved overflow handling) Use saturatedToNanos() in CacheBuilder to avoid overflows. https://github.com/google/guava/commit/7d04f72bf355e9bac1312debb3af7dca28833077 Use LongMath.... — committed to ben-manes/caffeine by ben-manes 5 years ago
- Use LongMath.saturatedAdd/Subtract in CacheStats. Fixes https://github.com/google/guava/issues/3503 RELNOTES=avoid overflows/underflows in CacheStats ------------- Created by MOE: https://github.co... — committed to google/guava by kluever 5 years ago
- Add a test for LongAdder overflow behavior. See https://github.com/google/guava/issues/3503 RELNOTES=n/a ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=256193549 — committed to google/guava by kluever 5 years ago
- Saturate LongAdder.sum() results to Long.MAX_VALUE during an overflow. Fixes https://github.com/google/guava/issues/3503 RELNOTES=Fix potential overflow/IAE during cache stats calculations. -------... — committed to google/guava by kluever 5 years ago
- Add a test for LongAdder overflow behavior. See https://github.com/google/guava/issues/3503 RELNOTES=n/a ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=256193549 — committed to google/guava by kluever 5 years ago
- Saturate LongAdder.sum() results to Long.MAX_VALUE during an overflow. Fixes https://github.com/google/guava/issues/3503 RELNOTES=Fix potential overflow/IAE during cache stats calculations. -------... — committed to google/guava by kluever 5 years ago
- Add MediaType for "image/heif" and "image/jp2" RELNOTES=Add MediaType for "image/heif" and "image/jp2" ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=260960132 — committed to google/guava by deleted user 5 years ago
LongAccumulator
would work, as you can push thesaturatedAdd
as the accumulation function.I think if we consider overflow non-deterministic and only promise to avoid exceptions, then it is not our responsibility to make counts correct. That means either is okay, and better handling is the user’s responsibility by providing an alternative
StatsCounter
(note - Guava doesn’t allow that currently).We could possibly use
LongAccumulator
withLongMath.saturatedAdd
?What about using
LongMath.saturatedAdd(long, long)
inCacheStats.plus(CacheStats)
?Presumably that would pin the totals to
Long.MAX_VALUE
, which seems better than having them wrap-around to negative values and then get reset to0
?Thanks @ben-manes. Sounds good.
Just to give a more detailed example of how we’re seeing this seemingly very unlikely behaviour:
Imagine a cache with a few hundred thousand entries, with “refresh after write” of 5 minutes so that frequently-requested items are reasonably fresh.
The
reload
method on the loader adds the element to a queue which, to optimise network calls to a slow downstream service, drains slowly in large batches, i.e. with bulk calls.With an average queue size of 50k, we are effectively accruing 5.7 years (50,000 hours) of load time per hour, well over a hundred years per day. Hence seeing the overflows after only a few days of the application running in production.
/cc @ben-manes