guava: Concurrency issue between get(K key, Callable extends V> valueLoader) and invalidate(Object key)
Hi,
I encountered a concurrency issue between the “get(K key, Callable<? extends V> valueLoader)” and “invalidate(Object key)” methods with a basic (and i suppose a common) usage of the cache which let a stale value in the cache.
Here is the use case:
- The cache is initially empty
- 1 thread is getting a value in the callable with a given key while an other thread is invalidating the same given key
Thread 1
Bean myBean = cache.get(ID, new Callable<Bean>() {
@Override
public Bean call() throws Exception {
Bean bean = loadFromDB(ID); // (1)
return bean; // (4)
}
});
Thread 2
// Update just one property of the bean in DB
updatePartialDataInDB(ID, "newValue1"); // (2)
// Then, we need to invalidate the cache
cache.invalidate(ID); // (3)
The execution sequence order is marked with // (number) After the point // 4, I have the old object in myBean variable which is fine. However, If i try to get again the bean with the identifier ID, I expect to have no value in the cache but a fresh bean reloaded from the DB. Unfortunately, that is not the case and a stale value is kept in the cache.
Is that a wanted behavior?
In LocalCache$Segment#remove(Object key, int hash) line 3081 to 3087 (guava 17.0):
RemovalCause cause;
if (entryValue != null) {
cause = RemovalCause.EXPLICIT;
} else if (valueReference.isActive()) {
cause = RemovalCause.COLLECTED;
} else {
// currently loading
return null;
}
Shouldn’t a particular process be done before return null ?
Thanks
About this issue
- Original URL
- State: open
- Created 10 years ago
- Reactions: 4
- Comments: 17 (9 by maintainers)
Commits related to this issue
- Fix compatibility between the cache compute methods and a load The asMap().compute implementation did not take into account that the present value may be loading. A load does not block other writes t... — committed to ben-manes/guava by ben-manes 4 years ago
- Fix JDBC connector metadata caching race condition Due to https://github.com/google/guava/issues/1881, invalidation does not invalidate any loading that is in progress. In order to overcome this a g... — committed to kokosing/trino by kokosing 2 years ago
This is causing some very serious issues for us. I would argue that anyone using a lazy-loading cache would expect a happens-before relationship between invalidate and a following call to get, but since it can piggy-back on an existing load operation that fetched data before invalidate was called, that relationship does not exist.
I started poking around in here to try to fix it, but as @lowasser hinted, it isn’t trivial. The main problems are
This is sufficiently painful that at this point I’m inclined to switch to another library, instead.
@ben-manes thanks for the update about Caffeine.
I am still looking for the Guava team’s perspective on the problem. In particular, would a patch be accepted? what solution categories have been discarded already? What are acceptable drawbacks, if any, for a potential fix?
@lowasser 's comment from 2014 (https://github.com/google/guava/issues/1881#issuecomment-67080198) suggests a patch would be welcome, but there was also a concern about value-in-flight invalidation (https://github.com/google/guava/issues/1881#issuecomment-67076595). Of course, both are couple years old, so they not necessarily describe current state.
Edit: a Guava Cache wrapper that doesn’t suffer from invalidation race was extracted from Trino project as a standalone library: https://github.com/findepi/evictable-cache
@cpovirk sorry for refreshing an old issue, it seems still relevant. Did you come to agreement on what’s the expected behavior?
For me, “
get()
afterinvalidate()
should return fresh state” is the expected behavior.Caffeine resolves this for explicit keyed operations, but not for clearing out the whole cache. That is because it decorates, rather than forks, ConcurrentHashMap which suppresses in-flight loads from appearing in its iterators. A trivial workaround to that is to append a generational id to the key so that existing entries cannot be retrieved and a stale in-flight load detected by the reader to possibly retry.
I second @sereda expectation of causality, in that a loaded value V(K) must be updated when the key is invalidated. Has there been a consensus how to work around this particular problem? If so, it should be included in https://github.com/google/guava/wiki/CachesExplained
Hi,
Speaking of semantics, it’s not value that is being invalidated, it’s the key. The key is certainly there - and the value is being calculated.
Given that value V(K) is calculated from some other value X(K) (a row in DB, in this sample case), invalidate(K) should mean that any values based on values of X(K) retrieved up to the moment of invalidation are subject for recalculation.
Otherwise, the cache is pretty much useless in a simple case of caching a database row.