coredns: plugin/cache: Optional Cache with Non-Blocking Adds, Admission Policy, and LFU Eviction
What would you like to be added
- Option to use a cache with the following features:
- Non-blocking adds to the cache (not making the requester of the non-cached item wait for the write lock to be acquired)
- Key collision protection (ensure that a key collision will not return an answer to the wrong question)
- Moved to: #3698
- Admission policy (ensure that single-request for an item does not push other items out of the cache)
- LFU eviction instead of random (keep frequently and recently requested items in the cache)
- TTL eviction option, `ristretto ttlEvict`, will evict expired items from the cache when they expire (or when max serve stale is reached, if enabled)
- Future
- Eventually: make the new cache the default option if it is an improvement for most use cases
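As a minimal sketch of how the features above map onto Ristretto (assuming the `github.com/dgraph-io/ristretto` API: `NewCache`, `SetWithTTL`, `Get`, `Wait`; the key, value, and sizes here are illustrative, not the PR's actual wiring): writes are buffered and non-blocking, admission and eviction are handled by TinyLFU frequency counters, and per-item TTLs cover the `ttlEvict` case.

```go
package main

import (
	"fmt"
	"time"

	"github.com/dgraph-io/ristretto"
)

func main() {
	c, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 100000, // TinyLFU frequency counters; ~10x the expected max items
		MaxCost:     10000,  // capacity in "cost" units (1 per item below)
		BufferItems: 64,     // size of the Get buffer (64 is the recommended value)
	})
	if err != nil {
		panic(err)
	}

	// SetWithTTL is non-blocking: it enqueues the write and returns
	// immediately; TinyLFU admission decides whether the item is kept,
	// and the TTL evicts it once it expires.
	c.SetWithTTL("example.org./A", "203.0.113.7", 1, 30*time.Second)

	c.Wait() // flush buffered writes; only needed in tests and demos

	if v, ok := c.Get("example.org./A"); ok {
		fmt.Println("cache hit:", v)
	}
}
```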
Why is this needed
- Possibly related issues
- #3388 - Restart due to OOM
- The internal cache hashes keys with FNV but stores nothing in the cache that would allow a collision to be detected on reads
- It appears to me that the cache can return an Answer for a non-matching Question
- The Question is not stored in the cache at all and the key is a uint64 FNV hash, so it appears to be impossible to detect that asking the cache for `microsoft.com` is returning an answer that was retrieved for `someevilhost.com` (see the collision-check sketch after this list)
- Returning Answers to the wrong Question is a rather serious flaw to allow in a DNS server, but perhaps there are mitigating circumstances that I missed
- Eviction in the internal cache is random and all new items are admitted
- This is noted in the docs and is not unexpected, but it is not desired
- Eviction of heavily accessed items should be avoided (LFU should be used instead of random eviction)
- Admission to the cache should not be automatic when the cache is full
- Only items that are requested with some frequency should be considered for admission into the cache
- Implementation of this is non-trivial
- Writing to the internal cache makes the caller wait for the acquisition of the write lock
- I may be mistaken, but it appears that neither `.set` nor `.Add` is called in a new goroutine - the caller must wait for the write lock to be obtained for these functions to return
- This effectively serializes all writes within a shard - This is primarily a problem when there are high numbers of unique records to be cached
- This may be leading to the large number of concurrent requests and OOM conditions being reported by some
- Acquiring the write lock to write a single item, then releasing it, is expensive, particularly if other items are already waiting to be added to the same shard
- Other cache libraries return to the caller immediately and obtain the write lock once per batch of records; a minimal sketch of this pattern follows this list
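As a rough illustration of that batching pattern (not the builtin cache's actual code), a shard can accept writes on a buffered channel and let one goroutine apply everything queued under a single lock acquisition:

```go
package sketch

import "sync"

type entry struct {
	key uint64
	val interface{}
}

// shard buffers writes on a channel so Add can return without ever
// touching the write lock; a single goroutine drains the channel and
// applies everything queued under one lock acquisition.
type shard struct {
	mu    sync.RWMutex
	items map[uint64]interface{}
	buf   chan entry
}

func newShard() *shard {
	s := &shard{items: make(map[uint64]interface{}), buf: make(chan entry, 256)}
	go s.drain()
	return s
}

// Add never blocks the request path; if the buffer is full the write is
// simply dropped, which is an acceptable trade-off for a cache.
func (s *shard) Add(k uint64, v interface{}) {
	select {
	case s.buf <- entry{k, v}:
	default:
	}
}

func (s *shard) drain() {
	for e := range s.buf {
		s.mu.Lock()
		s.items[e.key] = e.val
		// Apply anything else already queued under the same lock.
		for more := true; more; {
			select {
			case e2 := <-s.buf:
				s.items[e2.key] = e2.val
			default:
				more = false
			}
		}
		s.mu.Unlock()
	}
}
```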
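And for the collision concern raised above, a minimal sketch of the kind of read-time check the builtin cache lacks: store the Question alongside the answer and verify it on every read. The `item` struct and `key` function are illustrative, not the plugin's actual types.

```go
package sketch

import (
	"hash/fnv"
	"sync"
)

// item stores the Question alongside the answer so a hash collision
// can be detected when the entry is read back.
type item struct {
	qname  string
	qtype  uint16
	answer []byte // placeholder for the cached response message
}

type cache struct {
	mu sync.RWMutex
	m  map[uint64]item
}

// key mimics a uint64 FNV hash over the Question.
func key(qname string, qtype uint16) uint64 {
	h := fnv.New64()
	h.Write([]byte(qname))
	h.Write([]byte{byte(qtype >> 8), byte(qtype)})
	return h.Sum64()
}

// get returns a hit only if the stored Question matches the one being
// asked, so a colliding key can never answer the wrong question.
func (c *cache) get(qname string, qtype uint16) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	it, ok := c.m[key(qname, qtype)]
	if !ok || it.qname != qname || it.qtype != qtype {
		return nil, false
	}
	return it.answer, true
}
```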
Benchmarks
```sh
cd plugin/cache && go test -run='' -bench=. -benchmem ./... 2>/dev/null
```
Baseline - Before any changes
Using additional benchmarks added to master: https://github.com/huntharo/coredns/tree/cache-benchmark
```
BenchmarkBuiltinReads-8                712930     1547 ns/op     566 B/op    14 allocs/op
BenchmarkBuiltinParallelReads-8       2479280      487 ns/op     566 B/op    14 allocs/op
BenchmarkBuiltinBalanced-8             262920     4256 ns/op     815 B/op    19 allocs/op
BenchmarkBuiltinInserts-8              323592     3568 ns/op     706 B/op    16 allocs/op
BenchmarkBuiltinParallelInserts-8      131554     7640 ns/op    2472 B/op    58 allocs/op
BenchmarkBuiltinParallelInsertsRead-8  193125    29294 ns/op    1255 B/op    30 allocs/op
```
Branch
- Balanced Reads/Writes
- (NumCPUs - 2) goroutines
- Demonstrates the serializing nature of the write lock on the existing cache
- With even a 2% write to read ratio the Ristretto cache outperforms the Builtin cache
- The test is configured with a 20% write to read ratio by default
- Builtin: 3,567 ns/op
- Ristretto: 2,713 ns/op
- Simple Reads
- 1 goroutine
- Demonstrates the trade-off of keeping statistics on item usage
- When 100% of operations are reads, the builtin cache is faster
- The Balanced and “Reads During Inserts” tests give a better example of performance under mixed read/write conditions
- Baseline: 1,784 ns/op - this is the same test with the Builtin cache on the `master` branch; not entirely clear why the Builtin cache got faster with the changes
- Builtin: 1,590 ns/op
- Ristretto: 1,864 ns/op
- Parallel Reads
- (NumCPUs - 2) goroutines
- Demonstrates the trade-off of keeping statistics on item usage
- When 100% of operations are reads, the builtin cache is faster
- The Balanced and “Reads During Inserts” tests give a better example of performance under mixed read/write conditions
- Builtin: 521 ns/op
- Ristretto: 605 ns/op
- Inserts
- 1 goroutine
- Demonstrates speed improvement due to batching of writes and the caller not waiting for the write lock to be acquired during an insert operation
- Builtin: 3,715 ns/op
- Ristretto: 2,585 ns/op
- Parallel Inserts
- (NumCPUs - 2) goroutines
- Demonstrates decreased lock contention between parallel threads and requests on insert operations
- Builtin: 7,684 ns/op
- Ristretto: 3,829 ns/op
- Parallel Reads During Inserts
- (NumCPUs - 2) goroutines
- Demonstrates the advantages of the cache admission policy
- The read items take 5 seconds to repopulate if they are evicted
- The Ristretto cache keeps the expensive / frequently requested items in the cache
- The builtin cache evicts the expensive / frequently requested items
- Builtin: 26,190 ns/op
- Ristretto: 3,446 ns/op
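For reference, the balanced test can be structured roughly like this (a sketch assuming a uint64-keyed `Add`/`Get` cache API similar to `plugin/pkg/cache`; the branch's benchmarks reportedly use NumCPUs - 2 goroutines, while this sketch relies on `RunParallel`'s defaults):

```go
package cache

import (
	"math/rand"
	"testing"
)

// backend stands in for either the builtin cache or the Ristretto wrapper.
type backend interface {
	Add(key uint64, el interface{})
	Get(key uint64) (interface{}, bool)
}

func benchmarkBalanced(b *testing.B, c backend, writePct int) {
	// Pre-populate a working set so reads are mostly hits.
	for i := uint64(0); i < 1000; i++ {
		c.Add(i, i)
	}
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		// Per-goroutine RNG to avoid contending on the global rand lock.
		r := rand.New(rand.NewSource(rand.Int63()))
		for pb.Next() {
			if r.Intn(100) < writePct {
				c.Add(uint64(r.Int63()), struct{}{}) // unique key: exercises the insert path
			} else {
				c.Get(uint64(r.Intn(1000))) // hot key: exercises the read path
			}
		}
	})
}
```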
```
go test -run='' -bench=. -benchmem ./... 2>/dev/null
goos: darwin
goarch: amd64
pkg: github.com/coredns/coredns/plugin/cache
BenchmarkBuiltinReads-8                   647811     1590 ns/op     598 B/op    15 allocs/op
BenchmarkRistrettoReads-8                 639974     1864 ns/op     616 B/op    15 allocs/op
BenchmarkBuiltinParallelReads-8          2382621      521 ns/op     598 B/op    15 allocs/op
BenchmarkRistrettoParallelReads-8        2029929      605 ns/op     611 B/op    15 allocs/op
BenchmarkBuiltinBalanced-8                285925     3567 ns/op     835 B/op    20 allocs/op
BenchmarkRistrettoBalanced-8              387178     2713 ns/op     862 B/op    20 allocs/op
BenchmarkBuiltinInserts-8                 312468     3715 ns/op     743 B/op    17 allocs/op
BenchmarkRistrettoInserts-8               459302     2585 ns/op     661 B/op    15 allocs/op
BenchmarkBuiltinParallelInserts-8         130486     7684 ns/op    2543 B/op    60 allocs/op
BenchmarkRistrettoParallelInserts-8       307554     3829 ns/op    2242 B/op    53 allocs/op
BenchmarkBuiltinParallelInsertsRead-8     216180    26190 ns/op    1315 B/op    32 allocs/op
BenchmarkRistrettoParallelInsertsRead-8   336003     3446 ns/op    1284 B/op    30 allocs/op
PASS
ok  github.com/coredns/coredns/plugin/cache  49.860s
```
Test Case - Eviction of Frequently Requested Items
- This test case shows that requesting always-unique names will push frequently requested names out of the cache
- Let the slow loop run and populate the names first
- Start the fast loop and see that the names for the slow loop are evicted (slow requests start taking 1,900 ms again)
- Add the `ristretto` parameter to the `cache {}` block in `corefile.local` (see the cache block sketch after this list)
- The cache will actually be able to add the slow lookups into the cache while the fast loop is running
- It may take 2-3 loops but you will see some of the slow items start responding very quickly and eventually all of them responding very quickly
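With the branch, the cache block in `corefile.local` with the new backend enabled would look roughly like this (the `ristretto` parameter is the option proposed above; the stock corefile below omits it):

```
cache {
    ristretto
    success 1000 10000 10000
    denial 1000 10000 10000
}
```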
```sh
coredns -conf corefile.local &
coredns -conf corefile.erratic &

./test-slow.sh
# Wait until the above slow names are populated (the loop will speed up)

# In another terminal:
./test-fast.sh
```
corefile.local
- Listens on 1053
- Forwards `*.slow.` requests to another instance
- Replies locally to all other requests with `erratic` on a 5 ms delay
```
.:1053 {
    errors
    log

    # Forward *.slow requests to another instance of CoreDNS configured to respond
    # slowly with NOERROR
    forward slow. 127.0.0.1:1054

    cache {
        success 1000 10000 10000
        denial 1000 10000 10000
    }

    # Only requests that are not-cached and not-forwarded will get here
    erratic {
        delay 1 5ms
    }

    reload
}
```
corefile.erratic
- Listens on 1054
- Responds to any request in 1,900 ms with NOERROR and an Answer
```
.:1054 {
    errors

    erratic {
        delay 1 1900ms
    }

    reload
}
```
test-fast.sh
```sh
#!/usr/bin/env bash
set -euo pipefail

i=1
while true; do
  # Run 8 digs in parallel then wait for them
  dig $((i)).foo.com. -p 1053 @127.0.0.1 &
  dig $((i+1)).foo.com. -p 1053 @127.0.0.1 &
  dig $((i+2)).foo.com. -p 1053 @127.0.0.1 &
  dig $((i+3)).foo.com. -p 1053 @127.0.0.1 &
  dig $((i+4)).foo.com. -p 1053 @127.0.0.1 &
  dig $((i+5)).foo.com. -p 1053 @127.0.0.1 &
  dig $((i+6)).foo.com. -p 1053 @127.0.0.1 &
  dig $((i+7)).foo.com. -p 1053 @127.0.0.1 &
  wait
  i=$((i+8))
done
```
test-slow.sh
```sh
#!/usr/bin/env bash
set -euo pipefail

while true; do
  for i in {1..10}; do
    dig $i.foo.slow. -p 1053 @127.0.0.1
  done
done
```
About this issue
- State: closed
- Created 4 years ago
- Comments: 16 (7 by maintainers)
The simple solution here is to make it an external plugin, and let it get some soak time. From there we can decide whether to make it built-in as an alternative or replace the current plugin.
Hi @miekg -
I see other external packages that could change / go away / have bugs that are not imported into the project, such as:
My approach to de-risking the upstream dependency was to not get rid of the internal cache package, so that the `ristretto` cache option can simply be no-op’d out if a problem is discovered with the cache or the project disappears.

I don’t agree with the approach of throwing the existing solution out either. I like to ask people to fix the problem that they face and then determine if they need a more dramatic solution. My other issues/PRs are fixing problems with caching in general (without replacing the storage) and making inserts when using the builtin cache faster. I’m not advocating for using only a new cache backend; I am rather advocating for making all forms of caching better, even if that makes the new cache backend less of an improvement than desired.
In fact, that is why the new cache is a config option. The idea is to let people use it if their use case benefits from it while not making coredns dependent on the existence and correctness of the `ristretto` cache project.

I rather like the idea of leaving the internal cache implementation in place, as it provides a clean fallback option if ever needed AND it is faster in read-only scenarios. Additionally, I want the builtin cache to remain the default for the foreseeable future.
I currently see a way to split it into 3 PRs:
I’m trying to keep all of my other changes distinct from this PR so as not to complicate the review of adding this new cache storage.
It looks like there are several issues with the current cache described. I can see the motivation.
If this cache library is generally better, I would prefer to simply replace the existing cache implementation rather than add a bunch of optional plugins.
But I would like to see some better benchmarks and validation to demonstrate the improvement.