caffeine: Very high false positive rate observed for BloomFilter implementation.

I observed a very high false positive rate with the current implementation of BloomFilter, at times as high as 100%. My test code and results are given below. I also found out what can be changed to fix it, although not completely sure why my fix worked. I thought maybe bitmask method’s output has some bias, but upon further inspection it seems alright. I hope I am using the APIs correctly and don’t have some other bug in my test code. Would be great if someone can review and let me know. tks! P.S.: I understand that BloomFilter code might be internal to caffeine, but just want to highlight my observation.

Line: https://github.com/ben-manes/caffeine/blob/master/simulator/src/main/java/com/github/benmanes/caffeine/cache/simulator/admission/bloom/BloomFilter.java#L166

Current code:

static long bitmask(int hash) {
    return 1L << ((hash >>> 8) & INDEX_MASK);
  }
Number of Insertions False positives(% ) True positives
1024 27 (2.636719%) 1024
4096 640 (15.625000%) 4096
16384 15213 (92.852783%) 16384
65536 65536 (100.000000%) 65536
262144 262144 (100.000000%) 262144
1048576 1048576 (100.000000%) 1048576

New implementation:

static long bitmask(int hash) {
    return 1L << ((hash >>> 24) & INDEX_MASK);
  }
Number of Insertions False positives(%) True positives
1024 15 (1.464844%) 1024
4096 96 (2.343750%) 4096
16384 391 (2.386475%) 16384
65536 1598 (2.438354%) 65536
262144 6326 (2.413177%) 262144
1048576 25600 (2.441406%) 1048576

Test method:

    public void bloomFilterTest() {
        System.out.println("Number of Insertions\tFalse positives(%)\tTrue positives");
        for (int capacity = 2 << 10; capacity < 2 << 22; capacity = capacity << 2) {
            long[] input = new Random().longs(capacity).distinct().toArray();
            BloomFilter bf = new BloomFilter(input.length / 2, new Random().nextInt());
            int truePositives = 0;
            int falsePositives = 0;
            int i = 0;
            // Add only first half of input array to bloom filter
            for (; i < (input.length / 2); i++) {
                bf.put(input[i]);
            }
            // First half should be part of the bloom filter
            for (int k = 0; k < i; k++) {
                truePositives += bf.mightContain(input[k]) ? 1 : 0;
            }
            // Second half shouldn't be part of the bloom filter
            for (; i < input.length; i++) {
                falsePositives += bf.mightContain(input[i]) ? 1 : 0;
            }
            System.out.format("%d\t\t%d(%f%%)\t\t%d\n",
                input.length / 2, falsePositives,
                ((float) falsePositives / (input.length / 2)) * 100, truePositives);
        }
    }

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 57 (33 by maintainers)

Commits related to this issue

Most upvoted comments

All questions are wonderful. I’m going to edit your message to number the sub-items.