datasketches-cpp: Buffer overflow in ThetaSketch intersection update
Hi,
I am investigating a crash we saw in tests using ThetaSketch 1.0.0, which I have narrowed down to the following code:
```cpp
const uint32_t max_matches = std::min(num_keys_, sketch.get_num_retained());
uint64_t* matched_keys = AllocU64().allocate(max_matches);
uint32_t match_count = 0;
for (auto key: sketch) {
  if (key < theta_) {
    // match_count is not checked against max_matches before this write
    if (update_theta_sketch_alloc<A>::hash_search(key, keys_, lg_size_)) matched_keys[match_count++] = key;
  } else if (sketch.is_ordered()) {
    break; // early stop
  }
}
```
As the debugger showed, match_count ends up higher than max_matches, meaning we have written beyond the allocated buffer (this eventually results in a crash due to memory corruption).
I am not entirely sure how hash_search could have found more matches than there are elements (hash collisions?).
But in any case, shouldn't the for loop stop once match_count == max_matches?
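One possible form of the defensive check asked about here is sketched below. It is a standalone model under stated assumptions, not the library code: `collect_matches` and `KeyTable` are hypothetical names, and a `std::unordered_set` stands in for the intersection's `hash_search` over `keys_`. Throwing instead of silently breaking out of the loop is also an assumption: if the table reports more matches than `max_matches`, its state is already inconsistent, so truncating the result would hide the problem.

```cpp
#include <cstdint>
#include <stdexcept>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the intersection's internal hash table;
// membership testing plays the role of hash_search in the quoted code.
using KeyTable = std::unordered_set<std::uint64_t>;

// Collects the keys of `incoming` that are below `theta` and present in
// `table`, writing at most `max_matches` of them into the result buffer.
std::vector<std::uint64_t> collect_matches(const std::vector<std::uint64_t>& incoming,
                                           std::uint64_t theta,
                                           bool incoming_is_ordered,
                                           const KeyTable& table,
                                           std::uint32_t max_matches) {
  std::vector<std::uint64_t> matched_keys(max_matches);
  std::uint32_t match_count = 0;
  for (std::uint64_t key : incoming) {
    if (key < theta) {
      if (table.count(key) != 0) {
        // Defensive check: never write past the end of the buffer.
        if (match_count == max_matches) {
          throw std::logic_error("more matches than max_matches: inconsistent state");
        }
        matched_keys[match_count++] = key;
      }
    } else if (incoming_is_ordered) {
      break;  // keys are sorted, so no later key can be below theta
    }
  }
  matched_keys.resize(match_count);  // keep only the slots actually filled
  return matched_keys;
}
```

With a consistent table, at most `max_matches` keys are written; with an inconsistent one, the call fails instead of corrupting memory. A plain `break` at the same point would also prevent the overflow, but it would return a silently truncated intersection.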

I can provide the sketch if that helps.
Thanks!
About this issue
- State: closed
- Created 5 years ago
- Comments: 17 (8 by maintainers)
Just to clarify: this was not a case of a corrupted input sketch. The internal state of the intersection was corrupted due to missing cleanup when resizing was not necessary. However, I think that the defensive checks I added should stay.
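This explanation also answers how `hash_search` could find more matches than there are keys. The toy model below is a hypothetical illustration, not the datasketches implementation: `ToyKeyTable`, `reset`, `clear_when_same_size` and `count_matches` are invented names, and the table is a simple linear-probing array in which 0 marks an empty slot. It assumes the failure looks like this: when the table is reused without resizing, the key counter is reset but the old slots are not cleared, so keys from a previous round remain searchable.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy open-addressing table (linear probing, 0 = empty slot). NOT the
// datasketches code; it only models stale state surviving a reset.
class ToyKeyTable {
 public:
  explicit ToyKeyTable(std::size_t lg_size)
      : lg_size_(lg_size), slots_(std::size_t{1} << lg_size, 0) {}

  // Prepares the table for a new set of keys. Resizing always starts clean;
  // with clear_when_same_size == false, a same-size reset skips the cleanup.
  void reset(std::size_t lg_size, bool clear_when_same_size) {
    if (lg_size != lg_size_) {
      lg_size_ = lg_size;
      slots_.assign(std::size_t{1} << lg_size, 0);
    } else if (clear_when_same_size) {
      std::fill(slots_.begin(), slots_.end(), 0);
    }
    num_keys_ = 0;  // the counter is reset either way
  }

  // Inserts a non-zero key; assumes the table never becomes full.
  void insert(std::uint64_t key) {
    std::size_t i = index(key);
    while (slots_[i] != 0 && slots_[i] != key) i = (i + 1) & mask();
    if (slots_[i] == 0) {
      slots_[i] = key;
      ++num_keys_;
    }
  }

  bool search(std::uint64_t key) const {
    std::size_t i = index(key);
    while (slots_[i] != 0) {
      if (slots_[i] == key) return true;
      i = (i + 1) & mask();
    }
    return false;
  }

  std::uint32_t num_keys() const { return num_keys_; }

 private:
  std::size_t mask() const { return slots_.size() - 1; }
  std::size_t index(std::uint64_t key) const { return static_cast<std::size_t>(key) & mask(); }

  std::size_t lg_size_;
  std::vector<std::uint64_t> slots_;
  std::uint32_t num_keys_ = 0;
};

// Counts how many keys of `incoming` the table claims to contain.
std::uint32_t count_matches(const ToyKeyTable& table, const std::vector<std::uint64_t>& incoming) {
  std::uint32_t n = 0;
  for (std::uint64_t key : incoming) {
    if (table.search(key)) ++n;
  }
  return n;
}
```

If the table is filled with keys 1 to 4, reset at the same size without cleanup, and then given key 5, it reports one key, yet all five keys 1 to 5 are "found". A `max_matches` computed from that count, as with `std::min(num_keys_, ...)` in the quoted code, would be 1 while the loop writes five matches: the overflow described in the report.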