druid: HLLSketchMerge aggregator failing for some metrics after upgrade to v0.18
Affected Version
v0.18 (upgraded from 0.16.0)
Description
The HLLSketchMerge aggregator is failing for some of our metrics after upgrading to Druid 0.18.0. Reverting to 0.16.0 fixes the issue. I isolated the specific segments where the issue occurs, moved those segments back to our 0.16 historical, and was able to query the same metric successfully.
Re-indexing data does not seem to fix the issue.
Error Message
{
"error": "Unknown exception",
"errorMessage": "java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.datasketches.SketchesArgumentException: Incomming sketch is corrupted, Rebuild_CurMin_Num_KxQ flag is set.",
"errorClass": "java.lang.RuntimeException",
"host": null
}
About this issue
- State: closed
- Created 4 years ago
- Comments: 33 (33 by maintainers)
We have a fix!
@clintropolis @scrawfor @gianm @AlexanderSaydakov
I want to thank all of you for your help! This was truly a team effort!
With clues from @clintropolis and @scrawfor, @AlexanderSaydakov was able to reproduce the bug using his knowledge of how the aggregator works. From that I was able to locate the bug, which was my fault: I had added a check for a flag where none was needed, so the code was throwing an unnecessary exception.
We will be going over this part of the code carefully, adding unit tests, and preparing a new release. Because of the dual 72-hour release cycles, this will take a week or so.
Thank you for your patience!
Lee.
Folks, one lesson from this debugging exercise is that it would have been really useful to be able to quickly examine the sketches in the hll_segment.zip that @scrawfor posted in this issue.
As a result, I have developed a small tool that takes the output of the dump-segment tool and extracts the sketches as binary files. This allows us to easily examine the details of individual sketches with methods already available in the DataSketches library.
Hopefully, this will make debugging issues involving sketches in Druid much easier and faster.
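For readers who want a feel for what such an extraction step involves, here is a minimal sketch in Python. It is not the tool described above. It assumes the dump output is one JSON object per row and that sketch columns appear as base64-encoded strings; both are assumptions about the dump format, and the function name `extract_sketches`, the column name, and the output file naming are hypothetical.

```python
import base64
import json
from pathlib import Path


def extract_sketches(dump_lines, column, out_dir):
    """Write each base64-encoded value of `column` to its own .bin file.

    Assumption (not confirmed in this thread): every line of the dump is a
    JSON object, and the sketch column holds a base64-encoded string.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row_number, line in enumerate(dump_lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        encoded = row.get(column)
        if not isinstance(encoded, str):
            continue  # column missing or null in this row
        # File name encodes the column and the line index in the dump.
        path = out_dir / f"{column}_{row_number:06d}.bin"
        path.write_bytes(base64.b64decode(encoded))
        written.append(path)
    return written
```

Each resulting file holds the raw bytes of one sketch, which can then be loaded and inspected with the DataSketches library.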
The question is where we should put this tool so others can use it. Obviously it makes assumptions about Druid’s segment structure and Druid’s Dump-Segment tool. It doesn’t make sense to put it in the DataSketches library, as it is specific to Druid. I’d be glad to submit a PR and add it to the druid/services/src/main/java/org/apache/druid/cli directory. Or perhaps it should be added to the druid/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches folder.
Please advise.
Lee.
Yes. But I would wait for the full release. See https://lists.apache.org/thread.html/r6a69c6689ec303fc2df83ea483b87166b0b1d14422a2803b881b87ef%40<dev.datasketches.apache.org>