druid: Exceptions in broker: Channel disconnected
Hi there,
Some queries (dozens every day) fail in my production environment. I found the exceptions below in the broker logs. It seems that the connection between the broker and the peon gets disconnected while the query is running.
2018-02-03 09:41:01,115 ERROR io.druid.server.QueryResource: Exception handling request: {class=io.druid.server.QueryResource, exceptionType=class io.druid.java.util.common.RE, exceptionMessage=Failure getting results for query[7ea45567-4826-4987-adf6-5f77ee1e9e47] url[http://xf5x243:8100/druid/v2/] because of [org.jboss.netty.channel.ChannelException: Channel disconnected], exception=io.druid.java.util.common.RE: Failure getting results for query[7ea45567-4826-4987-adf6-5f77ee1e9e47] url[http://xf5x243:8100/druid/v2/] because of [org.jboss.netty.channel.ChannelException: Channel disconnected], query=GroupByQuery ... ...
Without any retry mechanism, the query simply fails. My plan is to add a retry in my client application. Beyond that, could anyone help me understand what is happening on the server side, and whether there are any configuration changes I could make?
The Druid version is 0.10.1, and Tranquility Kafka is used for real-time ingestion.
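For reference, the kind of client-side retry I have in mind looks roughly like this. It is only a minimal sketch using Java 11's standard HttpClient; the broker URL, attempt count, and backoff values are placeholders, not values from my setup:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class RetryingDruidQuery {
    // Placeholder broker endpoint; replace with your own broker host/port.
    private static final String BROKER_URL = "http://broker-host:8082/druid/v2/";
    private static final int MAX_ATTEMPTS = 3;

    public static String runQuery(String queryJson) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(BROKER_URL))
                .header("Content-Type", "application/json")
                .timeout(Duration.ofMinutes(2))
                .POST(HttpRequest.BodyPublishers.ofString(queryJson))
                .build();

        Exception lastFailure = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() == 200) {
                    return response.body();
                }
                lastFailure = new RuntimeException(
                        "Broker returned HTTP " + response.statusCode());
            } catch (java.io.IOException e) {
                // Covers transient failures such as the channel being disconnected.
                lastFailure = e;
            }
            // Simple linear backoff before the next attempt.
            Thread.sleep(1000L * attempt);
        }
        throw lastFailure;
    }
}
```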
Best regards, Ryan
About this issue
- State: closed
- Created 6 years ago
- Comments: 15 (3 by maintainers)
For anyone else that comes across this: increasing `druid.processing.numThreads` on both my broker and historical solved the problem, but I also needed to grant some additional headroom in `-XX:MaxDirectMemorySize` (about 10% more than what the calculation would suggest). I now suspect an issue where the direct memory buffer gets filled and kills the thread unexpectedly. I don't have any evidence of this other than that the "channel disconnected" errors went away once I added more MaxDirectMemorySize.
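For concreteness, this is the shape of the change; the numbers below are illustrative only, not a recommendation, and need to be sized for your own hardware:

```properties
# conf/druid/historical/runtime.properties (and similarly on the broker) -- illustrative values
druid.processing.numThreads=15
druid.processing.numMergeBuffers=4
druid.processing.buffer.sizeBytes=536870912
```

The direct memory Druid needs is roughly `druid.processing.buffer.sizeBytes * (druid.processing.numThreads + druid.processing.numMergeBuffers + 1)`, so if you raise numThreads you have to raise the JVM flag too. For the example values above that is 512 MB * 20 = 10 GB, so with ~10% headroom something like `-XX:MaxDirectMemorySize=11g` in the JVM options.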
I had the same issue with imply-2.8.20. My solution was to increase `druid.processing.numThreads`; the file is imply-2.8.20/conf/druid/historical/runtime.properties. This error occurs when there are too many queries at one time and not enough threads to serve them.