jeromq: JDK epoll bug in 0.4.3

I use JeroMQ 0.4.3 with JDK 8. Sometimes Poller.run consumes 100% CPU even though no messages are arriving. I added some debug output to the Poller and found that rc = selector.select(timeout); always returns rc == 1, so the maybeRebuildSelector method is never reached.

Netty works around this issue by relying only on a counter and ignoring 'rc'.
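A rough sketch of what such a counter-based check could look like in Java, in case it helps the discussion. This is not Netty's or JeroMQ's actual code: the class name PrematureSelectCounter, its methods, and the threshold are all made up for illustration. The idea is to count consecutive select() calls that return before their timeout without looking at rc, and to rebuild the selector once the count reaches a threshold.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/**
 * Hypothetical helper (not part of JeroMQ or Netty): tracks consecutive
 * select() calls that returned before their timeout, ignoring rc.
 */
final class PrematureSelectCounter {
    private final int threshold; // assumed value, tune for your workload
    private int consecutive;

    PrematureSelectCounter(int threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.threshold = threshold;
    }

    /**
     * Records one select() call. Returns true once `threshold` consecutive
     * calls have returned early, which suggests the selector is spinning.
     */
    boolean record(long elapsedNanos, long timeoutNanos) {
        if (timeoutNanos > 0 && elapsedNanos < timeoutNanos) {
            consecutive++;
        } else {
            consecutive = 0; // select() blocked for the full timeout
        }
        if (consecutive >= threshold) {
            consecutive = 0;
            return true;
        }
        return false;
    }

    /** Call this when a select round actually handled an event. */
    void reset() {
        consecutive = 0;
    }

    /** Moves all valid registrations to a fresh selector and closes the old one. */
    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (!key.isValid()) {
                continue;
            }
            SelectableChannel channel = key.channel();
            int ops = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();
            channel.register(fresh, ops, attachment);
        }
        old.close();
        return fresh;
    }
}
```

In a poll loop this would be used by measuring System.nanoTime() around selector.select(timeout), calling reset() whenever a handler really processed something, and replacing the selector with rebuild(selector) when record(...) returns true. A legitimate wakeup also returns early, which is why the reset() call after real work matters.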

Sorry for my poor English.

About this issue

  • State: open
  • Created 7 years ago
  • Comments: 24 (11 by maintainers)

Most upvoted comments

@fredoboulo Thanks for getting back on this. I haven't worked on that code in a while, so I haven't been affected by this recently. I'll make sure to (at least attempt to) put together a small reproduction sample. (Maybe the latest JVM fixes the issue?)

I am also experiencing this bug with JDK 8 on Linux. The stack trace for the thread consuming 100% CPU looks as follows:

2018-12-07_18:03:45.24915 "iothread-2" #21 daemon prio=5 os_prio=0 tid=0x00007f3ae0cfd800 nid=0x4a3b runnable [0x00007f3afac07000]
2018-12-07_18:03:45.24915    java.lang.Thread.State: RUNNABLE
2018-12-07_18:03:45.24915       at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
2018-12-07_18:03:45.24915       at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
2018-12-07_18:03:45.24915       at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
2018-12-07_18:03:45.24915       at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
2018-12-07_18:03:45.24916       - locked <0x00000005eaef8698> (a sun.nio.ch.Util$3)
2018-12-07_18:03:45.24916       - locked <0x00000005eaef8650> (a java.util.Collections$UnmodifiableSet)
2018-12-07_18:03:45.24916       - locked <0x00000005eb156da8> (a sun.nio.ch.EPollSelectorImpl)
2018-12-07_18:03:45.24916       at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
2018-12-07_18:03:45.24916       at zmq.poll.Poller.run(Poller.java:234)
2018-12-07_18:03:45.24917       at java.lang.Thread.run(Thread.java:748)

Any ideas on how we can mitigate it? We are using ZMQ to forward error/access logs to Splunk. I don't see any error messages in the log files that could point to the root cause.