metrics: Graphite reporter does not detect connections in CLOSE_WAIT state
I have an environment with two Graphite carbon-relays behind an ELB. The metrics reporter connects to the ELB which proxies to the relays. This mostly works except that the IP address of the ELB changes periodically. When this happens, the ELB sends a FIN and the connection should be closed. At that point the client needs to do another DNS lookup to find the new IP address. This works fine with other clients (for example, collectd) but not with v3.1.2 of the metrics reporter. Connections remain in CLOSE_WAIT.
To wit:
$ sudo netstat -pan | grep 2003 | awk '{print $6,$7}'
CLOSE_WAIT 14884/java
ESTABLISHED 8077/collectd
This is several hours after the IP address has changed. Note that collectd has picked up the new ELB IP address but the java process hasn’t. The related stack trace:
[WARN ] 2016-02-11 20:08:02.982 [metrics-graphite-reporter-1-thread-1] GraphiteReporter - Unable to report to Graphite
java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method) ~[?:1.8.0_51]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[?:1.8.0_51]
at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[?:1.8.0_51]
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) ~[?:1.8.0_51]
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) ~[?:1.8.0_51]
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295) ~[?:1.8.0_51]
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141) ~[?:1.8.0_51]
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229) ~[?:1.8.0_51]
at java.io.BufferedWriter.flush(BufferedWriter.java:254) ~[?:1.8.0_51]
at com.codahale.metrics.graphite.Graphite.flush(Graphite.java:151) ~[metrics-graphite-3.1.2.jar!/:3.1.2]
at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:190) [metrics-graphite-3.1.2.jar!/:3.1.2]
at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) [metrics-core-3.1.2.jar!/:3.1.2]
at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) [metrics-core-3.1.2.jar!/:3.1.2]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_51]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_51]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_51]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_51]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_51]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_51]
I think the problem lies in isConnected() in Graphite.java.
public boolean isConnected() {
    return socket != null && socket.isConnected() && !socket.isClosed();
}
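socket.isConnected() and socket.isClosed() are not sufficient for checking whether the remote end has closed the connection, because both only reflect the local state of the Socket object: isConnected() stays true once the socket has connected, and isClosed() only becomes true after close() is called locally. Detecting a remote close requires a read(), which returns -1 once the peer has closed its end of the connection.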
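To illustrate the point, here is a minimal sketch (not the library's code) that reproduces the situation against a local ServerSocket. The class name HalfCloseDemo, the helper peerHasClosed() and the timeout parameter are all hypothetical names chosen for this example. It assumes the server never sends data back, which I believe is the case for the Graphite plaintext protocol; if the peer did send data, the probe would consume one byte of it.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class HalfCloseDemo {

    /**
     * Hypothetical helper: probes the socket with a read().
     * Returns true if the read hits end-of-stream (the peer sent a FIN),
     * false if nothing arrives within timeoutMillis.
     */
    static boolean peerHasClosed(Socket socket, int timeoutMillis) throws IOException {
        int previousTimeout = socket.getSoTimeout();
        socket.setSoTimeout(timeoutMillis);
        try {
            // Assumption: the peer never sends payload, so the only
            // non-timeout outcome we expect here is -1 (end of stream).
            return socket.getInputStream().read() == -1;
        } catch (SocketTimeoutException e) {
            return false; // still open as far as we can tell
        } finally {
            socket.setSoTimeout(previousTimeout);
        }
    }

    /** Returns { result of the isConnected()-style check, result of the read() probe }. */
    static boolean[] demo() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            Socket accepted = server.accept();
            accepted.close(); // remote end closes; the client side moves to CLOSE_WAIT

            // Same expression as Graphite.isConnected(), minus the null check.
            boolean looksConnected = client.isConnected() && !client.isClosed();
            boolean remoteClosed = peerHasClosed(client, 1000);
            return new boolean[] { looksConnected, remoteClosed };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = demo();
        System.out.println("isConnected-style check: " + result[0]);
        System.out.println("read() probe says peer closed: " + result[1]);
    }
}
```

In this sketch the local check keeps reporting the socket as connected even though the remote end has already closed it, while the read() probe detects the close.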
About this issue
- State: closed
- Created 8 years ago
- Comments: 15 (2 by maintainers)
I guess #899 should fix that. Now the client opens a new socket every time it sends a batch of metrics and closes the socket when it's done. The fix was merged to the 3.1 branch.
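For readers who want to see the shape of that approach, here is a rough sketch of the connect-per-report pattern. It is not the code from #899; the class PerReportSender and its send() method are hypothetical names for illustration. Because new Socket(host, port) resolves the hostname each time it is called (subject to the JVM's DNS cache settings), a changed ELB address should be picked up on a later report, and a connection closed by the remote end is never reused.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.List;

class PerReportSender {
    private final String host;
    private final int port;

    PerReportSender(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /**
     * Opens a fresh connection, writes one batch of lines in the
     * "name value timestamp" plaintext format, then closes the socket.
     */
    void send(List<String> lines) throws IOException {
        // new Socket(host, port) performs the hostname lookup on every call.
        try (Socket socket = new Socket(host, port);
             Writer writer = new BufferedWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
            writer.flush();
        } // try-with-resources closes the writer and the socket here
    }
}
```

The trade-off is one TCP handshake per reporting interval, which is usually negligible at typical reporting periods but worth keeping in mind for very short intervals.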