jmxtrans: High CPU usage when using version 254

Hello,

I’ve been using JMXTrans to transfer some 301 metrics every minute. At the start up everything works fine, CPU usage is about 15%. Then after a while CPU usage starts to climb progressively until reaching 95% constantly. I checked with mission control the threads that are demanding CPU and found some Stormpot-pool1-thread-XX on the top of the list.

jmxtrans_threads_consommant_cpu_1

I could correlate the CPU increase with exception of this type:

java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
        at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
        at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
        at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
        at java.io.BufferedWriter.write(BufferedWriter.java:230)
        at java.io.Writer.write(Writer.java:157)
        at com.googlecode.jmxtrans.model.output.GraphiteWriter2.write(GraphiteWriter2.java:78)
        at com.googlecode.jmxtrans.model.output.support.WriterPoolOutputWriter.doWrite(WriterPoolOutputWriter.java:54)
        at com.googlecode.jmxtrans.model.output.support.ResultTransformerOutputWriter.doWrite(ResultTransformerOutputWriter.java:61)
        at com.googlecode.jmxtrans.jmx.ResultProcessor$1.run(ResultProcessor.java:58)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I assumed according to these facts that the high CPU usage is caused by the Stormpot writer pools. I tried to change the code and include an invalidate statement on the writer that raised this exception as follows (class com.googlecode.jmxtrans.model.output.support.WriterPoolOutputWriter):

 public void doWrite(Server server, Query query, Iterable<Result> results)
    throws Exception
  {
    try
    {
      WriterPoolable writerPoolable = (WriterPoolable)this.writerPool.claim(this.poolClaimTimeout);
      try
      {
        this.target.write(writerPoolable.getWriter(), server, query, results);
      }
      catch (Exception ioe)
      {
        writerPoolable.invalidate();
        this.logger.warn("Invalidate WriterPoolable", ioe);
        throw ioe;
      }
      finally
      {
        writerPoolable.release();
      }
    }
    catch (InterruptedException e)
    {
      throw new IllegalStateException("Could not get writer from pool, please check if the server is available");
    }
  }

This change didn’t help as I reproduced the CPU high usage.

According to the Stormpot documentation, the expiration takes effect only after the release of the pooled object and it has to be delegated to Stormpot itself by defining an Expiration strategy.

I could not test more code changes as I got some other priority tasks to do. But I still need to fix this issue in order to use JMXTrans for monitoring. Is it possible to check this issue?

As a second question: is there a way not to use poolable writer as the pool contains always 1 instance and the writers are instantiated once per request as shows the class histogram bellow jmxtrans_nb_instances

Thanks in advance for your help and your support.

About this issue

  • Original URL
  • State: open
  • Created 8 years ago
  • Comments: 54 (38 by maintainers)

Commits related to this issue

Most upvoted comments

Precise sleeping - your wait/notify suggestion - is too complicated and risky to tackle in a 2.4.1 release. And I’m redoing the background task scheduling in 3.0 anyway.

I found 2 other cases where it’s using more CPU than it strictly needs. I want to fix those as well.

Thanks a lot for that PR! I had in mind something closer to a circuit breaker, but your implementation with the retryer seems just as good.

And send me an email if you want in. JmxTrans is looking for talents!

If I read that correctly, you only return the servers for the last file that is read. It’s late here, so my eyes are probably crossed. If you have a PR and unit tests showing that it works, please send them! In any case, I’m always thrilled to receive PRs… (btw, you seem to invest a fair amount of time in jmxtrans, send me an email if you want to join the team).

Actually, you’re right, those unneeded pools are not shutdown and not garbage collected.

Seems like a cheap fix would be to change ConfigurationParser#parseServers to do away with the ServerListBuilder. I tested the following snippet locally via a test configuration file as well as the int/unit test:

    public ImmutableList parseServers(Iterable<File> jsonFiles, boolean continueOnJsonError) throws LifecycleException {
        ImmutableList<Server> servers = ImmutableList.of();
        for (File jsonFile : jsonFiles) {
            try {
                JmxProcess process = jsonUtils.parseProcess(jsonFile);
                log.debug("Loaded file: {}", jsonFile.getAbsolutePath());
                servers = ImmutableList.copyOf(process.getServers());
            } catch (Exception ex) {
                String message = "Error parsing json: " + jsonFile;
                // error parsing one file should not prevent the startup of JMXTrans
                if (continueOnJsonError) log.error(message, ex);
                else throw new LifecycleException(message, ex);
            }
        }
        return servers;
    }

Do you want me to submit a PR for that?