cloudwatch_exporter: Scraping stops working from time to time

We have deployed the latest version of the cloudwatch-exporter. We noticed that it sometimes stops scraping from AWS and never recovers; we have to restart it to get it working again. What could be the cause? Maybe the size of the response? I have included some logs below:

WARNING: CloudWatch scrape failed
Message: Read timed out). Response Code: 200, Response Text: OK
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1525)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:721)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:704)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:672)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:654)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:518)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:965)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.listMetrics(AmazonCloudWatchClient.java:684)
	at io.prometheus.cloudwatch.CloudWatchCollector.getDimensions(CloudWatchCollector.java:188)
	at io.prometheus.cloudwatch.CloudWatchCollector.scrape(CloudWatchCollector.java:329)
	at java.util.Collections.list(Collections.java:3688)
	at io.prometheus.client.exporter.MetricsServlet.doGet(MetricsServlet.java:40)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:365)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:51)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:745)
	at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:599)
	at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:220)
com.amazonaws.SdkClientException: Unable to unmarshall response (ParseError at [row,col]:[1039,14]
	at com.amazonaws.services.cloudwatch.model.transform.ListMetricsResultStaxUnmarshaller.unmarshall(ListMetricsResultStaxUnmarshaller.java:30)
	at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:101)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1501)
	... 41 more
Sep 24, 2017 1:45:24 AM io.prometheus.cloudwatch.CloudWatchCollector collect
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1222)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:747)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:941)
	at io.prometheus.cloudwatch.CloudWatchCollector.collect(CloudWatchCollector.java:410)
	at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:143)
	at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:158)
	at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:128)
	at io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:22)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:648)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
	at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:627)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1039,14]
Message: Read timed out
	at com.sun.xml.internal.stream.XMLEventReaderImpl.peek(XMLEventReaderImpl.java:275)
	at com.amazonaws.services.cloudwatch.model.transform.DimensionStaxUnmarshaller.unmarshall(DimensionStaxUnmarshaller.java:40)
	at com.amazonaws.services.cloudwatch.model.transform.ListMetricsResultStaxUnmarshaller.unmarshall(ListMetricsResultStaxUnmarshaller.java:54)
	at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:43)
	at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)

Thanks!

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 18 (11 by maintainers)

Most upvoted comments

@brian-brazil I’m sorry, I thought we were talking about IT. “Somewhere else” sounds like “works on my computer” in this situation. I described my situation, the symptoms, and a temporary solution. Here are the changes I made to the exporter (some of them are overkill, I know). They fixed cloudwatch_exporter’s non-replying behavior and made the error trigger earlier. What evidence do you need?
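
For readers following along: the changes referred to above are not reproduced in this thread. As a generic illustration of the idea of "triggering the error earlier", here is a minimal sketch that bounds any blocking call with a hard deadline using only the Java standard library. The class `DeadlineCall` and the method `callWithDeadline` are hypothetical names invented for this example; this is not the exporter's code and not the commenter's patch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of cloudwatch_exporter): fail fast when a call hangs. */
final class DeadlineCall {
    // Daemon threads, so a worker stuck in a blocking read cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "deadline-call");
        t.setDaemon(true);
        return t;
    });

    private DeadlineCall() {}

    /**
     * Runs the task on a worker thread and waits at most timeoutMillis for it.
     * Throws TimeoutException if the deadline passes before the task finishes.
     */
    static <T> T callWithDeadline(Callable<T> task, long timeoutMillis) throws Exception {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Request an interrupt of the worker. A thread blocked in a plain
            // socket read may ignore it, but the caller is released either way.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Surface the task's own exception instead of the wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }
}
```

A scrape wrapped this way, for example `DeadlineCall.callWithDeadline(() -> doScrape(), 30_000)` with `doScrape` standing in for whatever call blocks, would raise a `TimeoutException` after 30 seconds rather than waiting indefinitely. Note that this only releases the caller. If the underlying read never returns, the worker thread stays stuck, so client-side socket and request timeouts are still worth checking.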

Hi @ivanfoo! Can you share your findings? We have been facing much the same issue recently. The configuration hasn’t changed on our end.