elasticsearch_exporter: failed to decode cluster health
I am getting these warnings from the exporter when running it against an Elasticsearch cluster at version 5.4.1
with the prebuilt exporter 1.0.0-rc1.linux-amd64:
level=warn msg="failed to decode cluster health" err="json: cannot unmarshal number into Go struct field clusterHealthResponse.status of type string"
All of my exporter configuration appears to be correct. The Elasticsearch cluster itself is healthy and in active use. I can query /_cluster/health
manually using curl, and it returns the following:
{
"active_primary_shards": 141,
"active_shards": 348,
"active_shards_percent_as_number": 100.0,
"cluster_name": "<cluster_name>",
"delayed_unassigned_shards": 0,
"initializing_shards": 0,
"number_of_data_nodes": 3,
"number_of_in_flight_fetch": 0,
"number_of_nodes": 4,
"number_of_pending_tasks": 0,
"relocating_shards": 0,
"status": "green",
"task_max_waiting_in_queue_millis": 0,
"timed_out": false,
"unassigned_shards": 0
}
Any ideas as to what the issue may be?
About this issue
- State: closed
- Created 7 years ago
- Comments: 17 (15 by maintainers)
Hi, we weren’t able to reproduce the issue. Our ES /_cluster/health JSON looks similar to yours, and both the linux and the darwin binaries ran fine. Can you try to reproduce the bug with a self-built binary?
nvm, my tests were flawed. Now they run fine even with basic auth.
I’ve added some tests against the golden output of different ES versions (1.7, 2.4, 5.4) with and without basic auth.
The basic auth tests are failing right now, so either the tests are broken or (IMHO more likely) our handling of basic auth isn’t working anymore. @metalmatze
Thanks for reporting this. That’s why we issued an rc1. Glad you tried it out.
Your output looks like the Basic Auth failed. That could possibly be a regression, but the code doing the actual request is (almost) the same as before.
We need to further investigate this. If you could try against an ES w/o basic auth that could help us narrow down this issue.
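To illustrate why a failed Basic Auth could produce exactly this warning, here is a minimal sketch. It assumes the rejected request gets back an error body whose "status" field is a number (Elasticsearch error responses typically carry the HTTP code there), while a healthy /_cluster/health body has a string "status". The clusterHealth struct and decodeHealth helper are hypothetical stand-ins, not the exporter's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterHealth is a hypothetical, trimmed-down stand-in for the exporter's
// clusterHealthResponse; only the fields needed for the illustration are kept.
type clusterHealth struct {
	ClusterName string `json:"cluster_name"`
	Status      string `json:"status"`
}

// decodeHealth unmarshals a /_cluster/health style body into clusterHealth.
func decodeHealth(body []byte) (clusterHealth, error) {
	var h clusterHealth
	err := json.Unmarshal(body, &h)
	return h, err
}

func main() {
	// Healthy response: "status" is a string, so decoding succeeds.
	healthy := []byte(`{"cluster_name":"demo","status":"green"}`)
	// Assumed shape of an auth error body: "status" is a number, which
	// cannot be stored in a Go string field.
	denied := []byte(`{"error":"unauthorized","status":401}`)

	for _, body := range [][]byte{healthy, denied} {
		h, err := decodeHealth(body)
		fmt.Printf("status=%q err=%v\n", h.Status, err)
	}
}
```

If that assumption holds, checking the HTTP status code before decoding the body would surface the auth failure directly instead of a JSON type error.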
TLS setup hasn’t changed, it has just been refactored. What has changed is the way we initialize our HTTP client: the previous way of setting up the timeout is deprecated, so we changed it.
None of this should make a difference regarding basic auth.
To reproduce: can you please comment how you called the exporter and how you queried the _cluster endpoint via curl?
We’ll have a look. @metalmatze can you have a look too?!