elasticsearch-hadoop: "Cannot find node with id" exception even when the node is alive and cluster is green.
I am getting the following exception when pushing data from a Hadoop M/R job. When this happens, the node in question is responding and the cluster is healthy (green). The box also has plenty of resources: CPU usage is below 30% and free memory is over 50 GB. The exception causes the Hadoop map task to fail; the task is restarted and eventually succeeds (maybe by connecting to a different ES node). The errors are not consistent; they are very intermittent.
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot find node with id [Q4pQkOIJSSi2oXRXGUVs8w]
at org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:40)
at org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:251)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.initSingleIndex(EsOutputFormat.java:218)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:201)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:159)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at afi.search.hadoop.es.ESMapper1.map(ESMapper1.java:227)
at afi.search.hadoop.es.ESMapper1.map(ESMapper1.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
About this issue
- State: closed
- Created 10 years ago
- Comments: 29 (13 by maintainers)
Commits related to this issue
- [SPARK] Support for SparkSQL/SchemaRDD. Relates #243 (committed to elastic/elasticsearch-hadoop by costin 10 years ago)
- [REST] Add retries to cope with volatile cluster state. Additionally improve logging for better diagnostics. Relates #243 (committed to elastic/elasticsearch-hadoop by costin 9 years ago)
- [REST] Add retries to cope with volatile cluster state. Additionally improve logging for better diagnostics. Relates #243 (cherry picked from commit ec6b471218e73edc2b8ab5f964094d1c118ec32a; committed to elastic/elasticsearch-hadoop by costin 9 years ago)
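
The [REST] commit messages mention retries to cope with volatile cluster state. The actual change is not shown in this thread, so the following is only a minimal sketch of the general idea, not the connector's implementation: if a shard's node id is missing from the node list, refetch the node list a few times before failing. The class `NodeLookupRetry`, the method `findNodeAddress`, and the `clusterNodes` supplier are hypothetical names introduced for illustration; they are not part of the elasticsearch-hadoop API.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not elasticsearch-hadoop code.
final class NodeLookupRetry {

    /**
     * Looks up nodeId in a freshly fetched "node id -> address" map,
     * refetching the map up to maxAttempts times before giving up.
     * clusterNodes is an assumed callback that returns the current node list.
     */
    static String findNodeAddress(String nodeId,
                                  Supplier<Map<String, String>> clusterNodes,
                                  int maxAttempts,
                                  long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Fetch a fresh view each time; the previous one may have been stale.
            Map<String, String> nodes = clusterNodes.get();
            String address = nodes.get(nodeId);
            if (address != null) {
                return address;
            }
            if (attempt < maxAttempts) {
                try {
                    // Linear backoff: wait a little longer after each miss.
                    Thread.sleep(backoffMillis * attempt);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", e);
                }
            }
        }
        throw new IllegalArgumentException(
                "Cannot find node with id [" + nodeId + "] after " + maxAttempts + " attempts");
    }
}
```

A caller would pass a supplier that queries the cluster for its current nodes. With a single attempt the helper behaves like a plain lookup that fails on the first miss, which is the failure mode visible in the stack trace above.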
Nothing of interest was showing up in the Elasticsearch master log. I didn’t check the logs on the nodes that reported the error.
I’ll have to double check Friday when I’m back at work, but I believe I upgraded from 7u51 to 8u20 (maybe 8u25).
The exception was consistent on repeat runs of the job with the same data. Shutting down the Elasticsearch nodes that were failing to connect appeared to resolve the problem.
I’ll turn up the logging on Friday and report back.
On Wed, Nov 26, 2014 at 4:23 PM, Costin Leau notifications@github.com wrote:
I’m having the same issue on a 20-node Elasticsearch cluster. It seems to have started after I updated my Elasticsearch cluster from JDK 1.7 to JDK 1.8. When I run a load job via Elasticsearch-Spark, several ‘Cannot find node with id …’ errors occur. The same nodes report problems on repeat runs of the same job. If I shut those few nodes down and run the job again, it seems to run error-free. If I restart the entire cluster, the Spark job complains about different nodes.