elasticsearch-hadoop: "Cannot find node with id" exception even when the node is alive and cluster is green.

I am getting the following exception when pushing data from a Hadoop M/R job. When this happens, the node in question is responding and the cluster is healthy (green). The box also has plenty of resources: CPU usage is below 30% and free memory is over 50 GB. When the exception occurs, the Hadoop map task fails, gets restarted, and eventually succeeds (maybe by connecting to a different ES node). The errors are intermittent rather than consistent.

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot find node with id [Q4pQkOIJSSi2oXRXGUVs8w]
    at org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:40)
    at org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:251)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.initSingleIndex(EsOutputFormat.java:218)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:201)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:159)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at afi.search.hadoop.es.ESMapper1.map(ESMapper1.java:227)
    at afi.search.hadoop.es.ESMapper1.map(ESMapper1.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

About this issue

State: closed
Created 10 years ago
Comments: 29 (13 by maintainers)

Commits related to this issue

[SPARK] Support for SparkSQL/SchemaRDD. Relates #243 (committed to elastic/elasticsearch-hadoop by costin 10 years ago)
[REST] Add retries to cope with volatile cluster state. Additionally improve logging for better diagnostics. Relates #243 (committed to elastic/elasticsearch-hadoop by costin 9 years ago)
[REST] Add retries to cope with volatile cluster state. Additionally improve logging for better diagnostics. Relates #243 (cherry picked from commit ec6b471218e73edc2b8ab5f964094d1c118ec32a) (committed to elastic/elasticsearch-hadoop by costin 9 years ago)
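
The stack trace in the report fails on a null check while resolving the write target primary shards, and the [REST] commits listed above mention retries to cope with a volatile cluster state. The general idea can be sketched in Java as below. This is only an illustration under stated assumptions, not the actual es-hadoop code: the class `NodeLookupRetry`, the method `findNodeAddress`, and the `fetchNodes` supplier (standing in for "ask the cluster for its current node map") are all hypothetical names.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of a retrying node lookup; not the es-hadoop implementation. */
final class NodeLookupRetry {

    private NodeLookupRetry() {
    }

    /**
     * Resolves a node id to an address, re-fetching the node map when the id is missing.
     *
     * @param nodeId        id referenced by a shard (assumed to come from cluster metadata)
     * @param fetchNodes    hypothetical callback returning the current id-to-address map
     * @param maxAttempts   total number of lookups before giving up (must be >= 1)
     * @param backoffMillis base pause between attempts; grows linearly with the attempt number
     */
    static String findNodeAddress(String nodeId,
                                  Supplier<Map<String, String>> fetchNodes,
                                  int maxAttempts,
                                  long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Take a fresh snapshot each time: the shard list and the node list
            // may have been read at slightly different moments.
            Map<String, String> nodes = fetchNodes.get();
            String address = nodes.get(nodeId);
            if (address != null) {
                return address;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis * attempt);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while looking up node " + nodeId, e);
                }
            }
        }
        throw new IllegalStateException(
                "Cannot find node with id [" + nodeId + "] after " + maxAttempts + " attempts");
    }
}
```

With a single attempt this sketch behaves like a fail-fast lookup: an id missing from one snapshot of the node map fails the task immediately. With several attempts, a short-lived mismatch between the shard list and the node list is tolerated, while an id that never appears still fails.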

Most upvoted comments

Nothing of interest was showing up in the Elasticsearch master log. I didn’t check the logs on the nodes that reported the error.

I’ll have to double check Friday when I’m back at work, but I believe I upgraded from 7u51 to 8u20 (maybe 8u25).

The exception was consistent on repeat runs of the job with the same data. Shutting down the Elasticsearch nodes that were failing to connect appeared to resolve the problem.

I’ll turn up the logging on Friday and report back.

On Wed, Nov 26, 2014 at 4:23 PM, Costin Leau wrote:

@ebradshaw https://github.com/ebradshaw anything showing up in the logs? What are the exact JDK versions in place (which updates)? Does the exception occur consistently or not? Can you please turn on logging at TRACE level, run the job again, and report back? Thanks!


ebradshaw on Nov 26, 2014

I’m having the same issue on a 20-node Elasticsearch cluster. It seems to have started after I updated the cluster from JDK 1.7 to JDK 1.8. When I run a load job via Elasticsearch-Spark, several ‘Cannot find node with id …’ errors occur. The same nodes report problems on repeat runs of the same job. If I shut those few nodes down and run the job again, it seems to run error-free. If I restart the entire cluster, the Spark job complains about different nodes.

ebradshaw on Nov 26, 2014