elasticsearch-hadoop: EsHadoopIllegalStateException reading Geo-Shape into DataFrame - SparkSQL

  1. Create an index type with a mapping consisting of a field of type geo_shape.
  2. Create an RDD[String] containing a polygon as GeoJSON, as the value of a field whose name matches the mapping: """{"rect":{"type":"Polygon","coordinates":[[[50,32],[69,32],[69,50],[50,50],[50,32]]],"crs":null}}"""
  3. Write to an index type in Elasticsearch: rdd1.saveJsonToEs(indexName+"/"+indexType, connectorConfig)
  4. Read into a SparkSQL DataFrame with either esDF or read-format-load (these steps are sketched end-to-end after this list):
    • sqlContext.esDF(indexName+"/"+indexType, connectorConfig)
    • sqlContext.read.format("org.elasticsearch.spark.sql").options(connectorConfig).load(indexName+"/"+indexType)
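
A minimal end-to-end sketch of the four steps above, assuming Spark 1.x with the elasticsearch-spark 2.1.x artifact on the classpath; the index/type names and connector settings are illustrative, and the geo_shape mapping from step 1 is assumed to have been created beforehand through the Elasticsearch mapping API:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext
  import org.elasticsearch.spark._        // adds saveJsonToEs to RDD[String]
  import org.elasticsearch.spark.sql._    // adds esDF to SQLContext

  object GeoShapeRepro {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("geo-shape-repro"))
      val sqlContext = new SQLContext(sc)

      // Illustrative names/settings; the "rect" field is mapped as geo_shape (step 1).
      val indexName = "shapes"
      val indexType = "polygons"
      val connectorConfig = Map("es.nodes" -> "localhost:9200")

      // Step 2: GeoJSON polygon as a raw JSON string, field name matching the mapping.
      val doc = """{"rect":{"type":"Polygon","coordinates":[[[50,32],[69,32],[69,50],[50,50],[50,32]]],"crs":null}}"""
      val rdd1 = sc.parallelize(Seq(doc))

      // Step 3: write the JSON documents as-is.
      rdd1.saveJsonToEs(indexName + "/" + indexType, connectorConfig)

      // Step 4: either read path fails with EsHadoopIllegalStateException.
      val df1 = sqlContext.esDF(indexName + "/" + indexType, connectorConfig)
      val df2 = sqlContext.read.format("org.elasticsearch.spark.sql")
        .options(connectorConfig)
        .load(indexName + "/" + indexType)
    }
  }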

The result is: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field ‘rect’ not found; typically this occurs with arrays which are not mapped as single value. The full stack trace is in a gist. Elasticsearch Hadoop v2.1.2.

About this issue

  • State: closed
  • Created 9 years ago
  • Comments: 46 (25 by maintainers)

Most upvoted comments

Hi @costin, through Spark (Java) we are also facing issues while pushing a geo-shape to an Elasticsearch index. It fails with the error message: Failed to parse.

@randallwhitman Hi,

I’ve taken a closer look at this and it’s a bit more complicated. It’s fixable, but not as easy as I thought. The major issue with SparkSQL is that it requires a strict schema before loading any data, so the connector can only rely on the mapping to provide it. However, the underlying data (due to the flexibility of JSON) can be quite… loose, which trips up Spark and/or the connector because it doesn’t fit exactly into the schema.
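
As a rough illustration of that constraint (not from the original thread), here is the kind of strict schema Spark SQL would need up front for the sample document, written with Spark’s org.apache.spark.sql.types API; the connector has to derive something like this from the Elasticsearch mapping alone:

  import org.apache.spark.sql.types._

  // Hypothetical schema for the sample "rect" document. Spark SQL needs this
  // level of precision before loading, but the ES mapping does not pin down
  // the array depth of "coordinates" and has no entry at all for the null "crs".
  val rectSchema = StructType(Seq(
    StructField("rect", StructType(Seq(
      StructField("type", StringType),
      StructField("coordinates", ArrayType(ArrayType(ArrayType(LongType)))),
      StructField("crs", StringType)
    )))
  ))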

First off, the field “crs” is null, meaning it is not mapped: there’s no type information associated with it and thus no mapping. The connector doesn’t see it when looking at the mapping, so when it encounters it in the _source it doesn’t know what to do with it. This needs to be fixed; for now I’ve added a better exception message and raised #648.

Second, the mapping information is incomplete for Spark SQL’s requirements. For example, coordinates is a field of type long. Is it a primitive or an array? We don’t know beforehand. One can indicate that it’s an array through the newly introduced es.read.field.as.array.include/exclude (ES 2.2 only). However, this is not enough, as the array depth is unknown: the connector is told that this field is an array, but is it [long], [[long]], [[[long]]], and so on? I’ve raised yet another issue for this, namely #650.
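
For reference, a minimal sketch of how that newly introduced setting could be supplied on the read path, reusing the hypothetical indexName/indexType/connectorConfig values from the reproduction sketch above and assuming ES-Hadoop 2.2+; as noted, it only flags the field as an array and cannot express its nesting depth:

  // Sketch only: mark rect.coordinates as an array when deriving the schema.
  // Requires es.read.field.as.array.include (ES-Hadoop 2.2+); the nesting
  // depth of the array still cannot be expressed this way (see #650).
  val dfWithArrayHint = sqlContext.read
    .format("org.elasticsearch.spark.sql")
    .options(connectorConfig)
    .option("es.read.field.as.array.include", "rect.coordinates")
    .load(indexName + "/" + indexType)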