elasticsearch-hadoop: EsHadoopIllegalStateException reading Geo-Shape into DataFrame - SparkSQL

  1. Create an index type with a mapping consisting of a field of type geo_shape.
  2. Create an RDD[String] containing a polygon as GeoJSON, as the value of a field whose name matches the mapping: """{"rect":{"type":"Polygon","coordinates":[[[50,32],[69,32],[69,50],[50,50],[50,32]]],"crs":null}}"""
  3. Write to an index type in Elasticsearch: rdd1.saveJsonToEs(indexName+"/"+indexType, connectorConfig)
  4. Read into a SparkSQL DataFrame with either esDF or read-format-load (these steps are sketched end-to-end after this list):
    • sqlContext.esDF(indexName+"/"+indexType, connectorConfig)
    • sqlContext.read.format("org.elasticsearch.spark.sql").options(connectorConfig).load(indexName+"/"+indexType)
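
A minimal end-to-end sketch of the four steps above, assuming Spark 1.x with the elasticsearch-spark 2.1.x artifact on the classpath; the index/type names and connector settings are illustrative, and the geo_shape mapping from step 1 is assumed to have been created beforehand through the Elasticsearch mapping API:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext
  import org.elasticsearch.spark._        // adds saveJsonToEs to RDD[String]
  import org.elasticsearch.spark.sql._    // adds esDF to SQLContext

  object GeoShapeRepro {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("geo-shape-repro"))
      val sqlContext = new SQLContext(sc)

      // Illustrative names/settings; the "rect" field is mapped as geo_shape (step 1).
      val indexName = "shapes"
      val indexType = "polygons"
      val connectorConfig = Map("es.nodes" -> "localhost:9200")

      // Step 2: GeoJSON polygon as a raw JSON string, field name matching the mapping.
      val doc = """{"rect":{"type":"Polygon","coordinates":[[[50,32],[69,32],[69,50],[50,50],[50,32]]],"crs":null}}"""
      val rdd1 = sc.parallelize(Seq(doc))

      // Step 3: write the JSON documents as-is.
      rdd1.saveJsonToEs(indexName + "/" + indexType, connectorConfig)

      // Step 4: either read path fails with EsHadoopIllegalStateException.
      val df1 = sqlContext.esDF(indexName + "/" + indexType, connectorConfig)
      val df2 = sqlContext.read.format("org.elasticsearch.spark.sql")
        .options(connectorConfig)
        .load(indexName + "/" + indexType)
    }
  }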

The result is: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field ‘rect’ not found; typically this occurs with arrays which are not mapped as single value. The full stack trace is in a gist. Elasticsearch Hadoop v2.1.2.

About this issue

  • State: closed
  • Created 9 years ago
  • Comments: 46 (25 by maintainers)

Most upvoted comments

Hi @costin, through Spark (Java) we are also facing issues while pushing a geo-shape to an Elasticsearch index. It fails with the error message: Failed to parse.

@randallwhitman Hi,

I’ve taken a closer look at this and it’s a bit more complicated. It’s fixable, but not as easy as I thought. The major issue with SparkSQL is that it requires a strict schema before loading any data, so the connector can only rely on the mapping to provide it. However, the underlying data (due to the flexibility of JSON) can be quite… loose, which trips up Spark and/or the connector because it doesn’t fit exactly into the schema.
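
As a rough illustration of that constraint (not from the original thread), here is the kind of strict schema Spark SQL would need up front for the sample document, written with Spark’s org.apache.spark.sql.types API; the connector has to derive something like this from the Elasticsearch mapping alone:

  import org.apache.spark.sql.types._

  // Hypothetical schema for the sample "rect" document. Spark SQL needs this
  // level of precision before loading, but the ES mapping does not pin down
  // the array depth of "coordinates" and has no entry at all for the null "crs".
  val rectSchema = StructType(Seq(
    StructField("rect", StructType(Seq(
      StructField("type", StringType),
      StructField("coordinates", ArrayType(ArrayType(ArrayType(LongType)))),
      StructField("crs", StringType)
    )))
  ))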

First off, the field “crs” is null, meaning it is not mapped: there’s no type information associated with it and thus no mapping. The connector doesn’t see it when looking at the mapping, so when it encounters it in the _source it doesn’t know what to do with it. This needs to be fixed; for now I’ve added a better exception message and raised #648.

Second, the mapping information is incomplete for Spark SQL’s requirements. For example, coordinates is a field of type long. Is it a primitive or an array? We don’t know beforehand. One can indicate that it’s an array through the newly introduced es.read.field.as.array.include/exclude (ES 2.2 only). However, this is not enough, as the array depth is unknown: the connector is told that this field is an array, but is it [long], [[long]], [[[long]]], and so on? I’ve raised yet another issue for this, namely #650.
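
For reference, a minimal sketch of how that newly introduced setting could be supplied on the read path, reusing the hypothetical indexName/indexType/connectorConfig values from the reproduction sketch above and assuming ES-Hadoop 2.2+; as noted, it only flags the field as an array and cannot express its nesting depth:

  // Sketch only: mark rect.coordinates as an array when deriving the schema.
  // Requires es.read.field.as.array.include (ES-Hadoop 2.2+); the nesting
  // depth of the array still cannot be expressed this way (see #650).
  val dfWithArrayHint = sqlContext.read
    .format("org.elasticsearch.spark.sql")
    .options(connectorConfig)
    .option("es.read.field.as.array.include", "rect.coordinates")
    .load(indexName + "/" + indexType)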