elasticsearch-hadoop: Dots in field names exception
spark-1.6.2-bin-hadoop2.6, elasticsearch-5.0.0-beta1, elasticsearch-hadoop-5.0.0-beta1
curl -XPOST localhost:9200/test4/test -d '{"b":0,"e":{"f.g":"hello"}}'
./bin/pyspark --driver-class-path=../elasticsearch-hadoop-5.0.0-beta1/dist/elasticsearch-hadoop-5.0.0-beta1.jar
>>> df1 = sqlContext.read.format("org.elasticsearch.spark.sql").load("test4/test")
>>> df1.printSchema()
root
|-- b: long (nullable = true)
|-- e: struct (nullable = true)
| |-- f: struct (nullable = true)
| | |-- g: string (nullable = true)
>>> df1.show()
---8<--- snip ---8<---
org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for 'e.f.g' not found in row; typically this is caused by a mapping inconsistency
at org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:45)
at org.elasticsearch.spark.sql.ScalaRowValueReader.addToBuffer(ScalaEsRowValueReader.scala:14)
at org.elasticsearch.spark.sql.ScalaRowValueReader.addToMap(ScalaEsRowValueReader.scala:94)
at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:806)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:696)
at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:806)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:696)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:466)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:391)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:286)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:259)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:365)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:92)
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
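The failure appears to come from a mismatch between the mapping and `_source`: Elasticsearch's mapping interprets the dotted key `f.g` as the object path `e.f.g` (which is what the printed Spark schema reflects), while `_source` keeps the literal key `"f.g"`. A minimal Python sketch of that mismatch, using a hypothetical `expand_dots` helper written here purely for illustration (this is not es-hadoop code):

```python
# The document as stored in _source: the dotted key is kept literally.
source = {"b": 0, "e": {"f.g": "hello"}}

def lookup(doc, path):
    """Walk a dotted path through nested dicts; return None if absent."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc

# A mapping-driven reader looks for e.f.g, but _source only has e -> "f.g",
# so the lookup fails -- the same inconsistency the exception above reports.
print(lookup(source, "e.f.g"))  # None

def expand_dots(doc):
    """Hypothetical helper: rewrite dotted keys into nested objects."""
    if not isinstance(doc, dict):
        return doc
    out = {}
    for key, value in doc.items():
        node = out
        parts = key.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = expand_dots(value)
    return out

# After expansion the document matches the mapping-derived schema.
print(lookup(expand_dots(source), "e.f.g"))  # hello
```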
About this issue
- State: open
- Created 8 years ago
- Reactions: 8
- Comments: 21 (8 by maintainers)
Commits related to this issue
- Created a unit test, that MetaModel is unable to work with a document indexed by Elasticsearch which contains dots in its fieldnames. Note that this is actually caused by Elasticsearch, because the ma... — committed to arjansh/metamodel by deleted user 4 years ago
- Documenting that we do not support dots in field names (#1900) Es-hadoop does not support fields with dots in their names (#853). Adding support is likely to cause more problems than it fixes. So th... — committed to elastic/elasticsearch-hadoop by masseyke 2 years ago
I am going to go ahead and re-open this since it seems like this “problem” of dots in field names is less of a “problem” and more just where things are trending toward in the data integration space. It would be unwise of us to ignore this issue given recent developments across existing solutions.
That said, this issue is not an easy fix and requires adjusting some invariants that we have treated very carefully over the years - most notably that `_source` is sacred and should only be changed judiciously. Additionally, the document update logic will likely need looking at (just try running a partial document update using normalized JSON in the request against a document containing dotted field names).