haystack: Setting refresh_type=false on the Elasticsearch document store with Embedding Retriever results in an error at query time

Describe the bug

Unable to run query_by_embedding after indexing documents with the Elasticsearch document store set to refresh_type=false and an Embedding Retriever.

Error message

A document doesn't have a value for a vector field.

{"error":{"root_cause":[{"type":"script_exception","reason":"runtime error","script_stack":["org.elasticsearch.xpack.vectors.query.ScoreScriptUtils$DenseVectorFunction.getEncodedVector(ScoreScriptUtils.java:100)","org.elasticsearch.xpack.vectors.query.ScoreScriptUtils$CosineSimilarity.cosineSimilarity(ScoreScriptUtils.java:179)","cosineSimilarity(params.query_vector,'embedding') + 1000"," ^---- HERE"],"script":"cosineSimilarity(params.query_vector,'embedding') + 1000","lang":"painless","position":{"offset":37,"start":0,"end":56}}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"1july_test4","node":"j7lOBAVgT625FixOyow97Q","reason":{"type":"script_exception","reason":"runtime error","script_stack":["org.elasticsearch.xpack.vectors.query.ScoreScriptUtils$DenseVectorFunction.getEncodedVector(ScoreScriptUtils.java:100)","org.elasticsearch.xpack.vectors.query.ScoreScriptUtils$CosineSimilarity.cosineSimilarity(ScoreScriptUtils.java:179)","cosineSimilarity(params.query_vector,'embedding') + 1000"," ^---- HERE"],"script":"cosineSimilarity(params.query_vector,'embedding') + 1000","lang":"painless","position":{"offset":37,"start":0,"end":56},"caused_by":{"type":"illegal_argument_exception","reason":"A document doesn't have a value for a vector field!"}}}]},"status":400}

raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'runtime error')

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16 (12 by maintainers)

Most upvoted comments

Thanks @tholor. Adding a 10-second delay between write_documents() and update_embeddings() made it work as expected for 1k documents.
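For anyone landing here, the workaround can be sketched roughly as below. This is only an illustration, not Haystack's own code: the helper name `index_then_embed` and its `refresh` parameter are made up for this example, and it assumes the document store exposes `write_documents(docs)` and `update_embeddings(retriever)` as used in this thread. The 10-second default is simply the value that worked above.

```python
import time


def index_then_embed(document_store, retriever, docs, delay_seconds=10.0, refresh=None):
    """Write documents, give the index time to refresh, then compute embeddings.

    Hypothetical helper: `document_store` and `retriever` are whatever objects
    you already use; only write_documents() and update_embeddings() are called.
    """
    document_store.write_documents(docs)

    if refresh is not None:
        # Caller-supplied callable that forces the index to refresh
        # (an assumption of this sketch, not a Haystack parameter).
        refresh()
    else:
        # With refresh disabled on write, newly written documents are not
        # searchable immediately, so wait before updating embeddings.
        time.sleep(delay_seconds)

    document_store.update_embeddings(retriever)
```

If the raw Elasticsearch client is at hand, passing something like `refresh=lambda: es.indices.refresh(index="1july_test4")` would probably be more reliable than a fixed sleep, since the right delay likely depends on index size and cluster settings.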

Hey, when I tried it, I was able to make it work. In the case above, it looks like the embedding field was not indexed in Elasticsearch. Was update_embeddings() called? When I tried in an independent Colab notebook, I was able to make it work with refresh_type='False'.
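One way to check whether the embedding field actually made it into the index is to count the documents that lack it. The snippet below is a minimal sketch that only builds the query body; the function name is hypothetical, and the field name `embedding` is taken from the script in the error message above.

```python
def missing_embedding_query(embedding_field="embedding"):
    """Build an Elasticsearch query matching documents without the vector field."""
    return {
        "query": {
            "bool": {
                # must_not + exists selects documents where the field is absent
                "must_not": [{"exists": {"field": embedding_field}}]
            }
        }
    }


# Example use with an existing elasticsearch-py 7.x client `es` (assumed, not shown):
#   es.count(index="1july_test4", body=missing_embedding_query())
```

A non-zero count would mean some documents still have no vector, which is exactly the situation the cosineSimilarity script complains about.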