elasticsearch-rails: How Could I Remove Transport::Errors::Conflict

Hi, I have a problem. I use elasticsearch-model like this:

include Elasticsearch::Model
include Elasticsearch::Model::Callbacks

When the model is updated concurrently in production, I get this error:

Elasticsearch::Transport::Transport::Errors::Conflict

like this

[409] {"error":{"root_cause":[{"type":"version_conflict_engine_exception","reason":"[swipe_configs][f55a8333-ee71-4473-b575-7d63b89de8a0]: version conflict, current [2], provided [1]","shard":"1","index":"inchat_release"}],"type":"version_conflict_engine_exception","reason":"[swipe_configs][f55a8333-ee71-4473-b575-7d63b89de8a0]: version conflict, current [2], provided [1]","shard":"1","index":"inchat_release"},"status":409}

How could I avoid this error?

thank you

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 17 (9 by maintainers)

Most upvoted comments

Update: After hours of investigation, it seems the error is related to the usage of delete_by_query. I still don’t know why, but here is the problematic line:

Elasticsearch::Persistence.client.perform_request 'POST', "#{index_name}/_delete_by_query", {}, builder.query

As I said in my previous post, this error appears randomly, which makes it very difficult to fix. So if anyone has an idea… Thanks for the help.

How could an object “marked as deleted” have been updated (…)

I’m not 100% sure, but the subtlety in the API description is that the operation “gets a snapshot of the index”, hence, it “freezes” that index in time, including the versions, and starts processing the documents. (This is exactly the same thing as with the “Scroll” API.) But the index is kept “live”, of course, and can happily receive changes, and any change increments the internal version – in your example, the error says current version [2] is different than the one provided [1].
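
To make the “snapshot, then version check” idea above concrete, here is a toy model in plain Ruby. It is only an illustration under loud assumptions: `ToyIndex` and its methods are made up for this sketch and do not reflect how Elasticsearch is implemented internally. It only shows how a version captured earlier can disagree with the live version by the time the delete runs.

```ruby
# Toy model only: NOT Elasticsearch's implementation.
# It mimics "remember the versions, then delete with a version check".
class ToyIndex
  VersionConflict = Class.new(StandardError)

  def initialize
    @docs = {} # id => { version: Integer, body: Hash }
  end

  # Every write to an existing id increments its version.
  def index(id, body)
    version = @docs.key?(id) ? @docs[id][:version] + 1 : 1
    @docs[id] = { version: version, body: body }
  end

  # Freeze the id => version pairs as they are right now.
  def snapshot
    @docs.transform_values { |doc| doc[:version] }
  end

  # Delete only if the live version still matches the snapshotted one.
  def delete(id, expected_version)
    current = @docs.fetch(id)[:version]
    if current != expected_version
      raise VersionConflict,
            "version conflict, current [#{current}], provided [#{expected_version}]"
    end
    @docs.delete(id)
  end
end

index = ToyIndex.new
index.index('a', { name: 'first' })
frozen = index.snapshot                # the delete "sees" version 1
index.index('a', { name: 'changed' })  # a concurrent update bumps it to 2

begin
  frozen.each { |id, version| index.delete(id, version) }
rescue ToyIndex::VersionConflict => e
  puts e.message # prints "version conflict, current [2], provided [1]"
end
```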

So, I don’t think I was correct in the point “3” of my description, I don’t know 100% what happens at this point, but if the index is connected eg. to a database, an ID can indeed be “reused”…

To debug it further, can you get from the logs if the “conflict” error happens for the “Update Document” operation or the “Delete By Query” operation? My guess is the former, but wanna be sure. In that case, for further debugging, I’d probably try to match the document ID for that operation with documents matching the “Delete By Query” query. I guess it’s probably challenging to do it on a live system in flux, but you can eg. issue a regular “Search” request before you issue the “Delete By Query” one, and dump the matching IDs somewhere, so you can later on match them…

Sorry for not having a straightforward explanation here… One last thing to double-check is whether the “Update Document” operation (from your callbacks) does or doesn’t include the version (it shouldn’t by default).
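
The ID-matching idea above could be sketched roughly as follows. This is a minimal sketch, not code from the thread: `dump_matching_ids`, the log path, and the `size` default are hypothetical, and `client` is assumed to be an `Elasticsearch::Client` (anything that responds to `#search` the same way will do). It only captures up to `size` hits, so a large result set would need the Scroll API instead.

```ruby
require 'json'

# Hypothetical helper: runs the query as a regular search first and appends
# the matching document IDs to a file as one JSON line per call.
def dump_matching_ids(client, index_name, query, path, size: 1000)
  # `_source: false` skips the document bodies; only the IDs are needed.
  response = client.search(index: index_name, body: query, size: size, _source: false)
  ids = response['hits']['hits'].map { |hit| hit['_id'] }

  File.open(path, 'a') do |file|
    file.puts(JSON.generate({ at: Time.now.utc.to_s, index: index_name, ids: ids }))
  end

  ids
end
```

Calling it right before the “Delete By Query” request gives a list of IDs that can later be compared with the ID reported in the 409 error.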

@karmi thanks for those suggestions. Since I’m only able to reproduce this case in my production environment (it’s related to the amount of data I have in production), you can imagine it’s very complicated to investigate through the logs. But what I did was extract the code I wrote so that it only runs the _delete_by_query method without any other actions. In this case I had the exact same problem. I don’t know exactly how the code behind _delete_by_query behaves or what actions this method triggers, but those 409 conflict errors seem to be related only to the usage of this method.

After a couple of tries, here is the “monkey patch” that worked in my code:

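def delete_by_query(index_name, query)
  # `||=` keeps the counter across `retry`, which re-runs the method body
  tries ||= 3
  Elasticsearch::Persistence.client.delete_by_query(index: index_name, body: query)
rescue Elasticsearch::Transport::Transport::Errors::Conflict => e
  # Re-raise once all three attempts have been used
  raise e if (tries -= 1).zero?
  log_error('version conflict')
  # Wait 30 seconds before running the query again
  sleep 30
  retry
end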

This is clearly not a permanent solution, but it’s the only way I found to make it work without redesigning the entire code. I don’t know if that helps, but that’s the point where I stopped for now.
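
As a possible alternative to the retry workaround above: the Delete By Query API accepts a `conflicts` parameter, and setting it to `proceed` makes Elasticsearch count version conflicts instead of aborting the request with a 409. The sketch below assumes that your Elasticsearch and gem versions support it; the method name is hypothetical and `client` is assumed to be an `Elasticsearch::Client`. Note that the conflicting documents are skipped, not deleted, so this only fits if leaving them for a later run is acceptable.

```ruby
# Sketch: skip documents whose version changed instead of failing the request.
def delete_by_query_skipping_conflicts(client, index_name, query)
  response = client.delete_by_query(
    index: index_name,
    body: query,
    conflicts: 'proceed' # count version conflicts rather than raising a 409
  )

  # The response reports how many documents were skipped due to conflicts.
  skipped = response['version_conflicts'].to_i
  warn "delete_by_query skipped #{skipped} conflicting document(s)" if skipped.positive?

  response
end
```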

Interesting, I’ll continue to investigate. FYI, I tried two different ways to manage the _id value. First, I didn’t specify any _id when creating the object, so I let ES set its own _id value. After experiencing this 409 conflict error, I tried a different approach and specified the _id value manually, like this: _id: SecureRandom.uuid. Unfortunately, this results in the same issue. I was very surprised that the error happened in my second scenario, because the chances that SecureRandom.uuid generates the exact same id twice are extremely low. I don’t know if it helps, but I thought it important that you know this before going further.