elasticsearch-dump: [BUG] When dumping an index to file, elasticsearch-dump misses the last batch with concurrency=2, limit=10000
Context In order to help us troubleshoot issues with this project, you must provide the following details:
- ElasticDump version: v6.28.0, run with Docker
- Elasticsearch version: 6.4.1
- Docker command:
docker run \
--rm \
-d \
-v $WORKDIR:/export \
taskrabbit/elasticsearch-dump:${ELASTICDUMP_TAG} \
--input=http://${ES_HOST}:9200/${index} \
--output="/export/${index}.json" \
--limit 10000 \
--concurrency 2 \
--fileSize 100mb
- A data-set you can share publicly which reproduces the problem: apologies, cannot provide this atm. Will see if I can reproduce with a public dataset though, but it will take some time
Describe the bug
For additional safety, our tooling double-checks the number of documents written to disk by elasticsearch-dump against the number of documents in the ES index.
Randomly, the total reported by elasticsearch-dump in the Total Writes line falls short of the number of documents reported by ES by exactly the value used for --limit, in our case 10000, as shown above. Retrying the dump usually fixes the problem, but this happens often enough that it takes significant effort to look after it. Tonight's difference (es vs. dump): 2684777 vs. 2674777.
To obtain the number of documents in the index we use the cat API: http://${ES_HOST}:9200/_cat/indices/$index ; the 7th column holds the value we're interested in.
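For anyone wanting to replicate the count lookup, here is a minimal sketch of pulling that 7th column out of the cat API response. The function name es_doc_count is made up for illustration, and it assumes the default _cat/indices column order (health, status, index, uuid, pri, rep, docs.count, ...); it is not taken from our actual tooling.

```shell
# Hypothetical helper: print docs.count (7th column by default) from
# `_cat/indices/<index>` output read on stdin.
# Lines whose 7th field is not a number (e.g. a header row from `?v`) are skipped.
es_doc_count() {
  awk '$7 ~ /^[0-9]+$/ { print $7; exit }'
}

# Usage, with ES_HOST and index set as in the docker command above:
#   curl -s "http://${ES_HOST}:9200/_cat/indices/${index}" | es_doc_count
```

Asking the cat API for just that column with the h=docs.count query parameter should avoid depending on column position, if that is preferable.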
Note that the ES index is effectively frozen, so it is not being modified while elasticsearch-dump is dumping it.
We only perform the check if elasticsearch-dump exits cleanly without errors.
To Reproduce
Try to dump an index to disk using the options above, then compare the ES doc count with the number written by elasticsearch-dump to its logs, on the Total Writes: line.
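The comparison can be sketched roughly as follows. This assumes the log contains a line ending in "Total Writes: N" and takes the last such line; the function names dump_total_writes and counts_match are hypothetical, and the exact log prefix may differ between versions.

```shell
# Hypothetical helper: extract the last "Total Writes: N" value from
# elasticsearch-dump log text read on stdin.
dump_total_writes() {
  sed -n 's/.*Total Writes:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Hypothetical helper: compare the ES doc count ($1) with the dump count ($2).
# Prints a one-line summary and returns non-zero on mismatch.
counts_match() {
  es_count=$1
  dump_count=$2
  if [ "$es_count" -eq "$dump_count" ]; then
    echo "OK: $es_count == $dump_count"
  else
    echo "MISMATCH: es=$es_count dump=$dump_count diff=$((es_count - dump_count))"
    return 1
  fi
}

# Usage (container name is a placeholder):
#   writes=$(docker logs "$container" 2>&1 | dump_total_writes)
#   counts_match "$es_count" "$writes"
```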
Current behavior
Randomly, the count reported by elasticsearch-dump is lower than the ES count by exactly --limit.
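To tell a dropped batch apart from some other kind of mismatch, one option is to check whether the shortfall is an exact multiple of --limit. A small sketch with the numbers from this report follows; missing_batches is a hypothetical name, not something elasticsearch-dump provides.

```shell
# Hypothetical helper: given ES count ($1), dump count ($2) and --limit ($3),
# print how many whole batches the shortfall corresponds to, or "n/a" if the
# shortfall is not a positive exact multiple of the limit.
missing_batches() {
  es_count=$1
  dump_count=$2
  limit=$3
  shortfall=$((es_count - dump_count))
  if [ "$shortfall" -gt 0 ] && [ $((shortfall % limit)) -eq 0 ]; then
    echo $((shortfall / limit))
  else
    echo "n/a"
  fi
}

missing_batches 2684777 2674777 10000   # prints 1
```

A result of 1 here is consistent with exactly one batch of 10000 documents going missing.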
Expected behavior
The two numbers should always match unless there's an error. In our case there is no error; elasticsearch-dump exits cleanly.
Screenshots
n.a.
Additional context
nothing to add
About this issue
- State: closed
- Created 4 years ago
- Comments: 34
Alright I’ll add logic to check the body for failure and possibly retry
Yea I was just about to tell you to enable the debug flag and report back with the logs. I also added totalSearchResults to the debug log (this should report the count that the application is seeing).
elasticdump - v6.47.0
Just a heads up, from my investigation: there are only 2 places where there could be an issue, either in the concurrency logic (dropping a concurrent stream) or in the count received by the app.