fscrawler: ES crashes while indexing large file in fscrawler

Hi, I would like to request some help with the following issue:

1. I am trying to index a text file (87 MB and larger) using FSCrawler. The crawler reads the file successfully, and I can see it in the FSCrawler output. However, once the file has been parsed, FSCrawler ends by forcibly closing ES. If I reduce the file size to 40-70 MB, the behaviour becomes erratic: sometimes ES closes, and other times ES just hangs. Moreover, the crawler sometimes creates the _status.json file and sometimes does not.
2. If the file size is reduced further to under 20 MB, everything works.
3. I am not sure whether the error is related to ES or to FSCrawler, hence posting it here.
4. I am using a Windows machine with FSCrawler 2.5 and ES 6.3.2.
5. I am including the log files of ES and FSCrawler below.
6. My settings are:

"elasticsearch" : {
    "nodes" : [ {
      "host" : "127.0.0.1",
      "port" : 9200,
      "scheme" : "HTTP"
    } ],
    "bulk_size" : 50,
    "flush_interval" : "5s",
    "byte_size" : "5mb"
  },
  "rest" : {
    "scheme" : "HTTP",
    "host" : "127.0.0.1",
    "port" : 8080,
    "endpoint" : "fscrawler"
  }
}
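
A note on why `bulk_size` and `byte_size` may not help here: size-based bulk flushing usually sends a batch once it reaches the configured limits, but it cannot split a single document. The sketch below is illustrative only and is not FSCrawler code; `batch_documents` and its parameters are hypothetical names, with defaults mirroring the settings above.

```python
MB = 1024 * 1024


def batch_documents(doc_sizes, max_actions=50, max_bytes=5 * MB):
    """Group document sizes (in bytes) into bulk batches.

    A batch is flushed as soon as it holds max_actions documents or at
    least max_bytes of payload. A document is never split, so a single
    oversized document still travels as one request.
    """
    batches, current, current_bytes = [], [], 0
    for size in doc_sizes:
        current.append(size)
        current_bytes += size
        if len(current) >= max_actions or current_bytes >= max_bytes:
            batches.append(current)
            current, current_bytes = [], 0
    if current:  # flush whatever is left at the end
        batches.append(current)
    return batches


# Two small documents, one 87 MB document, one more small document.
batches = batch_documents([3 * MB, 3 * MB, 87 * MB, 1 * MB])
largest = max(sum(batch) for batch in batches)
print(f"{len(batches)} batches, largest is {largest // MB} MB")
```

Under these assumptions, the 87 MB document ends up in a request of its own that is far above the 5 MB threshold, so lowering `byte_size` would not shrink it.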
The log output of FSCrawler is:
19:34:24,162 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [test]
19:34:24,163 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
19:34:24,165 DEBUG [f.p.e.c.f.c.ElasticsearchClientManager] Closing Elasticsearch client manager
19:34:29,842 WARN  [f.p.e.c.f.c.ElasticsearchClientManager] **Got a hard failure when executing the bulk request**
java.io.IOException: **An existing connection was forcibly closed by the remote host**
	at XX.nio.ch.SocketDispatcher.read0(Native Method) ~[?:1.8.0_191]
	at XX.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43) ~[?:1.8.0_191]
	at XX.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_191]
	at XX.nio.ch.IOUtil.read(IOUtil.java:197) ~[?:1.8.0_191]
	at XX.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_191]
	at org.apache.http.impl.nio.reactor.SessionInputBufferImpl.fill(SessionInputBufferImpl.java:204) ~[httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.codecs.AbstractMessageParser.fillBuffer(AbstractMessageParser.java:136) ~[httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:241) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) [httpasyncclient-4.1.2.jar:4.1.2]
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) [httpasyncclient-4.1.2.jar:4.1.2]
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) [httpcore-nio-4.4.5.jar:4.4.5]
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) [httpcore-nio-4.4.5.jar:4.4.5]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
19:34:29,852 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing REST client
19:34:29,856 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
19:34:29,857 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [test] stopped
19:34:29,859 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [test]
19:34:29,860 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
19:34:29,860 DEBUG [f.p.e.c.f.c.ElasticsearchClientManager] Closing Elasticsearch client manager
19:34:29,860 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing REST client
19:34:29,860 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
19:34:29,860 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [test] stopped
The ES log output is:
[2019-06-09T19:34:25,096][INFO ][o.e.m.j.JvmGcMonitorService] [oh9vXnd] [gc][301] overhead, spent [448ms] collecting in the last [1s]
[2019-06-09T19:34:26,632][WARN ][o.e.m.j.JvmGcMonitorService] [oh9vXnd] [gc][302] overhead, spent [956ms] collecting in the last [1.5s]
[2019-06-09T19:34:29,392][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [] **fatal error in thread [elasticsearch[oh9vXnd][write][T#1]], exiting**
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOfRange(Arrays.java:3664) ~[?:1.8.0_191]
	at java.lang.String.<init>(String.java:207) ~[?:1.8.0_191]
	at java.lang.StringBuilder.toString(StringBuilder.java:407) ~[?:1.8.0_191]
	at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:356) ~[jackson-core-2.8.10.jar:2.8.10]
	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2470) ~[jackson-core-2.8.10.jar:2.8.10]
	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:315) ~[jackson-core-2.8.10.jar:2.8.10]
	at org.elasticsearch.common.xcontent.json.JsonXContentParser.text(JsonXContentParser.java:84) ~[elasticsearch-x-content-6.3.2.jar:6.3.2]
	at org.elasticsearch.common.xcontent.support.AbstractXContentParser.textOrNull(AbstractXContentParser.java:269) ~[elasticsearch-x-content-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.mapper.TextFieldMapper.parseCreateField(TextFieldMapper.java:564) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:297) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:481) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:603) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:403) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:380) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:95) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:69) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:261) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:708) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:685) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:666) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.lambda$executeIndexRequestOnPrimary$2(TransportShardBulkAction.java:553) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.action.bulk.TransportShardBulkAction$$Lambda$2668/721638591.get(Unknown Source) ~[?:?]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeOnPrimaryWhileHandlingMappingUpdates(TransportShardBulkAction.java:572) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary(TransportShardBulkAction.java:551) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequest(TransportShardBulkAction.java:142) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:248) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:125) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:112) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:74) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1018) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:996) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:103) ~[elasticsearch-6.3.2.jar:6.3.2]
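
The trace above fails inside `StringBuilder.toString`, i.e. while the field's text is being copied into a new `String`. On Java 8 (the trace shows 1.8.0_191) a `String` stores two bytes per character, so a rough lower bound on the heap needed for one document can be estimated. This is a back-of-the-envelope sketch, not a measurement: it assumes mostly single-byte text, two live copies of the text (the builder's buffer and the resulting `String`), and the raw request still held in memory. `estimate_heap_mb` is a hypothetical helper.

```python
MB = 1024 * 1024


def estimate_heap_mb(file_mb, bytes_per_char=2, string_copies=2):
    """Rough lower bound (in MB) on heap used while parsing one text field.

    Assumes mostly single-byte text, so the character count is about the
    same as the file size in bytes.
    """
    request_bytes = file_mb * MB           # raw payload of the bulk request
    chars = file_mb * MB                   # assumed: ~1 character per byte
    string_bytes = chars * bytes_per_char * string_copies
    return (request_bytes + string_bytes) / MB


for size in (20, 87):
    print(f"{size} MB file -> at least ~{estimate_heap_mb(size):.0f} MB of heap")
```

The heap size is not stated in this post. If the 1 GB default that ES 6.x ships in `jvm.options` is still in place, an estimate in the hundreds of MB for the 87 MB file, on top of everything else ES keeps on the heap, would be consistent with the GC overhead warnings and the `OutOfMemoryError` shown above.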

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

You will have to upgrade to a newer ES version to solve it!