k-NN: [BUG] opensearch process exits during addition of documents with knn_vector field

Describe the bug While adding ~1.3M documents with a knn_vector type field, the OpenSearch process exited with k-NN-related errors. This happened after approximately 800K documents had been added. The document mapping and index info are below. I was using the latest Docker image (version info below) in a single-node setup.

Errors in the log (there were tens of these error pairs before the process exited)

[2021-07-30T18:36:47,770][WARN ][o.o.t.ThreadPool         ] [opensearch-node1] failed to run scheduled task [org.opensearch.indices.IndexingMemoryController$ShardsIndicesStatusChecker@4027ffab] on thread pool [same]
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
	at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:877) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:891) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.apache.lucene.index.IndexWriter.getFlushingBytes(IndexWriter.java:781) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.opensearch.index.engine.InternalEngine.getWritingBytes(InternalEngine.java:635) ~[opensearch-1.0.0.jar:1.0.0]
	at org.opensearch.index.shard.IndexShard.getWritingBytes(IndexShard.java:991) ~[opensearch-1.0.0.jar:1.0.0]
	at org.opensearch.indices.IndexingMemoryController.getShardWritingBytes(IndexingMemoryController.java:195) ~[opensearch-1.0.0.jar:1.0.0]
	at org.opensearch.indices.IndexingMemoryController$ShardsIndicesStatusChecker.runUnlocked(IndexingMemoryController.java:323) ~[opensearch-1.0.0.jar:1.0.0]
	at org.opensearch.indices.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:303) ~[opensearch-1.0.0.jar:1.0.0]
	at org.opensearch.threadpool.Scheduler$ReschedulingRunnable.doRun(Scheduler.java:226) [opensearch-1.0.0.jar:1.0.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:756) [opensearch-1.0.0.jar:1.0.0]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) [opensearch-1.0.0.jar:1.0.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: java.lang.RuntimeException: java.lang.Exception: Check failed: it's either a bug or inconsistent data!
	at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer.merge(KNN80DocValuesConsumer.java:165) ~[?:?]
	at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:155) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.opensearch.index.engine.OpenSearchConcurrentMergeScheduler.doMerge(OpenSearchConcurrentMergeScheduler.java:113) ~[opensearch-1.0.0.jar:1.0.0]
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
Caused by: java.lang.Exception: Check failed: it's either a bug or inconsistent data!
	at org.opensearch.knn.index.v2011.KNNIndex.saveIndex(Native Method) ~[?:?]
	at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer$1.run(KNN80DocValuesConsumer.java:117) ~[?:?]
	at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer$1.run(KNN80DocValuesConsumer.java:115) ~[?:?]
	at java.security.AccessController.doPrivileged(AccessController.java:312) ~[?:?]
	at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer.addKNNBinaryField(KNN80DocValuesConsumer.java:114) ~[?:?]
	at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer.merge(KNN80DocValuesConsumer.java:161) ~[?:?]
	at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:155) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
	at org.opensearch.index.engine.OpenSearchConcurrentMergeScheduler.doMerge(OpenSearchConcurrentMergeScheduler.java:113) ~[opensearch-1.0.0.jar:1.0.0]
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
Killing performance analyzer process 11
OpenSearch exited with code 137
Performance analyzer exited with code 143
./opensearch-docker-entrypoint.sh: line 109:    10 Killed                  $OPENSEARCH_HOME/bin/opensearch "${opensearch_opts[@]}

Mapping info

{
  "test_knn" : {
    "aliases" : { },
    "mappings" : {
      "dynamic" : "true",
      "properties" : {
        "end" : {
          "type" : "integer"
        },
        "id" : {
          "type" : "keyword"
        },
        "index" : {
          "type" : "keyword"
        },
        "start" : {
          "type" : "integer"
        },
        "text" : {
          "type" : "keyword"
        },
        "type" : {
          "type" : "keyword"
        },
        "vector" : {
          "type" : "knn_vector",
          "dimension" : 768,
          "method" : {
            "engine" : "nmslib",
            "space_type" : "innerproduct",
            "name" : "hnsw",
            "parameters" : {
              "ef_construction" : 512,
              "m" : 24
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "knn.algo_param" : {
          "ef_search" : "100"
        },
        "provided_name" : "test_knn",
        "knn" : "true",
        "creation_date" : "1627660076506",
        "number_of_replicas" : "0",
        "uuid" : "FZj47j44TUyarSBU3ZjMaw",
        "version" : {
          "created" : "135217827"
        }
      }
    }
  }
}

Version info

{
  "name" : "opensearch-node1",
  "cluster_name" : "opensearch-cluster",
  "cluster_uuid" : "IeOOVXJoRACXnucMfKNzqQ",
  "version" : {
    "distribution" : "opensearch",
    "number" : "1.0.0",
    "build_type" : "tar",
    "build_hash" : "34550c5b17124ddc59458ef774f6b43a086522e3",
    "build_date" : "2021-07-02T23:22:21.383695Z",
    "build_snapshot" : false,
    "lucene_version" : "8.8.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}

To Reproduce Unclear. This happens sporadically while loading a large number of documents. Documents are added via the Python bulk API (which appears to send 500 documents to the server per request).

Expected behavior The OpenSearch process does not exit.

Plugins

opensearch-node1 opensearch-alerting 1.0.0.0
opensearch-node1 opensearch-anomaly-detection 1.0.0.0
opensearch-node1 opensearch-asynchronous-search 1.0.0.0
opensearch-node1 opensearch-index-management 1.0.0.0
opensearch-node1 opensearch-job-scheduler 1.0.0.0
opensearch-node1 opensearch-knn 1.0.0.0
opensearch-node1 opensearch-notebooks 1.0.0.0
opensearch-node1 opensearch-performance-analyzer 1.0.0.0
opensearch-node1 opensearch-reports-scheduler 1.0.0.0
opensearch-node1 opensearch-security 1.0.0.0
opensearch-node1 opensearch-sql 1.0.0.0

Host/Environment (please complete the following information):

  • OS: latest Docker image from https://www.opensearch.org/downloads.htm
  • Host is CentOS 7.9
  • Version 1.0.0
  • docker-compose file edited to run only one node
  • OPENSEARCH_JAVA_OPTS="-Xms8g -Xmx8g"
  • Those were the only changes made before docker-compose up

Additional context The host machine is an AWS m5.xlarge.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 21 (13 by maintainers)

Most upvoted comments

Yes, PCA can be very helpful for reducing the memory footprint.

Updating thread with my findings

After running several different tests and inspecting a lot of code, I believe the rise in RSS is not due to a memory leak. I isolated the C++ code into a simple binary and ran Valgrind on it, which confirmed there is no leak. I also verified that the JNI function is not interrupted while memory is being allocated by putting a unique pair of log statements around the saveIndex JNI call. I also attempted to use jemalloc's jeprof profiler, but did not find anything. Lastly, I tried to run OpenSearch under Valgrind while suppressing calls coming from Java; this did not work well: OpenSearch ran far too slowly and crashed randomly.

Along with this, I checked and confirmed this behavior is present in ODFE 1.13, 1.12, and 1.11 (and I stopped checking older versions at that point).

That being said, if it is not a memory leak, it appears that some kind of heap fragmentation is causing the increase. I think our allocation pattern is causing malloc to keep asking the OS for more memory instead of reusing memory from its own free pools. When free is called on malloc'd data, the memory is not returned to the operating system; instead, it goes back into malloc's free pool to be reused for future allocations.

I haven't been able to confirm this definitively, but two findings from my experiments indicate this is the issue:

  1. With jemalloc as the malloc implementation, RSS grew to less than 10 GB, significantly lower than with glibc's implementation (12.7 GB). A couple of posts I found (one from LinkedIn) note that jemalloc returns memory to the OS more aggressively.
  2. Adding malloc_trim to the end of our saveIndex function resulted in a similar decrease; malloc_trim attempts to release free heap memory back to the OS (see the sketch after this list).
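
A minimal standalone sketch of the behavior described above (this is not the plugin's actual saveIndex code; it assumes Linux with glibc and just exercises the allocator directly): it allocates many small blocks, frees all but the last one so the top of the heap cannot shrink naturally, and then calls malloc_trim, which on reasonably recent glibc is documented to release free pages from the interior of the heap as well.

/* sketch.c - build with: gcc sketch.c -o sketch (Linux/glibc assumed) */
#include <malloc.h>   /* malloc_trim (GNU extension) */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* sysconf */

/* Print the process resident set size, read from /proc/self/statm. */
static void print_rss(const char *label) {
    long pages = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f != NULL) {
        if (fscanf(f, "%*s %ld", &pages) != 1) pages = 0;
        fclose(f);
    }
    printf("%-20s RSS ~= %ld MB\n", label, pages * sysconf(_SC_PAGESIZE) / (1024 * 1024));
}

int main(void) {
    enum { N = 100000, SZ = 4096 };   /* many small heap blocks, well below the mmap threshold */
    static void *blocks[N];

    for (int i = 0; i < N; i++) blocks[i] = malloc(SZ);
    print_rss("after malloc:");

    /* Free everything except the last block; the surviving block sits near the
     * top of the heap, so free() alone typically cannot shrink the heap. */
    for (int i = 0; i < N - 1; i++) free(blocks[i]);
    print_rss("after free:");         /* usually still high: memory sits in malloc's free pool */

    malloc_trim(0);                   /* ask glibc to hand free heap pages back to the OS */
    print_rss("after malloc_trim:");  /* usually drops noticeably */

    free(blocks[N - 1]);
    return 0;
}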

In several places in the saveIndex code, we allocate a lot of small objects and a few large ones, which may be causing the issue. However, a few questions remain unanswered:

  1. For allocations over 128 KB, malloc should use mmap, and free should then use munmap, which returns the memory to the OS immediately (see the note after this list).
  2. Even with the jemalloc and malloc_trim changes, memory still grows from 8.8 GB at the beginning of the experiment to ~9.8 GB at the end, an increase of about 1 GB.
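
On question 1, one detail worth hedging (this assumes glibc and is not something confirmed in this thread): modern glibc adjusts the mmap threshold dynamically, and after mmap'd blocks are freed the threshold can grow to roughly 32 MB on 64-bit systems, so later allocations over 128 KB may be served from the heap rather than mmap and are then not returned to the OS on free. A small sketch of pinning the threshold with mallopt, which per mallopt(3) also disables the dynamic adjustment:

/* threshold.c - illustrative only; behavior per mallopt(3) on glibc */
#include <malloc.h>
#include <stdlib.h>

int main(void) {
    /* Pin the mmap threshold at 128 KB. Setting M_MMAP_THRESHOLD explicitly
     * also turns off glibc's dynamic adjustment of the threshold. */
    mallopt(M_MMAP_THRESHOLD, 128 * 1024);

    /* Allocations of at least 128 KB made after this point should be served
     * by mmap, and free() should return them to the OS immediately. */
    void *big = malloc(256 * 1024);
    free(big);
    return 0;
}

If I remember glibc's tunables correctly, the same effect can be tried without a code change by setting the MALLOC_MMAP_THRESHOLD_ environment variable for the OpenSearch process.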

From here, I think there are a few action items:

  • Use gdb to inspect malloc's state before and after the indexing workload. There, we could identify how much free memory each arena holds by calling malloc_stats; this could confirm that fragmentation is causing the behavior (see the sketch after this list).
  • Try out different allocation patterns in the JNI code to get a better picture of what is causing the fragmentation, for instance allocating one large array instead of many individual objects.
  • Improve our memory error handling. We could do a better job of alerting the user when a failure in the JNI layer is caused by memory pressure, and give more detail about the failure.
  • Provide guidance on the memory required for indexing.
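
For the first action item, a rough sketch of the kind of instrumentation I have in mind (standalone C, glibc assumed; run_indexing_workload is a placeholder, not a real plugin function): glibc's malloc_stats prints per-arena "system bytes" versus "in use bytes", and malloc_info emits the same data as XML including free-chunk totals, so comparing the dumps before and after the workload shows how much memory the arenas are holding on to.

/* arenas.c - dump allocator state before and after a workload (glibc only) */
#include <malloc.h>
#include <stdio.h>

static void dump_allocator_state(const char *label) {
    fprintf(stderr, "==== %s ====\n", label);
    malloc_stats();           /* per-arena system vs. in-use bytes, printed to stderr */
    malloc_info(0, stderr);   /* same information as XML, including free chunks */
}

int main(void) {
    dump_allocator_state("before workload");
    /* run_indexing_workload();   <- placeholder for the real allocation-heavy work */
    dump_allocator_state("after workload");
    return 0;
}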

Some resources I have referenced:

  1. https://www.linuxjournal.com/article/6390
  2. https://www.algolia.com/blog/engineering/when-allocators-are-hoarding-your-precious-memory/
  3. https://stackoverflow.com/questions/39753265/malloc-is-using-10x-the-amount-of-memory-necessary

Hi @billschell, I was able to reproduce this on an m5.xlarge instance with your configuration. It does seem to be memory related. I will look into possible causes today.

Upon further testing, I noticed that after indexing had finished (without force-merging or refreshing on each bulk request) and no merges were running in the background, docker stats showed 12.79 of 15.22 GB in use. Given that the cache's max weight is 4 GB, there is not enough room to actually load the graphs into memory. But because the cache's current weight says there should be room (i.e., it hasn't hit capacity), it attempts to load them anyway, causing either a malloc that returns null or the OOM killer to kill the process.

At the beginning of the process, before any operations are performed, docker stats shows 8.787 GB of memory in use. The question is what is consuming ~4 GB of memory after indexing. I confirmed no background merges were taking place. I also confirmed the process crashes after deleting the index and re-running the same indexing workload.
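
For rough scale, and assuming I am recalling the k-NN documentation's rule of thumb correctly (about 1.1 * (4 * dimension + 8 * m) bytes per vector for HNSW): with dimension = 768, m = 24, and ~1.3M vectors, that works out to roughly 1.1 * (4 * 768 + 8 * 24) * 1,300,000 ≈ 4.7 GB of native graph memory, which is in the same ballpark as the ~4 GB that appears to be unaccounted for after indexing.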

I need to investigate whether there is a leak somewhere on the indexing path.