sentence-transformers: Memory leak in SentenceTransformer.encode during the first ~10000 predictions
The following minimal example repeatedly calls SentenceTransformer.encode on batches of 200 random strings, each 12345 characters long, and records the memory usage after every call.
During the first ~50 calls (~10000 predictions), the memory usage grows enormously.
# memleak.py
import random
import string

import psutil
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

def random_string(length: int) -> str:
    return ''.join(random.choices(string.ascii_uppercase + string.digits, k=length))

print('iteration,memory_usage_in_MiB', flush=True)
for iteration in range(99999999):
    # encode a batch of 200 random strings, each 12345 characters long
    model.encode([random_string(12345) for _ in range(200)])
    # report this process's resident set size after the call
    memory_usage_in_MiB = psutil.Process().memory_info().rss / (1024 * 1024)
    print(f'{iteration},{memory_usage_in_MiB}', flush=True)
Output:
iteration,memory_usage_in_MiB
0,1329.22265625
1,1431.140625
2,1509.2265625
3,1641.55859375
4,1699.109375
5,1779.36328125
[...]
10,2250.69921875
[...]
20,3121.921875
[...]
30,4033.1875
[...]
40,4917.00390625
41,5006.48046875
42,5102.65625
43,5186.4453125
44,5276.37890625
45,5378.58203125
46,5486.60546875
47,5546.50390625
48,5648.64453125
49,5731.9296875
50,5749.0390625
51,5765.81640625
52,5776.52734375
53,5752.5390625
54,5752.39453125
55,5765.01953125
56,5783.08203125
57,5758.75
58,5752.390625
59,5794.265625
60,5752.83984375
61,5776.9140625
62,5764.89453125
63,5794.5703125
64,5795.8515625
65,5789.98046875
66,5795.84375
67,5783.55859375
[...]
The larger the input strings, the higher the memory usage, but in every run it plateaus at roughly this level.
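To illustrate the dependence on input length, a rough sketch along these lines can be used (the script name, the choice of lengths, and the 60-call warm-up are arbitrary; it is meant to be run once per length in a fresh process so the plateaus don't accumulate):

# memplateau.py: run once per string length, e.g. `python memplateau.py 1000`
import random
import string
import sys

import psutil
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

def random_string(length: int) -> str:
    return ''.join(random.choices(string.ascii_uppercase + string.digits, k=length))

length = int(sys.argv[1])
for _ in range(60):  # enough calls to reach the plateau observed above
    model.encode([random_string(length) for _ in range(200)])

rss_mib = psutil.Process().memory_info().rss / (1024 * 1024)
print(f'length={length}: plateau at roughly {rss_mib:.0f} MiB', flush=True)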
I'm running on CPU only (no GPU), and the behavior can be reproduced with the following Dockerfile:
FROM python:3.10.9
RUN pip install sentence-transformers==2.2.2 psutil==5.9.4
# download model
RUN python -c "from sentence_transformers import SentenceTransformer; model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')"
# Disable the Docker cache from this stage on, see https://stackoverflow.com/a/58801213/1866775
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
ADD ./memleak.py /
RUN python /memleak.py
Is this a memory leak or intended behavior?
Interesting. I would love to avoid the memory issues with these odd edge cases as well. I remember a similar case where someone tried to do sentence segmentation on Wikipedia edits, but it would sometimes stop working - it ended up being caused by someone who edited a sequence of “aaaaaaa…” with a length of 10k, and the segmenter couldn’t handle that 😄
Looks like this only happens when you pass a list of inputs for encoding.
If you instead call model.encode many times with a single element (using an outer loop), there is no memory spike.
At least, that's what my tests show.
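Roughly, this is what I tested; the sketch mirrors the original repro except that each string is passed to model.encode on its own (the file name and the iteration count are my own choices):

# one_at_a_time.py: encode each string separately instead of passing a list
import random
import string

import psutil
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

def random_string(length: int) -> str:
    return ''.join(random.choices(string.ascii_uppercase + string.digits, k=length))

print('iteration,memory_usage_in_MiB', flush=True)
for iteration in range(100):
    for text in (random_string(12345) for _ in range(200)):
        model.encode(text)  # a single string, not a list
    memory_usage_in_MiB = psutil.Process().memory_info().rss / (1024 * 1024)
    print(f'{iteration},{memory_usage_in_MiB}', flush=True)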