requests: Requests memory leak

Summary

Memory usage keeps growing while making requests with a Session, until the process runs out of RAM.

Expected Result

The program runs normally.

Actual Result

The program consumes all available RAM until it stops working.

Reproduction Steps

Pseudocode:

import requests

def function(url, proxy):
    proxies = {
        'https': proxy
    }
    session = requests.Session()
    session.headers.update({'User-Agent': 'user - agent'})
    try:                                           #
        login = session.get(url, proxies=proxies)  # HERE IS WHERE MEMORY LEAKS
    except requests.RequestException:              #
        return -1                                  #
    return 0
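
A minimal driver to observe the growth, assuming the function above and using psutil (as in the measurements later in this thread); the URL and proxy are placeholders:

import psutil

def observe(url='https://example.com', proxy='http://127.0.0.1:8080'):  # placeholders
    process = psutil.Process()
    for i in range(1000):
        function(url, proxy)
        if i % 100 == 0:
            # RSS keeps growing because each call leaves an unclosed Session behind
            print(i, process.memory_info().rss // (1024 * 1024), 'MiB')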

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.6"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.6.3"
  },
  "platform": {
    "release": "10",
    "system": "Windows"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.18.4"
  },
  "system_ssl": {
    "version": "100020bf"
  },
  "urllib3": {
    "version": "1.22"
  },
  "using_pyopenssl": false
}

About this issue

  • Original URL
  • State: open
  • Created 6 years ago
  • Reactions: 9
  • Comments: 25 (2 by maintainers)

Most upvoted comments

Calling Session.close() and Response.close() avoids the memory leak. SSL also consumes more memory, so the leak is more noticeable when requesting HTTPS URLs.

First, I made 4 test cases:

  1. requests + ssl (https://)
  2. requests + non-ssl (http://)
  3. aiohttp + ssl (https://)
  4. aiohttp + non-ssl (http://)

Pseudo code:

import requests
from concurrent.futures import ThreadPoolExecutor

def run(url):
    session = requests.session()
    response = session.get(url)

thread_pool = ThreadPoolExecutor(max_workers=10)  # thread pool, size=10
while True:
    for url in urls:  # about 5k URLs of public websites
        thread_pool.submit(run, url)

# in another thread, record memory usage every second

Memory usage graph (y-axis: MB, x-axis: time): requests uses a lot of memory and it grows very fast, while aiohttp memory usage is stable:

[Graphs: requests-non-ssl, requests-ssl, aiohttp-non-ssl, aiohttp-ssl]

Then I added Session.close() and tested again:

def run(url):
    session = requests.session()
    response = session.get(url)
    session.close()  # close session !!

Memory usage decreased significantly, but it still increases over time:

[Graphs: requests-non-ssl-close-session, requests-ssl-close-session]

Finally, I added Response.close() and tested again:

def run(url):
    session = requests.session()
    response = session.get(url)
    session.close()  # close session !!
    response.close()  # close response !!

Memory usage decreased again and no longer increases over time:

[Graphs: requests-non-ssl-close-all, requests-ssl-close-all]

Comparing aiohttp and requests shows that the memory leak is not caused by SSL; it is caused by connection resources not being closed.
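
The same cleanup can be written with context managers, which call close() automatically on exit; a minimal sketch of run() under that assumption:

import requests

def run(url):
    # Session and Response both support the context-manager protocol, so both
    # are closed on exit and their connections are released.
    with requests.Session() as session:
        with session.get(url) as response:
            return response.status_code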

Useful scripts:

import time
from threading import Thread

import pandas as pd
import psutil


class MemoryReporter:
    def __init__(self, name):
        self.name = name
        self.file = open(f'memoryleak/memory_{name}.txt', 'w')
        self.thread = None

    def _get_memory(self):
        return psutil.Process().memory_info().rss

    def main(self):
        while True:
            t = time.time()
            v = self._get_memory()
            self.file.write(f'{t},{v}\n')
            self.file.flush()
            time.sleep(1)

    def start(self):
        self.thread = Thread(target=self.main, name=self.name, daemon=True)
        self.thread.start()


def plot_memory(name):
    filepath = 'memoryleak/memory_{}.txt'.format(name)
    df_mem = pd.read_csv(filepath, index_col=0, names=['t', 'v'])
    df_mem.index = pd.to_datetime(df_mem.index, unit='s')
    df_mem.v = df_mem.v / 1024 / 1024
    df_mem.plot(figsize=(16, 8))
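
Hypothetical usage of the two snippets above, tying them to the test loop (run_test stands in for the request loop sketched earlier):

reporter = MemoryReporter('requests-ssl-close-all')  # hypothetical test name
reporter.start()                                     # samples RSS once per second
run_test()                                           # the request loop from the pseudo code above (hypothetical)
plot_memory('requests-ssl-close-all')                # plot MB over time afterwards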

System Information:

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.8"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.7.4"
  },
  "platform": {
    "release": "18.0.0",
    "system": "Darwin"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.22.0"
  },
  "system_ssl": {
    "version": "1010104f"
  },
  "urllib3": {
    "version": "1.25.6"
  },
  "using_pyopenssl": false
}

Similar issue. Requests eats memory when running in a thread. Code to reproduce:

import gc
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests
from memory_profiler import profile

def run_thread_request(sess, run):
    response = sess.get('https://www.google.com')
    return

@profile
def main():
    sess = requests.session()
    with ThreadPoolExecutor(max_workers=1) as executor:
        print('Starting!')
        tasks = {executor.submit(run_thread_request, sess, run):
                    run for run in range(50)}
        for _ in as_completed(tasks):
            pass
    print('Done!')
    return

@profile
def calling():
    main()
    gc.collect()
    return

if __name__ == '__main__':
    calling()

In the code given above I pass a session object around, but if I replace it with just calling requests.get, nothing changes (see the sketch after the Pipfile below).

Output is:

➜  thread-test pipenv run python run.py
Starting!
Done!
Filename: run.py

Line #    Mem usage    Increment   Line Contents
================================================
    10     23.2 MiB     23.2 MiB   @profile
    11                             def main():
    12     23.2 MiB      0.0 MiB       sess = requests.session()
    13     23.2 MiB      0.0 MiB       with ThreadPoolExecutor(max_workers=1) as executor:
    14     23.2 MiB      0.0 MiB           print('Starting!')
    15     23.4 MiB      0.0 MiB           tasks = {executor.submit(run_thread_request, sess, run):
    16     23.4 MiB      0.0 MiB                       run for run in range(50)}
    17     25.8 MiB      2.4 MiB           for _ in as_completed(tasks):
    18     25.8 MiB      0.0 MiB               pass
    19     25.8 MiB      0.0 MiB       print('Done!')
    20     25.8 MiB      0.0 MiB       return


Filename: run.py

Line #    Mem usage    Increment   Line Contents
================================================
    22     23.2 MiB     23.2 MiB   @profile
    23                             def calling():
    24     25.8 MiB      2.6 MiB       main()
    25     25.8 MiB      0.0 MiB       gc.collect()
    26     25.8 MiB      0.0 MiB       return

And Pipfile looks like this:

[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true

[requires]
python_version = "3.6"

[packages]
requests = "==2.21.0"
memory-profiler = "==0.55.0"
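
For reference, the requests.get variant mentioned above would look roughly like this (a sketch reusing the imports from the snippet above; per the report, the memory profile is the same):

def run_thread_request(sess, run):
    # Hypothetical variant: ignore the shared session and issue a throwaway
    # requests.get call per task instead.
    response = requests.get('https://www.google.com')
    return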

Same here, any workaround?

It seems Requests is still in beta stage if it has memory leaks like this. Come on, guys, patch this up! 😉👍

OK, so by the looks of it this is still an issue. Using requests.Session() to make requests to an HTTPS URL leads to constantly increasing memory usage and ultimately an OOM condition and a crash. The requests are being made through a proxy.

Here's a shot of the monotonically increasing memory usage:

[Graph: monotonically increasing memory usage]

This is from a production system running (in a docker container):

root@docker-host-01:~/uship-price-optimizer# docker exec -it uship-price-optimizer python -m requests.help
{
  "chardet": {
    "version": null
  },
  "charset_normalizer": {
    "version": "3.2.0"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.4"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.11.5"
  },
  "platform": {
    "release": "5.19.0-46-generic",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.31.0"
  },
  "system_ssl": {
    "version": "30000090"
  },
  "urllib3": {
    "version": "2.0.5"
  },
  "using_charset_normalizer": true,
  "using_pyopenssl": false
}

But we see the same behavior on Windows:

(venv) PS E:\src\uship-price-optimizer\src> python -m requests.help
{
  "chardet": {
    "version": null
  },
  "charset_normalizer": {
    "version": "3.2.0"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.4"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.11.5"
  },
  "platform": {
    "release": "10",
    "system": "Windows"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.31.0"
  },
  "system_ssl": {
    "version": "30000090"
  },
  "urllib3": {
    "version": "1.26.16"
  },
  "using_charset_normalizer": true,
  "using_pyopenssl": false
}
(venv) PS E:\src\uship-price-optimizer\src>

AND in WSL2:

(venv_linux) teo@jailbreaker-pc:/mnt/e/src/uship-price-optimizer/src$ python -m requests.help
{
  "chardet": {
    "version": null
  },
  "charset_normalizer": {
    "version": "3.2.0"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.4"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.11.5"
  },
  "platform": {
    "release": "5.15.90.1-microsoft-standard-WSL2",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.31.0"
  },
  "system_ssl": {
    "version": "1010106f"
  },
  "urllib3": {
    "version": "2.0.5"
  },
  "using_charset_normalizer": true,
  "using_pyopenssl": false
}

I'm observing a memory increase every time a requests.Session() is instantiated, and that memory is never reclaimed. The Session is actually wrapped in a CloudScraper, but nothing special is done to how the requests and sessions are handled there. And because of cloudscraper, I can't test the code using plain requests.get() as opposed to requests.Session(), by the way.

Higher up in this thread there's a comment by @VeNoMouS, but since the issue tracker of his repo was disabled, I can't see what he said there, and neither Google nor archive.org has a copy of the comment. But whatever it says, I can see the memory leaking every time a new session is created and later discarded.

While investigating what's going on, I stumbled on the original Python issue, migrated to GitHub here, which seems to imply that this is a Windows-only problem, but that does not seem to be the case. Both are closed as resolved, by the way.

Then I went on to memray the thing. Here’s the summary view:

(venv_linux) teo@jailbreaker-pc:/mnt/e/src/uship-price-optimizer/src$ python3.11 -m memray tree  memray-main.py.pydantic_2.bin

Allocation metadata
-------------------
Command line arguments: '/mnt/e/src/uship-price-optimizer/venv_linux/bin/memray run main.py'
Peak memory size: 132.543MB
Number of allocations: 43997698

Biggest 10 allocations:
-----------------------
📂 53.567MB (100.00 %) <ROOT>
├── [[8 frames hidden in 4 file(s)]]
│   └── 📂 40.721MB (76.02 %) retry  /mnt/e/src/uship-price-optimizer/venv_linux/lib/python3.11/site-packages/backoff/_sync.py:105
│       ├── [[11 frames hidden in 6 file(s)]]
│       │   └── 📄 32.564MB (60.79 %) ssl_wrap_socket  /mnt/e/src/uship-price-optimizer/venv_linux/lib/python3.11/site-packages/urllib3/util/ssl_.py:444
│       └── [[8 frames hidden in 5 file(s)]]
│           └── 📄 7.006MB (13.08 %) raw_decode  /usr/lib/python3.11/json/decoder.py:353
└── [[3 frames hidden in 2 file(s)]]
    └── 📂 12.846MB (23.98 %) _run_code  <frozen runpy>:88
        ├── [[33 frames hidden in 7 file(s)]]
        │   └── 📂 5.503MB (10.27 %) _call_with_frames_removed  <frozen importlib._bootstrap>:241
        │       ├── [[16 frames hidden in 7 file(s)]]
        │       │   └── 📄 4.012MB (7.49 %) validate_core_schema  /mnt/e/src/uship-price-optimizer/venv_linux/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:586
        │       └── [[3 frames hidden in 2 file(s)]]
        │           └── 📄 1.491MB (2.78 %) create_schema_validator  /mnt/e/src/uship-price-optimizer/venv_linux/lib/python3.11/site-packages/pydantic/plugin/_schema_validator.py:34
        ├── [[8 frames hidden in 5 file(s)]]
        │   └── 📄 3.000MB (5.60 %) __next__  /usr/lib/python3.11/csv.py:119
        ├── [[12 frames hidden in 4 file(s)]]
        │   └── 📄 1.846MB (3.45 %) _compile_bytecode  <frozen importlib._bootstrap_external>:729
        └── [[5 frames hidden in 3 file(s)]]
            └── 📂 2.496MB (4.66 %) _call_with_frames_removed  <frozen importlib._bootstrap>:241
                ├── [[30 frames hidden in 7 file(s)]]
                │   └── 📄 1.319MB (2.46 %) _compile_bytecode  <frozen importlib._bootstrap_external>:729
                └── [[22 frames hidden in 5 file(s)]]
                    └── 📄 1.177MB (2.20 %) _compile_bytecode  <frozen importlib._bootstrap_external>:729

And the memory stack involving util/ssl_.py:

[Image: memory allocation stack ending in urllib3's util/ssl_.py]

Looking around at the solutions to memory leaks in various systems that mention and link to this issue, I changed our code to "force close" the sessions using:

    with cloudscraper.create_scraper(...) as session:
        session.proxies = proxyconfig.get_proxy()
        .
        .
        .

cloudscraper.create_scraper essentially instantiates a requests.Session().

For making the requests, I changed the code to:

        with session.get(url='.......',
                         # allow_redirects=True,
                         params={...<params>...},
                         timeout=5, ) as response:
        .
        .
        .

This seems to have improved the situation a bit; at least now it's not monotonically growing and there are some [slight] reductions:

[Graph: memory usage with slight reductions after the force-close changes]

But this is still not what I think it should look like.
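
For reference, the same force-close pattern in plain requests looks roughly like this (a sketch, assuming cloudscraper just wraps a Session; the URL, proxy, and params are placeholders):

import requests

with requests.Session() as session:
    session.proxies = {'https': 'http://proxy.example:8080'}  # placeholder proxy
    with session.get('https://example.com/api',               # placeholder URL
                     params={'page': 1},                       # placeholder params
                     timeout=5) as response:
        data = response.json()
# Both context managers call close(), so the pooled (SSL) connections are torn down.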

Currently, we are keeping it under control by setting a memory limit on the production container, and we are also passing

...--max-requests 750 --max-requests-jitter 50... to gunicorn.

So, what would you suggest as the next logical step?

Thanks!

Same for me

Same for me… the leak during thread pool execution happens on Windows with Python 3.8 too, with requests 2.22.0.

Any update on this? A simple POST request with a file upload also causes a similar memory leak.
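
A sketch of that scenario with the explicit cleanup suggested earlier in the thread (the URL and file name are placeholders):

import requests

with requests.Session() as session:
    with open('upload.bin', 'rb') as fh:                      # placeholder file
        with session.post('https://example.com/upload',       # placeholder URL
                          files={'file': fh},
                          timeout=30) as response:
            response.raise_for_status()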

Hey @munroc, a couple quick questions about your threading implementation since it’s not included in the pseudo code.

  • Are you creating a new session for every thread, and what size is the thread pool you're using?

  • What tool are you using to determine where the leak is coming from? Would you mind sharing the results?

We’ve had hints of memory leaks around sessions for a while now, but I’m not sure we’ve found a smoking gun or truly confirmed impact.

Please provide us with the output of

python -m requests.help

If that is unavailable on your version of Requests, please provide some basic information about your system (Python version, operating system, etc.).