sqlalchemy: memory leak with engine.core and numpy
Describe the bug
I have a long-running process that periodically requests data from a PostgreSQL database and converts the results to a numpy array for analysis. I see the process leaking memory on each coercion of the results to a numpy array.
In my real process, after a couple of days it can be using 9 GB of RAM, and after a week it crashes the server. The previous version (Python 2.7) used less than 200 MB even after running for weeks, but that version used psycopg2 directly. I could go back to doing that, but I want to use SQLAlchemy across the codebase by default, and I'd also like to understand why this isn't working properly.
While creating the test case, I found that the same issue occurs with a SQLite in-memory database. I also saw the same issue on both my laptop and my server (both using SQLAlchemy v1.4.32 and v1.4.23).
To Reproduce
import sqlalchemy as sql
import numpy as np
import time

dburl = 'sqlite://'
engine = sql.create_engine(dburl, echo=False)
meta = sql.MetaData()
tbl = sql.Table('test_table', meta,
                sql.Column('a', sql.Integer, primary_key=True),
                sql.Column('b', sql.Integer),
                )
meta.create_all(engine)

# fill table
with engine.connect() as conn:
    conn.execute(tbl.insert(),
                 [{'b': i} for i in range(200)]
                 )

# test
while True:
    with engine.connect() as conn:
        query = sql.text("SELECT a, b FROM test_table ORDER BY a LIMIT 200")
        results = conn.execute(query)
        rows = results.fetchall()
        data = np.array(rows)
        # fixes that didn't work
        del rows
        results.close()
        del data
    time.sleep(0.1)
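To confirm that memory is genuinely accumulating per iteration (rather than just being held by Python's allocator), a snapshot diff with the standard-library tracemalloc module can point at the allocation site that keeps growing. This is a generic diagnostic sketch, not part of the original report; the `leaky` function and `cache` list are illustrative stand-ins for the real workload:

```python
import tracemalloc

cache = []

def leaky():
    # simulates an object that is allocated each iteration but never released
    cache.append(bytes(1024))

tracemalloc.start()
for _ in range(1000):
    leaky()
snap1 = tracemalloc.take_snapshot()
for _ in range(1000):
    leaky()
snap2 = tracemalloc.take_snapshot()

# entries are sorted by size_diff; a steadily positive top entry between
# snapshots taken at the same point in the loop indicates a leak there
top = snap2.compare_to(snap1, 'lineno')
print(top[0])
```

Run against the reproduction loop above, the top entry should name the line whose allocations survive each pass; for a leak inside a C extension, tracemalloc will at least show whether the growth is visible to the Python allocator or happening in native code.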
Error
Resident memory usage climbs to 358 MB in just a few minutes, with no sign of stabilising. After a couple of minutes:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1952762 aaa 20 0 762204 186264 16188 S 5.7 0.9 0:09.77 python3
A bit later:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1952762 aaa 20 0 934200 358092 16188 S 5.6 1.8 0:21.20 python3
Versions
- OS: Linux (CentOS)
- Python: 3.6.8
- SQLAlchemy: 1.4.32, 1.4.23
- Database: PostgreSQL (v12.9 64-bit)
- DBAPI: psycopg2
Additional context
No response
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 26 (22 by maintainers)
It’s not in main (since there was no bug there), so the issue will not be closed automatically.
Just to note (coming from NumPy), I am very sure that the issue is leaking the error message string when you do:
(and similar). NumPy will try such attributes and clear the error. The C code needs to `DECREF` the string, although using `PyErr_Format` is probably more convenient than getting the reference count right manually.
EDIT: OK, you are already using `Format`, so I change my suggestion to using `%S` and removing the unnecessary call to `PyUnicode_AsASCIIString(name);`. So what is leaked seems to be just the `bytes` version of the name; you can also just `DECREF` it after formatting the error.
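The missing-`DECREF` pattern described above has a Python-level analogue that makes the mechanism easy to see: if code takes an extra reference to an object (here, a column name) on every failed lookup and never releases it, the object's reference count grows without bound, so the object can never be freed. This is a hypothetical sketch; `faulty_lookup` and `refs` are illustrative names, not the actual C code:

```python
import sys

name = "column_name"
baseline = sys.getrefcount(name)

refs = []

def faulty_lookup(n):
    # simulates C code that takes a new reference (e.g. an encoded copy of
    # the name) and raises without releasing it -- the Python-level
    # analogue of a missing Py_DECREF on the error path
    refs.append(n)
    raise AttributeError(n)

for _ in range(5):
    try:
        faulty_lookup(name)
    except AttributeError:
        pass

# each failed lookup left one extra reference behind
assert sys.getrefcount(name) == baseline + 5
```

In the C API, building the message with `PyErr_Format(..., "%S", name)` avoids creating the intermediate encoded object in the first place, which is why it sidesteps the refcount bookkeeping entirely.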