pandas: Memory leak in `df.to_json`
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))

while True:
    body = df.T.to_json()
    print("HI")
Problem description
If we repeatedly call to_json() on a dataframe, memory usage grows continuously.
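One way to quantify the growth (my own sketch, not from the original report) is to sample the process's peak resident set size while calling to_json() in a loop; on an affected pandas version the samples climb steadily. Note that `resource.getrusage` reports `ru_maxrss` in kilobytes on Linux but bytes on macOS, and the `resource` module is not available on Windows:

```python
import resource

import numpy as np
import pandas as pd


def rss_kb():
    # Peak resident set size of this process (kilobytes on Linux, bytes on macOS).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))

samples = []
for i in range(1000):
    body = df.T.to_json()
    if i % 100 == 0:
        samples.append(rss_kb())

# On an affected version this sequence keeps increasing instead of flattening out.
print(samples)
```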
Expected Output
I would expect memory usage to stay constant.
Output of pd.show_versions()
/usr/local/lib/python3.6/dist-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
“”")
INSTALLED VERSIONS
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-43-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.6.3
Cython: 0.25.2
numpy: 1.15.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.1.1
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 2
- Comments: 16 (5 by maintainers)
Hi, I am facing the same issue with the memory leak in df.to_json().
When I use df.to_dict() and pass the result to Python's json.dump, memory use is stable, but when I use df.to_json(), memory keeps growing.
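To make that workaround concrete, here is a minimal sketch (my own, assuming the default orient, which matches to_json()'s default of 'columns'): serialize through to_dict() and the stdlib json module instead of to_json(). The `default=int` argument is a guard in case an older pandas returns numpy scalars from to_dict():

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))

# Serialize via to_dict() + the stdlib json module instead of df.T.to_json().
# default=int handles any numpy integer scalars that json cannot encode natively.
body = json.dumps(df.T.to_dict(), default=int)

parsed = json.loads(body)
# Like to_json(), json.dumps turns the integer index labels into string keys.
assert parsed["0"]["A"] == df.loc[0, "A"]
```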
Code Sample
INSTALLED VERSIONS
commit: None
python: 3.6.10.final.0
python-bits: 64
OS: Darwin
OS-release: 19.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_GB.UTF-8

pandas: 0.24.2
pytest: 5.4.3
pip: 19.3.1
setuptools: 44.0.0.post20200106
Cython: None
numpy: 1.19.1
scipy: 1.5.2
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.1
pytz: 2019.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.3.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.5.0
bs4: 4.8.2
html5lib: None
sqlalchemy: 1.3.18
pymysql: 0.9.3
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.11.2
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
It's also worth isolating the to_json part from the df.T part.

On Thu, Jan 24, 2019 at 4:03 PM chris-b1 notifications@github.com wrote:
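Following that suggestion, a minimal way (my own sketch) to isolate the two operations is to exercise each in its own loop and watch memory during each loop separately; whichever loop shows growth is the one leaking:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))

# Loop 1: transpose once up front, so the loop exercises only to_json().
dft = df.T
for _ in range(1000):
    body = dft.to_json()

# Loop 2: exercise only the transpose, with no serialization at all.
for _ in range(1000):
    transposed = df.T
```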