scikit-learn: PyPy tests timeouts / memory usage investigation

EDIT: one of the main causes of the problem described below has already been fixed by #27670. However, despite this improvement, significant memory problems remain when running the scikit-learn test suite on PyPy. Similar investigation and fixes are needed to iteratively tackle the next worst offenders, until the tests can run with an amount of memory comparable to what we observe with CPython (instead of roughly 10 times as much).

Original description:

I had a closer look at the PyPy tests, which have been timing out for a while; here is the result of my investigation. It may also help in the future to have a central issue for this, rather than the discussion being split across the different automatically created issues in this repo and in scikit-learn-feedstock.

Locally, the PyPy tests need ~11GB on my machine, whereas they need about 1.2GB with CPython. I ran them without pytest-xdist to simplify things.
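The memory figures come from sampling the resident set size (RSS) of the test process over time. A minimal sketch of how this can be done (not the exact script I used; it assumes psutil is installed and that the command is run from a scikit-learn checkout):

import subprocess
import time

import psutil

# Run a test module in a child process, with pytest-xdist disabled so that
# everything stays in a single process.
proc = subprocess.Popen(["pytest", "-p", "no:xdist", "sklearn/linear_model"])
parent = psutil.Process(proc.pid)

while proc.poll() is None:
    try:
        # Sum the RSS of the pytest process and any children it spawned.
        rss = parent.memory_info().rss
        for child in parent.children(recursive=True):
            rss += child.memory_info().rss
    except psutil.NoSuchProcess:
        break
    print(f"RSS: {rss / 1e9:.2f}GB")
    time.sleep(5)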

PyPy: [memory usage over time while running the full test suite]

CPython: [memory usage over time while running the full test suite]

It seems that one of the places where memory usage grows over time is the linear_model tests (they need 3.4GB with PyPy and 200MB with CPython locally).

PyPy: [memory usage over time while running the linear_model tests]

CPython: [memory usage over time while running the linear_model tests]

I managed to reproduce the issue (memory growing far more than on CPython) with the following snippet, where one of our Cython loss functions is called many times in a tight loop:

import gc
import platform

import numpy as np
import psutil

from sklearn._loss.loss import HalfGammaLoss

IS_PYPY = platform.python_implementation() == "PyPy"


def func(data):
    # Call the Cython loss function repeatedly in a tight loop
    loss = HalfGammaLoss()
    for i in range(10_000):
        loss(data, data)


def main():
    for i in range(101):
        if i % 10 == 0:
            # Report the process RSS and, on PyPy, the GC's total allocated memory
            memory_usage = psutil.Process().memory_info().rss / 1e9
            message = f'{i} psutil: {memory_usage:.3f}GB'
            if IS_PYPY:
                pypy_memory_usage = gc.get_stats().memory_allocated_sum
                message += f' , pypy total allocated: {pypy_memory_usage}'
            print(message)

        n_samples, n_features = 4, 12
        rng = np.random.RandomState(0)
        raw_prediction = rng.uniform(low=-3, high=3, size=n_samples)
        func(raw_prediction)

    if IS_PYPY:
        print(gc.get_stats())


if __name__ == '__main__':
    main()

The output shows that the memory usage grows over time and that the RSS is about 10 times the total allocated memory reported by PyPy:

0 psutil: 0.175GB , pypy total allocated: 69.2MB
10 psutil: 0.537GB , pypy total allocated: 113.9MB
20 psutil: 0.888GB , pypy total allocated: 157.4MB
30 psutil: 1.257GB , pypy total allocated: 201.4MB
40 psutil: 1.595GB , pypy total allocated: 247.4MB
50 psutil: 1.931GB , pypy total allocated: 291.9MB
60 psutil: 2.325GB , pypy total allocated: 330.4MB
70 psutil: 2.659GB , pypy total allocated: 374.4MB
80 psutil: 3.003GB , pypy total allocated: 423.9MB
90 psutil: 3.327GB , pypy total allocated: 460.9MB
100 psutil: 3.663GB , pypy total allocated: 508.4MB

I am not entirely sure whether this is a red herring, an issue with our Cython loss implementation, or an issue in the interaction between Cython and PyPy. Maybe @mattip has some insight on this?

On CPython the memory usage is stable at around 100MB:

0 psutil: 0.097GB
10 psutil: 0.098GB
20 psutil: 0.098GB
30 psutil: 0.098GB
40 psutil: 0.098GB
50 psutil: 0.098GB
60 psutil: 0.098GB
70 psutil: 0.098GB
80 psutil: 0.098GB
90 psutil: 0.098GB
100 psutil: 0.098GB

My environment, for good measure:

$ mamba list
# packages in environment at /home/lesteve/micromamba/envs/pypy:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2023.7.22            hbcca054_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
cython                    3.0.3            py39hc10206b_0    conda-forge
exceptiongroup            1.1.3              pyhd8ed1ab_0    conda-forge
execnet                   2.0.2              pyhd8ed1ab_0    conda-forge
expat                     2.5.0                hcb278e6_1    conda-forge
gdbm                      1.18                 h0a1914f_2    conda-forge
iniconfig                 2.0.0              pyhd8ed1ab_0    conda-forge
joblib                    1.3.2              pyhd8ed1ab_0    conda-forge
libblas                   3.9.0           19_linux64_openblas    conda-forge
libcblas                  3.9.0           19_linux64_openblas    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_2    conda-forge
libgfortran-ng            13.2.0               h69a702a_2    conda-forge
libgfortran5              13.2.0               ha4646dd_2    conda-forge
libgomp                   13.2.0               h807b86a_2    conda-forge
liblapack                 3.9.0           19_linux64_openblas    conda-forge
libopenblas               0.3.24          pthreads_h413a1c8_0    conda-forge
libsqlite                 3.43.2               h2797004_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_2    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
ncurses                   6.4                  hcb278e6_0    conda-forge
numpy                     1.26.0           py39h6dedee3_0    conda-forge
openssl                   3.1.4                hd590300_0    conda-forge
packaging                 23.2               pyhd8ed1ab_0    conda-forge
pip                       23.3               pyhd8ed1ab_0    conda-forge
pluggy                    1.3.0              pyhd8ed1ab_0    conda-forge
psutil                    5.9.5            py39hf860d4a_1    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pypy                      7.3.13                 0_pypy39    conda-forge
pypy3.9                   7.3.13               h9557127_0    conda-forge
pytest                    7.4.2              pyhd8ed1ab_0    conda-forge
pytest-repeat             0.9.2              pyhd8ed1ab_0    conda-forge
pytest-xdist              3.3.1              pyhd8ed1ab_0    conda-forge
python                    3.9.18                0_73_pypy    conda-forge
python_abi                3.9               4_pypy39_pp73    conda-forge
readline                  8.2                  h8228510_1    conda-forge
scipy                     1.11.3           py39h6dedee3_1    conda-forge
setuptools                68.2.2             pyhd8ed1ab_0    conda-forge
sqlite                    3.43.2               h2c6b66d_0    conda-forge
threadpoolctl             3.2.0              pyha21a80b_0    conda-forge
tk                        8.6.13               h2797004_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
wheel                     0.41.2             pyhd8ed1ab_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libx11               1.8.7                h8ee46fc_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge

About this issue

  • State: open
  • Created 8 months ago
  • Comments: 25 (20 by maintainers)

Most upvoted comments

I found my Linux machine and could try @lesteve's script with the changes above. Interestingly, it removes the memory leak:

0 psutil: 0.199GB , pypy total allocated: 97.0MB
10 psutil: 0.227GB , pypy total allocated: 98.2MB
20 psutil: 0.227GB , pypy total allocated: 98.2MB
30 psutil: 0.227GB , pypy total allocated: 98.2MB
40 psutil: 0.227GB , pypy total allocated: 98.2MB
50 psutil: 0.227GB , pypy total allocated: 98.2MB
60 psutil: 0.227GB , pypy total allocated: 98.2MB
70 psutil: 0.227GB , pypy total allocated: 98.2MB
80 psutil: 0.227GB , pypy total allocated: 98.2MB
90 psutil: 0.227GB , pypy total allocated: 98.2MB
100 psutil: 0.227GB , pypy total allocated: 98.2MB
Total memory consumed:
    GC used:            85.2MB (peak: 97.2MB)
       in arenas:            62.0MB
       rawmalloced:          20.2MB
       nursery:              3.0MB
    raw assembler used: 664.6kB
    -----------------------------
    Total:              85.8MB

    Total memory allocated:
    GC allocated:            97.2MB (peak: 102.5MB)
       in arenas:            68.7MB
       rawmalloced:          25.5MB
       nursery:              3.0MB
    raw assembler allocated: 1.0MB
    -----------------------------
    Total:                   98.2MB

    Total time spent in GC:  11.162

So we should probably look into the np.asarray calls that we use to convert memoryviews back to arrays, since there seems to be something going on there.
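The pattern in question looks roughly like the following (a simplified plain-Python illustration, not the actual scikit-learn code, which does this in Cython with typed memoryviews):

import numpy as np

def compute_loss(raw_prediction):
    out = np.empty_like(raw_prediction)
    view = memoryview(out)   # in the real code this is a Cython typed memoryview
    # ... a kernel would fill 'view' element by element here ...
    return np.asarray(view)  # the returned ndarray keeps 'view' and 'out' alive via .base

result = compute_loss(np.linspace(-3, 3, 10))
print(type(result.base))     # <class 'memoryview'> (Cython's _memoryviewslice in scikit-learn's case)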

Thanks for the minimal reproducer. I opened a PyPy issue.

Thanks for the input! I said “timed out”, which is not very accurate: I am reasonably confident the tests actually get killed because they use too much memory.

I would keep this one open since I may have another look in the future. Having said that, this is not very high priority, since the PyPy tests time out much more rarely now that #27670 has been merged.

A quick look at #27750 seems to indicate that the PyPy tests timed out 5 times since November 8 (i.e. in 23 days or so), so the situation is a lot better than before, when they were timing out every time.

Saving the reproducer as test_pypy.pyx and running cythonize on it produces a C extension module. I can use it to recreate the problem:

$ pypy3.9 -m venv /tmp/venv3
$ source /tmp/venv3/bin/activate
$ python -m pip install numpy cython
$ cythonize -i test_pypy.pyx
$ python
>>>> import test_pypy, numpy as np
>>>> arr_out = np.ones(1_000_000, dtype = np.float64)
>>>> a = test_pypy.cy_func(arr_out)
>>>> type(a)
<class 'numpy.ndarray'>
>>>> type(a.base)
<class 'memoryview'>
>>>> type(a.base.obj)
<class 'test_pypy._memoryviewslice'>
>>>> a.base.obj.base is arr_out
True

So Cython has a class _memoryviewslice, which has a __dealloc__ method and hangs on to the original object; the slice is wrapped in a memoryview, which is in turn wrapped in an ndarray. This is going to create GC cycles between objects in a way that I am not sure even the cpyext-gc-cycle branch will be able to untangle.

So maybe we can wait before attempting to fix the 37 occurrences of this pattern in scikit-learn. @mattip, please let us know if this is too complex to fix on the PyPy/Cython side: we could then change scikit-learn to avoid this pattern in the first place, e.g. along the lines sketched below.
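One way to avoid the pattern would be to allocate the output ndarray on the NumPy side and let the Cython kernel write into it, so that no memoryview wrapper ends up attached to the returned array. A rough illustration (fill_loss is a hypothetical stand-in for a Cython kernel, not actual scikit-learn code):

import numpy as np

def fill_loss(y_true, raw_prediction, out):
    # Stand-in for a Cython kernel that writes into a caller-provided buffer
    np.subtract(raw_prediction, y_true, out=out)

y_true = np.ones(100)
raw_prediction = np.full(100, 2.0)
out = np.empty_like(raw_prediction)
fill_loss(y_true, raw_prediction, out)

# 'out' was allocated in Python and owns its memory, so nothing wraps a
# Cython memoryview slice, unlike `return np.asarray(<typed memoryview>)`.
print(out.base is None)  # True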

Great news. A smaller reproducer would be welcome, although I imagine it hits the C/Python mutual reference cycle problem I mentioned above. At least I could add it to the PyPy FAQ of “things to avoid”.