scipy: eigh() tests fail to pass, crash Python with seemingly random pattern
This problem is related to #11601, which has been closed by #11702 ( @ilayn ). However, the crash has not been fixed by the latter PR.
The symptoms remain almost identical to those described in my comment at https://github.com/scipy/scipy/issues/11601#issuecomment-600153321
In summary, when running the test for eigh(), Python tends to crash with SIGSEGV or SIGABRT. Sometimes this happens during the test_eigh() function, sometimes after it passed with “100%” but before pytest returns.
The test that triggers the crash is the following test function:
Some patterns from the histories of crashes
I ran the test script with runtests.py 100 times and saved the output as text files.
By grepping the output files from ./runtests.py, I noticed that the last-known position in Python before a crash was one of three lines, namely 873, 876, and 877. L 873 is the actual call to eigh(), while the crash can happen as late as 876 or 877, where the arrays returned from eigh() are accessed.
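The repeated-run procedure described above could be scripted roughly as follows. This is only a sketch, not the script actually used for the report: the helper names `classify` and `run_many` and the log file names are hypothetical, and it assumes a POSIX system where a child process killed by a signal shows up in `subprocess` as a negative return code (and that the signal propagates through runtests.py).

```python
import collections
import signal
import subprocess

# Command taken from the report; run it from the root of a SciPy checkout.
CMD = ["./runtests.py", "-vt",
       "scipy/linalg/tests/test_decomp.py::TestEigh::test_eigh"]


def classify(returncode):
    """Map a subprocess return code to 'pass', 'fail', or a signal name."""
    if returncode == 0:
        return "pass"
    if returncode < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return "fail"


def run_many(n=100, cmd=CMD):
    """Run the test command n times, save each log, and tally the outcomes."""
    tally = collections.Counter()
    for i in range(n):
        # Hypothetical log naming: run_000.log, run_001.log, ...
        with open(f"run_{i:03d}.log", "w") as log:
            proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
        tally[classify(proc.returncode)] += 1
    return tally
```

Calling `run_many()` would then return a counter keyed by outcome, and the saved logs can be grepped for the last line of test_decomp.py that appears before each crash.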
Only 6 out of 100 runs passed without any problems.
In some cases (35 out of the 100), Python segfaults after nominally completing all the tests in TestEigh::test_eigh.
In the cases where Python was killed with SIGABRT, 36 were at L 873 (the call to eigh()), while 9 were at L 876, where the output z was used. In many other runs, the test script did not appear in the Python backtrace, if there was one.
The parametrized inputs that triggered the crash were of the form test_eigh[6-D-XXX-YYY-ZZZ-eigvals1]. That is, the crashes happened for dimension 6, dtype double complex, with the eigvals= keyword parameter set to the tuple (2, 4). The XXX–ZZZ parameters are boolean flags for the keywords turbo, lower, and overwrite respectively.
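To tally crashes by parameter, the test IDs found in the logs can be decoded into their fields. The following is a minimal sketch; `parse_test_id` is a hypothetical helper, and the pattern assumes IDs follow exactly the form quoted above, with the flags in the order turbo, lower, overwrite.

```python
import re

# Matches IDs such as "test_eigh[6-D-True-False-True-eigvals1]".
# The field layout is assumed from the form quoted in the report.
_ID_RE = re.compile(
    r"test_eigh\[(?P<dim>\d+)-(?P<dtype>[A-Za-z])-"
    r"(?P<turbo>True|False)-(?P<lower>True|False)-"
    r"(?P<overwrite>True|False)-(?P<eigvals>\w+)\]"
)


def parse_test_id(text):
    """Extract the parametrization from the first test_eigh ID in text."""
    m = _ID_RE.search(text)
    if m is None:
        raise ValueError(f"no test_eigh ID found in {text!r}")
    return {
        "dim": int(m["dim"]),
        "dtype": m["dtype"],
        "turbo": m["turbo"] == "True",
        "lower": m["lower"] == "True",
        "overwrite": m["overwrite"] == "True",
        "eigvals": m["eigvals"],
    }
```

Because the function searches rather than matches, it can be applied directly to a full line of pytest output that contains the ID.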
An incomplete tally of the parameters (turbo, lower, and overwrite), where Python crashed before finishing all the tests, is as follows:
5 False-False-False
11 False-False-True
13 False-True-False
6 False-True-True
7 True-False-False
4 True-False-True
15 True-True-True
The combination (turbo=True, lower=True, overwrite=False) is the only one of the 2^3 = 8 cases that has not been observed yet.
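The missing combination can be double-checked mechanically from the tally. This is a small sketch using only the counts listed above; the variable names are illustrative.

```python
import itertools

# Crash counts keyed by (turbo, lower, overwrite), copied from the tally above.
observed = {
    (False, False, False): 5,
    (False, False, True): 11,
    (False, True, False): 13,
    (False, True, True): 6,
    (True, False, False): 7,
    (True, False, True): 4,
    (True, True, True): 15,
}

# All 2^3 = 8 combinations of the three boolean flags.
all_combos = set(itertools.product([False, True], repeat=3))

# Combinations that never appeared in a crash.
missing = sorted(all_combos - set(observed))
total_crashes = sum(observed.values())

print(missing)        # [(True, True, False)]
print(total_crashes)  # 61
```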
Reproducing code example:
./runtests.py -vt scipy/linalg/tests/test_decomp.py::TestEigh::test_eigh
Scipy/Numpy/Python version information:
Scipy master branch as of ae34ce48, Numpy 1.18.1, Python 3.7.6, conda macos with MKL 2019.4.
About this issue
- State: closed
- Created 4 years ago
- Comments: 52 (52 by maintainers)
Intel team confirmed the bug and included the fix for the upcoming MKL 2020 update 2.
@ilayn Done.
This may well be more noise, but just in case it may help, I created a stripped down version in pure C that mostly performs the LAPACK operations done by the Python test snippet, in the hope that one may verify whether this is an underlying C-level MKL issue. The program appears to run fine without any problem. I ran the program in loops that repeat 5000 times per loop, and I have yet to run into a crash.
The C program is as follows:
Running the compiled program linked against MKL 2019.4 gives the following output
It is not the failures but the runs that pass that worry me more. This signals that I have to dig into the LAPACK wrappers instead, because this really doesn’t make any sense.
The returned array sizes are almost random. I don’t see any underflow pattern, so they are probably random memory values, which implies that f2py is not returning a proper object, which in turn means our wrappers are not correct.
These tests were converted to pytest decorators very recently, and maybe we have discovered a bug that was not tested properly before. I’ll check this properly on a Linux box (I’m on Win10) when I can.
Not sure if this is helpful, but posting: