scipy: eigh() tests fail to pass, crash Python with seemingly random pattern
This problem is related to #11601, which has been closed by #11702 ( @ilayn ). However, the crash has not been fixed by the latter PR.
The symptoms remain almost identical to those described in my comment in https://github.com/scipy/scipy/issues/11601#issuecomment-600153321
In summary, when running the test for eigh(), Python tends to crash with SIGSEGV or SIGABRT. Sometimes this happens during the test_eigh() function, sometimes after it passed with “100%” but before pytest returns.
The test that triggers the crash is the following test function:
Some patterns from the histories of crashes
I ran the test script with runtests.py 100 times and saved the output as text files.
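Repeated runs like this can be scripted. The following is only a minimal sketch, not the script that produced the numbers reported here: the helper name `run_repeatedly`, the output directory, and the `run_NNN.txt` file naming are all assumptions made for illustration. It runs a command a given number of times, saves each run's combined stdout and stderr to a text file, and tallies how each run ended. On POSIX systems, a negative return code from `subprocess` means the process was killed by the signal with that number.

```python
import signal
import subprocess
from collections import Counter
from pathlib import Path


def run_repeatedly(cmd, n_runs, out_dir):
    """Run cmd n_runs times, save each run's output, and tally how each run ended."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outcomes = Counter()
    for i in range(n_runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep any crash backtrace with the test output
            text=True,
        )
        # Hypothetical file naming: run_000.txt, run_001.txt, ...
        (out_dir / f"run_{i:03d}.txt").write_text(proc.stdout)
        rc = proc.returncode
        if rc < 0:
            # POSIX: the process was killed by signal number -rc
            try:
                label = signal.Signals(-rc).name  # e.g. "SIGSEGV" or "SIGABRT"
            except ValueError:
                label = f"signal {-rc}"
        elif rc == 0:
            label = "passed"
        else:
            label = f"exit {rc}"
        outcomes[label] += 1
    return outcomes
```

From the SciPy source tree, something like `run_repeatedly([sys.executable, "runtests.py", "-vt", "scipy/linalg/tests/test_decomp.py::TestEigh::test_eigh"], 100, "eigh_runs")` would then give a count per outcome.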
By grepping the output files from ./runtests.py, I noticed that the last-known position in Python before the crash is one of three lines: 873, 876, or 877. Line 873 is the actual call to eigh(), while the crash can happen as late as line 876 or 877, where the arrays returned from eigh() are accessed.
Only 6 out of 100 runs passed without any problems.
In some cases (35 out of the 100), Python segfaults after nominally completing all the tests in TestEigh::test_eigh.
Of the runs where Python was killed with SIGABRT, 36 were at line 873 (the call to eigh()) and 9 were at line 876, where the output z is used. In many other runs, the test script did not appear in the Python backtrace, if there was a backtrace at all.
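The per-line counts above can be extracted from saved logs with a few lines of Python instead of grep. This is a sketch under one loud assumption: that the backtrace frames in the logs follow the `File "...", line N in func` layout that Python's faulthandler module prints, most recent call first. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed layout of a backtrace frame, as printed by Python's faulthandler:
#   File "/path/to/test_decomp.py", line 873 in test_eigh
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+) in (?P<func>\S+)')


def last_known_line(log_text, filename="test_decomp.py"):
    """Return the line number of the first backtrace frame in filename, or None.

    With most-recent-call-first ordering, the first matching frame is the
    innermost position reached inside the test file.
    """
    for match in FRAME_RE.finditer(log_text):
        if match.group("path").endswith(filename):
            return int(match.group("line"))
    return None


def tally_crash_lines(logs, filename="test_decomp.py"):
    """Count, over many saved logs, where in filename each run was last seen."""
    return Counter(last_known_line(text, filename) for text in logs)
```

Runs whose backtrace never mentions the test file end up under the `None` key, which matches the case where the test script does not appear in the backtrace.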
The parametrized inputs that triggered the crash were of the form test_eigh[6-D-XXX-YYY-ZZZ-eigvals1]. That is, the crashes happened for dimension 6, dtype double complex, with eigvals= keyword parameter set to the tuple (2, 4). The XXX–ZZZ parameters are boolean flags for keywords turbo, lower, and overwrite respectively.
An incomplete tally of the parameters (turbo, lower, and overwrite), where Python crashed before finishing all the tests, is as follows:
5 False-False-False
11 False-False-True
13 False-True-False
6 False-True-True
7 True-False-False
4 True-False-True
15 True-True-True
The combination (turbo=True, lower=True, overwrite=False) is the only one of the 2^3 = 8 cases that has not yet been observed.
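As a quick cross-check of the tally above, the eight flag combinations can be enumerated and compared against the ones recorded. The counts are copied from the list; the variable names are chosen here for illustration only.

```python
from itertools import product

# Crash counts from the tally above, keyed by (turbo, lower, overwrite).
crash_counts = {
    (False, False, False): 5,
    (False, False, True): 11,
    (False, True, False): 13,
    (False, True, True): 6,
    (True, False, False): 7,
    (True, False, True): 4,
    (True, True, True): 15,
}

# All 2^3 combinations of the three boolean flags.
all_combinations = set(product([False, True], repeat=3))

# Combinations with no recorded crash, and the number of crashes tallied.
missing = sorted(all_combinations - set(crash_counts))
total = sum(crash_counts.values())

print(missing)  # [(True, True, False)]
print(total)    # 61
```

The tally therefore covers 61 of the 100 runs, spread over seven of the eight combinations with no obvious preference for any single flag.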
Reproducing code example:
./runtests.py -vt scipy/linalg/tests/test_decomp.py::TestEigh::test_eigh
Scipy/Numpy/Python version information:
SciPy master branch as of ae34ce48, NumPy 1.18.1, Python 3.7.6, conda on macOS with MKL 2019.4.
About this issue
- State: closed
- Created 4 years ago
- Comments: 52 (52 by maintainers)
Intel team confirmed the bug and included the fix for the upcoming MKL 2020 update 2.
@ilayn Done.
This may well be more noise, but just in case it may help, I created a stripped down version in pure C that mostly performs the LAPACK operations done by the Python test snippet, in the hope that one may verify whether this is an underlying C-level MKL issue. The program appears to run fine without any problem. I ran the program in loops that repeat 5000 times per loop, and I have yet to run into a crash.
The C program is as follows:
Running the compiled program linked against MKL 2019.4 gives the following output
It is not the failures but the runs that pass that worry me more. This signals that I have to dig into the LAPACK wrappers instead, because this really doesn’t make any sense.
The returned array sizes are almost random. I don’t see any underflow pattern, so they are probably random memory values. That implies f2py is not returning a proper object, which in turn means our wrappers are not correct.
These tests were converted to pytest decorators very recently, and maybe we have discovered a bug that was not tested properly before. I’ll check this properly on a Linux box (I’m on Win10) when I can.
Not sure if this is helpful, but posting: