scipy: BUG: test failure in `test_x0_equals_Mb` with `bicgstab`
In conda-forge, we ran into a new test failure for scipy 1.8, which appears only with MKL (which is available in conda-forge only for x86), and only if the processor supports AVX512 (which Azure CI only has for Linux & Windows); see https://github.com/conda-forge/scipy-feedstock/pull/199.
To avoid further delaying the release of 1.8, we skipped that test for now, but it should IMO be fixed, especially as such processors are becoming more and more common (in fact, it's getting harder and harder to purposefully catch a non-AVX512 Windows CI agent on Azure).
The failure is in `test_x0_equals_Mb[bicgstab]` and looks as follows:
=================================== FAILURES ===================================
_________________________ test_x0_equals_Mb[bicgstab] __________________________
[...]/lib/python3.8/site-packages/scipy/sparse/linalg/_isolve/tests/test_iterative.py:538: in test_x0_equals_Mb
assert_equal(info, 0)
E AssertionError:
E Items are not equal:
E ACTUAL: -11
E DESIRED: 0
A = <10x10 sparse matrix of type '<class 'numpy.complex64'>'
with 19 stored elements in Compressed Sparse Row format>
b = array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
case = <nonsymposdef>
info = -11
solver = <function bicgstab at 0x7fc6545a4ee0>
sup = <numpy.testing._private.utils.suppress_warnings object at 0x7fc62555d040>
tol = 1e-08
x = array([0. +0.j, 0.50000358+0.j, 1.25016941+0.j, 2.12653797+0.j,
3.0674186 +0.j, 4.03716976+0.j, 5.0158146 +0.j, 6.00288608+0.j,
7. +0.j, 8. +0.j])
x0 = 'Mb'
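For anyone trying to poke at this outside the test suite, here is a minimal sketch of the failing case. The matrix entries are an assumption: 2 on the diagonal and -1 on the first subdiagonal, which gives the 19 stored elements shown above and agrees with the leading entries of `x` (0, 0.5, 1.25, 2.125, ...). The helper names `build_case` and `run_bicgstab_from_Mb` are made up for illustration. With no preconditioner, `x0='Mb'` amounts to `x0 = b`, so the sketch passes that array explicitly. It also uses the default tolerances rather than the test's `1e-08`, since the tolerance keyword differs between SciPy versions.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import bicgstab


def build_case(n=10):
    """Lower-bidiagonal complex64 matrix and right-hand side (assumed values)."""
    # Assumption: 2 on the diagonal, -1 on the first subdiagonal. For n=10 this
    # yields 19 stored elements, as in the traceback.
    A = diags([2 * np.ones(n), -np.ones(n - 1)], [0, -1],
              format="csr", dtype=np.complex64)
    b = np.arange(n, dtype=float)
    return A, b


def run_bicgstab_from_Mb(A, b):
    """Run bicgstab from x0 = M @ b with M = identity, i.e. x0 = b."""
    x0 = b.astype(np.complex128)      # explicit stand-in for x0='Mb'
    x, info = bicgstab(A, b, x0=x0)   # default tolerances, not the test's 1e-8
    residual = np.linalg.norm(b - A @ x)
    return x, info, residual


A, b = build_case()
x, info, residual = run_bicgstab_from_Mb(A, b)

# Direct dense solve as a reference for what x should converge to.
x_ref = np.linalg.solve(A.toarray().astype(np.complex128), b)

print("info =", info, "residual =", residual)
print("max deviation from direct solve:", np.abs(x - x_ref).max())
```

Whether this reproduces the `-11` will depend on the BLAS/LAPACK build and the CPU, as described above; on other setups it may simply converge.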
Also, at the end of the test suite, some (seemingly delayed) log output appears that is perhaps relevant:
Normal return from subroutine COBYLA
NFVALS = 50 F = 2.485185E+01 MAXCV = 1.999965E-10
X = 4.955358E+00 6.666553E-01
NNLS quitting on iteration count.
Even if it does not go away, the code is now pure Python, so a couple of print statements will show the issue and the smoking gun. Personally, these small things make all the pain of the Fortran conversion worthwhile.
I tested this some more in https://github.com/conda-forge/scipy-feedstock/pull/242, and what’s very strange is that now (still for SciPy 1.11.4, thus unrelated to the changes in 1.12), I get:
The versions of relevant libraries before & after:

| library | before | after |
|---|---|---|
| scipy | 1.11.0 | 1.11.4 |
| numpy | 1.25.0 | 1.26.3 |
| blis | 0.9.0-1 | 0.9.0-1 |
| openblas | 0.3.23-pthreads-0 | 0.3.25-pthreads-0 |
| mkl (linux) | 2022.2.1-16997 | 2023.2.0-50496 |
| mkl (osx) | 2022.2.1-16952 | 2023.2.0-50500 |
| mkl (win) | 2022.1.0-874 | 2023.2.0-50497 |
| pythran | 0.13.1-0 | 0.15.0-0 |
| qemu-user-static | 7.2.0-1 | 8.1.3-1 |
The fact that `blis` could start failing (while being unchanged in terms of build) must almost certainly be related to LAPACK (since blis only provides BLAS, and we add netlib's LAPACK to that), as we also went from `3.9.0-17` to `3.9.0-20` for our blas metapackage.
Hmm, let me have a look at this one. Strange that it causes problems.
About the error: this is a breakdown case, where the algorithm reaches a point from which it cannot proceed because of numerical problems. Because the test tolerances are/were too tight, the failure was sporadic. Now all tests pass.
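To make the distinction between "did not converge" and "broke down" concrete, here is a small sketch of how the `info` value returned by the iterative solvers can be read. It follows the documented sign convention only (0 for success, positive for the iteration count when the tolerance was not reached, negative for illegal input or breakdown). The helper name `describe_info` is made up for illustration, and the meaning of specific negative codes such as `-11` is deliberately not spelled out, since that is an implementation detail.

```python
def describe_info(info):
    """Translate a solver's `info` return value into a short description."""
    if info == 0:
        return "converged"
    if info > 0:
        # Positive values report the number of iterations performed.
        return f"not converged after {info} iterations"
    # Negative values cover both illegal input and numerical breakdown.
    return f"illegal input or breakdown (code {info})"


print(describe_info(-11))
```

A test asserting `info == 0` therefore treats a breakdown the same as any other failure, which is why a tolerance that is too tight can make it fail sporadically.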
This is going to be fixed together with #18488, so 1.12 is fine. I'll go through the issues that PR closes later.