scipy: BUG: test failure in `test_x0_equals_Mb` with `bicgstab`

In conda-forge, we ran into a new test failure for scipy 1.8, which appears only with MKL (which is available in conda-forge only for x86), and only if the processor supports AVX512 (which Azure CI only has for Linux & Windows); see https://github.com/conda-forge/scipy-feedstock/pull/199.

To not further delay the release of 1.8, we skipped that test for now, but it should IMO be fixed, especially as such processors are becoming more and more common (in fact, it’s getting harder and harder to purposefully catch a non-AVX512 Windows CI agent on Azure).
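Since the failure depends on whether the CI agent's CPU supports AVX512, it can help to log that at the start of a run. Below is a minimal sketch, assuming a Linux x86 agent where `/proc/cpuinfo` has a `flags` line and `avx512f` marks the AVX512 foundation instructions. The helper names `cpu_flags` and `has_avx512` are made up for illustration and are not part of the feedstock.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux every logical CPU repeats the same "flags" line,
        # so the first one is enough.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512(cpuinfo_text):
    """True if the AVX512 foundation flag (avx512f) is listed."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("AVX512F available:", has_avx512(path.read_text()))
    else:
        print("no /proc/cpuinfo here; this sketch only covers Linux")
```

This does not cover Windows agents, which would need a different detection mechanism.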

The failure is in test_x0_equals_Mb[bicgstab] and looks as follows:

=================================== FAILURES ===================================
_________________________ test_x0_equals_Mb[bicgstab] __________________________
[...]/lib/python3.8/site-packages/scipy/sparse/linalg/_isolve/tests/test_iterative.py:538: in test_x0_equals_Mb
    assert_equal(info, 0)
E   AssertionError: 
E   Items are not equal:
E    ACTUAL: -11
E    DESIRED: 0
        A          = <10x10 sparse matrix of type '<class 'numpy.complex64'>'
	with 19 stored elements in Compressed Sparse Row format>
        b          = array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
        case       = <nonsymposdef>
        info       = -11
        solver     = <function bicgstab at 0x7fc6545a4ee0>
        sup        = <numpy.testing._private.utils.suppress_warnings object at 0x7fc62555d040>
        tol        = 1e-08
        x          = array([0.        +0.j, 0.50000358+0.j, 1.25016941+0.j, 2.12653797+0.j,
       3.0674186 +0.j, 4.03716976+0.j, 5.0158146 +0.j, 6.00288608+0.j,
       7.        +0.j, 8.        +0.j])
        x0         = 'Mb'

Also, at the end of the test suite, some (seemingly delayed) log output appears that is perhaps relevant:


   Normal return from subroutine COBYLA

   NFVALS =   50   F = 2.485185E+01    MAXCV = 1.999965E-10
   X = 4.955358E+00   6.666553E-01

 NNLS quitting on iteration count.

About this issue

State: open
Created 2 years ago
Reactions: 1
Comments: 24 (24 by maintainers)

Most upvoted comments

Even if it does not go away, because it’s now pure python, a couple of print statements will show the issue and the smoking gun. These minute things make all this Fortran conversion pain worthwhile personally.

ilayn on Jan 10, 2024

I tested this some more in https://github.com/conda-forge/scipy-feedstock/pull/242, and what’s very strange is that now (still for SciPy 1.11.4, thus unrelated to the changes in 1.12), I get:

| | before | after | comment |
|---|---|---|---|
| linux + x64 + blis + avx2 | ✔️ | ✔️ | |
| linux + x64 + blis + avx512 | ✔️ | ✔️ | |
| linux + x64 + mkl + avx2 | ✔️ | ✔️ | |
| linux + x64 + mkl + avx512 | ❌ | ❌ | unchanged |
| linux + x64 + openblas + avx2 | ✔️ | ✔️ | |
| linux + x64 + openblas + avx512 | ✔️ | ❌ | started failing |
| win + x64 + blis + avx2 | ✔️ | ❌ | started failing |
| win + x64 + blis + avx512 | ✔️ | ❌ | started failing |
| win + x64 + mkl + avx2 | ✔️ | ✔️ | |
| win + x64 + mkl + avx512 | ❌ | ✔️ | stopped failing |
| win + x64 + openblas + avx2 | ✔️ | ✔️ | |
| win + x64 + openblas + avx512 | ✔️ | ✔️ | |

The versions of relevant libraries before & after:

| lib | for 1.11.0 | now | updated version |
|---|---|---|---|
| `scipy` | `1.11.0` | `1.11.4` | X |
| `numpy` | `1.25.0` | `1.26.3` | X |
| `blis` | `0.9.0-1` | `0.9.0-1` | |
| `openblas` | `0.3.23-pthreads-0` | `0.3.25-pthreads-0` | X |
| `mkl` | `2022.2.1-16997` (linux), `2022.2.1-16952` (osx), `2022.1.0-874` (win) | `2023.2.0-50496` (linux), `2023.2.0-50500` (osx), `2023.2.0-50497` (win) | X |
| `pythran` | `0.13.1-0` | `0.15.0-0` | X |
| `qemu-user-static` | `7.2.0-1` | `8.1.3-1` | X |

The fact that blis could start failing (while being unchanged in terms of build) must almost certainly be related to LAPACK (since blis only provides BLAS, and we add netlib’s LAPACK to that), as we also went from 3.9.0-17 to 3.9.0-20 for our blas metapackage.

h-vetinari on Jan 10, 2024

Hmm let me have a look at this one. Strange that it causes problems.

ilayn on Jan 2, 2024

About the error, this is a breakdown case where the algorithm hits a point where it can’t proceed further due to numerical problems. But because the tests are/were too tight, the failure was sporadic. Now all tests pass.

ilayn on May 29, 2023
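
To illustrate what "breakdown" means in the comment above, here is a minimal pure-Python sketch of textbook unpreconditioned BiCGSTAB. It is not SciPy's implementation: the function name `bicgstab_sketch`, the `breakdown_tol` parameter, the order of the checks, and the use of -10 and -11 as breakdown labels are all assumptions made for this example. The 2x2 system at the end is nonsingular, yet the stabilisation scalar `omega` becomes exactly zero in the first iteration, so the method cannot continue.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def vdot(a, b):
    """Inner product that conjugates the first argument."""
    return sum(complex(a_i).conjugate() * b_i for a_i, b_i in zip(a, b))


def norm(a):
    return math.sqrt(sum(abs(a_i) ** 2 for a_i in a))


def bicgstab_sketch(A, b, rtol=1e-8, maxiter=100, breakdown_tol=1e-30):
    """Unpreconditioned BiCGSTAB starting from x = 0.

    Returns (x, info): info == 0 on convergence, info == maxiter if the
    iteration limit is hit, and a negative value on breakdown. The codes
    -10 and -11 are labels chosen for this sketch.
    """
    n = len(b)
    x = [0j] * n
    r = [complex(b_i) for b_i in b]      # residual b - A @ 0
    atol = rtol * norm(b)
    if norm(r) <= atol:
        return x, 0
    rtilde = list(r)                     # fixed shadow residual
    rho_prev = alpha = omega = 1.0
    v = [0j] * n
    p = [0j] * n
    for _ in range(maxiter):
        rho = vdot(rtilde, r)
        if abs(rho) < breakdown_tol:     # rtilde and r (nearly) orthogonal
            return x, -10
        beta = (rho / rho_prev) * (alpha / omega)
        p = [r_i + beta * (p_i - omega * v_i) for r_i, p_i, v_i in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho / vdot(rtilde, v)    # this division is not guarded here
        s = [r_i - alpha * v_i for r_i, v_i in zip(r, v)]
        if norm(s) <= atol:              # converged after the half step
            return [x_i + alpha * p_i for x_i, p_i in zip(x, p)], 0
        t = matvec(A, s)
        omega = vdot(t, s) / vdot(t, t)
        x = [x_i + alpha * p_i + omega * s_i for x_i, p_i, s_i in zip(x, p, s)]
        r = [s_i - omega * t_i for s_i, t_i in zip(s, t)]
        if norm(r) <= atol:
            return x, 0
        if abs(omega) < breakdown_tol:   # t and s (nearly) orthogonal
            return x, -11
        rho_prev = rho
    return x, maxiter


# Nonsingular system (exact solution [0, 1]) that still breaks down:
# in the first iteration t = A @ s is orthogonal to s, so omega == 0.
x, info = bicgstab_sketch([[1, 1], [1, 0]], [1, 0])
print(info, x)
```

In this example the returned `info` is negative even though the linear system itself is perfectly solvable. When the scalars are tiny but not exactly zero, whether a check like this triggers can depend on rounding details.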

This is going to be fixed together with #18488 so 1.12 is fine. I’ll make a round of the issues that PR closes later.

ilayn on May 29, 2023