scipy: TST: Test failures with openblas >=0.3.24 in conda-forge on aarch

We recently switched to OpenBLAS 0.3.24 as the default in conda-forge, and now that we have a new PR for the scipy recipe and a corresponding CI run, aarch is blowing up pretty badly.

The failure mode is very reminiscent of https://github.com/numpy/numpy/issues/24660 actually, further cementing the impression that something’s wrong with OpenBLAS 0.3.24 on aarch. CC @martin-frbg

=========================== short test summary info ============================
FAILED linalg/tests/test_basic.py::TestLstsq::test_random_overdet - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRdelete_f::test_tall_1_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRdelete_f::test_tall_p_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRdelete_f::test_delete_last_p_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRdelete_f::test_non_unit_strides_1_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRdelete_f::test_non_unit_strides_p_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRdelete_f::test_neg_strides_1_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRdelete_f::test_neg_strides_p_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRdelete_f::test_non_itemsize_strides_1_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRdelete_f::test_non_itemsize_strides_p_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRdelete_f::test_non_native_byte_order_1_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRdelete_f::test_non_native_byte_order_p_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_sqr_1_row - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_sqr_p_row - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_tall_1_row - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_tall_p_row - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_tall_1_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_tall_p_col_tall - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_tall_p_col_sqr - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_tall_p_col_fat - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_fat_p_row_fat - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_fat_p_row_sqr - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_fat_p_row_tall - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_economic_p_col_eco - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_economic_p_col_sqr - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_economic_p_col_fat - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_Mx1_1_row - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_Mx1_p_row - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_non_unit_strides_1_row - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_non_unit_strides_p_row - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_non_unit_strides_1_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_non_unit_strides_p_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_neg_strides_1_row - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_neg_strides_p_row - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_neg_strides_1_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_neg_strides_p_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_non_itemsize_strides_1_row - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_non_itemsize_strides_p_row - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_non_itemsize_strides_1_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_non_itemsize_strides_p_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_non_native_byte_order_1_row - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_non_native_byte_order_p_row - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_non_native_byte_order_1_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRinsert_f::test_non_native_byte_order_p_col - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRupdate_f::test_tall_rank_1 - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRupdate_f::test_tall_rank_p - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRupdate_f::test_non_unit_strides_rank_1 - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRupdate_f::test_non_unit_strides_rank_p - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRupdate_f::test_neg_strides_rank_1 - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRupdate_f::test_neg_strides_rank_p - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRupdate_f::test_non_itemsize_strides_rank_1 - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRupdate_f::test_non_itemsize_strides_rank_p - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRupdate_f::test_non_native_byte_order_rank_1 - AssertionError:
FAILED linalg/tests/test_decomp_update.py::TestQRupdate_f::test_non_native_byte_order_rank_p - AssertionError:
FAILED linalg/tests/test_lapack.py::test_pftri - AssertionError:
FAILED linalg/tests/test_lapack.py::test_sfrk_hfrk - AssertionError:
FAILED linalg/tests/test_lapack.py::TestBlockedQR::test_geqrt_gemqrt - AssertionError:
FAILED linalg/tests/test_lapack.py::TestBlockedQR::test_tpqrt_tpmqrt - AssertionError:
FAILED linalg/tests/test_lapack.py::test_pstrf - AssertionError:
FAILED linalg/tests/test_lapack.py::test_pstf2 - AssertionError:
= 60 failed, 41935 passed, 2805 skipped, 134 xfailed, 12 xpassed in 5302.31s (1:28:22) =

The failures are not at all inconsequential, with many of them looking something like:

Mismatched elements: 144 / 144 (100%)
Max absolute difference: 1.56318134
Max relative difference: 1.56318134
 x: array([[-0.054046,  0.205639, -0.123517,  0.260355,  0.474265, -0.249005,
         0.177618, -0.301207,  0.618385, -0.037394,  0.023149,  0.252298],
       [ 0.205639,  0.756895, -0.286572,  0.154587,  0.321836,  0.283857,...
 y: array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],...

i.e. complete garbage.
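For context on what these assertions check: the QR-update tests verify, among other things, that the Q factor stays orthonormal (Q.T @ Q ≈ I), which is exactly the identity matrix shown as `y` in the diff above. A minimal sketch of that property with plain NumPy (not the actual scipy test code) shows the error magnitude a healthy BLAS should produce:

```python
import numpy as np

# On a working BLAS, the Q factor of a QR decomposition is orthonormal
# to machine precision; the aarch64 failures above instead show a max
# deviation of order 1 (i.e. Q.T @ Q is nowhere near the identity).
rng = np.random.default_rng(0)
a = rng.standard_normal((12, 12))
q, r = np.linalg.qr(a)

err = np.max(np.abs(q.T @ q - np.eye(12)))
print(err < 1e-12)  # True on a healthy BLAS
```

A max absolute difference of 1.56 on this check, as in the log above, means the underlying BLAS kernels are returning wrong results, not that a tolerance is slightly too tight.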

As an almost irrelevant aside, there is one extra failure on osx, but that’s “just” a deterioration of the tolerance:

=========================== short test summary info ============================
FAILED odr/tests/test_odr.py::TestODR::test_implicit - AssertionError: 
Arrays are not almost equal to 6 decimals

Mismatched elements: 1 / 25 (4%)
Max absolute difference: 1.52291797e-06
Max relative difference: 8.81522156e-07
 x: array([[ 2.108927e+00, -1.943767e+00,  7.026353e-02, -4.717525e-02,
         5.251554e-02],
       [-1.943767e+00,  2.048149e+00, -6.160049e-02,  4.626880e-02,...
 y: array([[ 2.108927e+00, -1.943769e+00,  7.026355e-02, -4.717527e-02,
         5.251558e-02],
       [-1.943769e+00,  2.048151e+00, -6.160052e-02,  4.626883e-02,...
= 1 failed, 54544 passed, 2850 skipped, 244 xfailed, 11 xpassed in 1036.02s (0:17:16) =
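To illustrate why this one is "just" a tolerance issue: `assert_array_almost_equal(..., decimal=6)` requires `abs(x - y) < 1.5e-6`, so the observed max absolute difference of 1.52291797e-06 misses the cutoff by a hair (the values themselves agree to roughly 7 significant digits). A quick sketch with a hypothetical pair of values of that magnitude:

```python
import numpy as np

# decimal=6 means the check is abs(desired - actual) < 1.5 * 10**-6.
# A difference of ~1.52e-6 therefore fails, even though the numbers
# agree to about one part in a million.
x = np.array([1.943767])
y = np.array([1.943767 + 1.523e-06])
try:
    np.testing.assert_array_almost_equal(x, y, decimal=6)
    print("passed")
except AssertionError:
    print("failed")  # prints "failed": 1.523e-6 > 1.5e-6
```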

About this issue

  • State: open
  • Created 10 months ago
  • Reactions: 1
  • Comments: 18 (15 by maintainers)

Most upvoted comments

@steppi volunteered for debugging this one.

Hmm, I think we should indeed expand the usage of OpenBLAS weekly builds. We chose the strategy that we use for all testing of pre-releases of other dependencies: do it in one or two jobs that are labeled as “pre-release tests”. The rationale is that nightlies are typically much more unstable than released versions, and we want the signal of the failure without all of our CI jobs turning red. Instead, we tend to report or fix the problem upstream, wait until the next nightly with the fix is out, and in the meantime just ignore the failure on the pre-release job.

However, OpenBLAS is quite different from other dependencies - failures are much more likely to be architecture-specific. This bug report highlights the need to expand our usage of the weekly builds. It seems like a weekly-build job would have failed on aarch64 (let’s confirm that first by upgrading the relevant CI job) but not on other platforms.

I’ll have a closer look later this morning.

Any change in compilers etc. on your side?

This could also be a root cause, of course. @h-vetinari is testing in conda-forge, where the build-time dependencies for the openblas package are likely to have changed between 0.3.23 and 0.3.24, while on the weekly builds they didn’t. So if the weekly builds pass, it’s due to a conda-forge compiler/binutils/etc. change.

As an almost irrelevant aside, there is one extra failure on osx, but that’s “just” a deterioration of the tolerance

That is fixed now, see gh-19200.