sunpy: Intermittent image-rotation test failures on OS X when using conda
We have intermittent failures of our image rotation tests, and these failures appear to be isolated to OS X when using conda(-forge). #4235 added raw output for when these failures occur, and some of the output is truly bizarre. I will add investigative stuff in separate posts. My current conjecture is that there is nothing wrong with SunPy code, but rather that a C extension in the numpy/scipy/scikit-image ecosystem is not being compiled for conda(-forge) with the correct compile options for OS X, such that there’s the intermittent potential for bad memory access of arrays.
Edit: go down to https://github.com/sunpy/sunpy/issues/4290#issuecomment-676573472 for a summary of the current understanding
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 16 (14 by maintainers)
Latest facts
- [...] pytest-xdist), but they can still occur with only a single worker (specifying -n 1 or -n 0).
- [...] scikit-image function, the actual line that can produce incorrect results is a NumPy matrix multiplication of a large matrix, which then calls OpenBLAS.
- [...] OMP_NUM_THREADS=1 to configure OpenBLAS to use only a single thread. (Incidentally, the environment variable OPENBLAS_NUM_THREADS=1 would also work if OpenBLAS is compiled to use pthreads threads, but conda-forge compiles OpenBLAS to use OpenMP threads. For whatever reason, OMP_NUM_THREADS=1 works regardless of which type of threads OpenBLAS has been compiled to use.) This can have negative impacts on the performance, and as an environment variable, can also affect other libraries.
Conjecture
For tests
There are two ways to get our OS X conda tests to pass reliably:
- [...] OMP_NUM_THREADS=1
For users
I still don’t know if a typical user can ever trigger these errors outside of running tests. But, if it is possible, corrupted matrix multiplications are indisputably bad. I’m inclined to force the MKL libraries as an explicit dependency, because disabling multithreading seems like far too sweeping of a fix, with the potential of doing far more harm than good.
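For the test side, one way to keep OMP_NUM_THREADS=1 from leaking into everything else is to set it only in the environment of a child process. The following is a minimal sketch, not what the SunPy test suite actually does: the helper names `single_threaded_env` and `run_single_threaded` are made up for illustration, and the command being run is a stand-in that just echoes the variable back.

```python
import os
import subprocess
import sys


def single_threaded_env(base=None):
    """Return a copy of the environment with OMP_NUM_THREADS forced to 1.

    Hypothetical helper: the parent process environment is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = "1"
    return env


def run_single_threaded(command):
    """Run `command` in a child process that sees OMP_NUM_THREADS=1."""
    return subprocess.run(
        command, env=single_threaded_env(), capture_output=True, text=True
    )


# Stand-in command that only reports the variable; a real run would launch
# the test suite instead (the exact invocation is an assumption).
check = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
result = run_single_threaded(check)
print(result.stdout.strip())
```

Because the variable is set before the child interpreter starts, any library loaded in that child sees it from the beginning, while the calling process keeps whatever threading configuration it already had.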
Arrays should be defaulting to row-major and C contiguous, yet these problem segments are both columns. I don’t know what to make of that.
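To make the layout remark concrete, here is a small NumPy sketch of what "row-major and C contiguous" means for rows versus columns. It uses a toy array with made-up dimensions, and it only illustrates memory layout; it does not reproduce the failure or explain it.

```python
import numpy as np

# A freshly created NumPy array is row-major (C order) by default.
image = np.zeros((4, 6), dtype=np.float64)

row = image[1, :]      # one row: elements sit next to each other in memory
column = image[:, 2]   # one column: elements are a full row apart in memory

print(image.flags["C_CONTIGUOUS"])   # the whole array is C contiguous
print(row.flags["C_CONTIGUOUS"])     # a row view is contiguous
print(column.flags["C_CONTIGUOUS"])  # a column view is strided, not contiguous

# Strides are in bytes: stepping down a column skips 6 float64 values.
print(image.strides, row.strides, column.strides)
```

So in a default-layout array, a column is exactly the kind of segment that is spread across memory rather than stored in one block.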