numba: Speed regression in 0.29.0
Referring to the discussion in #2176, it looks like there is a major speed regression from 0.28.1 to 0.29.0. The example function below, and similar tight loops doing little more than NaN checks and additions, now runs at only a fraction of its former speed. This heavily affects the usability of Numba as a replacement for some C implementations and renders 0.29.0 unusable for me. I’d also be happy to sponsor a bug bounty for this issue.
import numpy as np
import numba as nb

print(nb.__version__)

@nb.njit
def nanmin_numbagg_1dim(a):
    amin = np.inf
    all_missing = 1
    for ai in a.flat:
        # comparisons with NaN are always False, so NaNs never update amin
        if ai <= amin:
            amin = ai
            all_missing = 0
    if all_missing:  # every element was NaN
        amin = np.nan
    return amin

x = np.random.random(100000)
x[x > 0.7] = np.nan

%timeit nanmin_numbagg_1dim(x)
- 0.34.0: 10000 loops, best of 3: 190 µs per loop
- 0.29.0: 10000 loops, best of 3: 192 µs per loop
- 0.28.1: 10000 loops, best of 3: 57.4 µs per loop
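For reproducing the timings outside IPython, here is a minimal standalone timing sketch (my addition, not part of the original report); it assumes nanmin_numbagg_1dim and x are already defined as in the snippet above:

import timeit

nanmin_numbagg_1dim(x)  # warm-up call so JIT compilation is excluded from the timing
n = 10000
t = timeit.timeit(lambda: nanmin_numbagg_1dim(x), number=n)
print(f"{t / n * 1e6:.1f} µs per loop")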
About this issue
- State: open
- Created 8 years ago
- Comments: 29 (25 by maintainers)
I’ve filed an LLVM bug: https://bugs.llvm.org/show_bug.cgi?id=32022
Right, so much has changed that the C code no longer represents the Numba code here.
As for the C code, I think the problem comes from LLVM choosing to SIMD-vectorize that code pattern even though the scalar path is better. I also think there is a problem of CPU architecture and OS behavior. I am guessing @guilhermeleobas is running Linux and I’m on OSX. Could it be the alignment of double arr[size] vs. the SIMD misalignment penalty? For reference, our previously created LLVM bug can be found here now: https://github.com/llvm/llvm-project/issues/31370
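To make the alignment question concrete, here is a small sketch (my addition, not from the thread) that checks how a NumPy buffer lines up with common SIMD widths:

import numpy as np

x = np.random.random(100000)
addr = x.ctypes.data
# malloc typically guarantees only 16-byte alignment, so a buffer may or
# may not be aligned for 32-byte AVX loads; misaligned vector loads can
# carry a penalty on some CPU/OS combinations.
for width in (16, 32):
    print(f"{width}-byte aligned: {addr % width == 0}")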
I have further narrowed down the problem to multiple calls to LLVM's SimplifyCFG pass. I am asking on the LLVM dev mailing list for the experts to take a look.
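For anyone wanting to see what each Numba version hands to LLVM, the dispatcher's inspect_llvm() and inspect_asm() methods return the generated IR and assembly per compiled signature; diffing their output between 0.28.1 and 0.29.0 is my suggestion rather than a step taken in the thread. This sketch assumes nanmin_numbagg_1dim and x from the snippet at the top of the issue:

nanmin_numbagg_1dim(x)  # ensure at least one signature has been compiled

# One entry per compiled signature; save these dumps and diff them
# between Numba versions to spot where the vectorization diverges.
for sig, ir in nanmin_numbagg_1dim.inspect_llvm().items():
    print(sig)
    print(ir[:1000])   # head of the optimized LLVM IR

for sig, asm in nanmin_numbagg_1dim.inspect_asm().items():
    print(sig)
    print(asm[:1000])  # head of the generated machine code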