numba: Speed regression in 0.29.0

Referring to the discussion in #2176, it looks like there is a major speed regression from 0.28.1 to 0.29.0. The example function below, and similar tight loops doing little more than NaN checks and additions, now runs at a fraction of its former speed. This heavily affects the usability of numba as a replacement for some C implementations and renders 0.29.0 unusable for me. I'd also be happy to sponsor a bug bounty for this issue.

import numpy as np
import numba as nb
print(nb.__version__)

@nb.njit
def nanmin_numbagg_1dim(a):
    amin = np.inf
    all_missing = 1
    for ai in a.flat:
        if ai <= amin:
            amin = ai
            all_missing = 0
    if all_missing:
        amin = np.nan
    return amin

x = np.random.random(100000)
x[x>0.7] = np.nan
%timeit nanmin_numbagg_1dim(x)
0.34.0: 10000 loops, best of 3: 190 µs per loop
0.29.0: 10000 loops, best of 3: 192 µs per loop
0.28.1: 10000 loops, best of 3: 57.4 µs per loop
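For anyone reproducing this outside IPython, a pure-Python mirror of the loop (no numba; a sketch for correctness checks only, not for timing) confirms the function agrees with `np.nanmin` on data with NaNs:

```python
import numpy as np

def nanmin_ref(a):
    # Pure-Python mirror of the jitted loop above, for correctness checks only
    amin = np.inf
    all_missing = True
    for ai in a.flat:
        if ai <= amin:   # comparisons against NaN are False, so NaNs are skipped
            amin = ai
            all_missing = False
    return np.nan if all_missing else amin

x = np.random.random(100000)
x[x > 0.7] = np.nan
assert nanmin_ref(x) == np.nanmin(x)
assert np.isnan(nanmin_ref(np.full(10, np.nan)))  # all-NaN input yields NaN
```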

About this issue

  • State: open
  • Created 8 years ago
  • Comments: 29 (25 by maintainers)

Most upvoted comments

Right, so much has changed that the C code no longer represents the Numba code here.

As for the C code, I think the problem comes from LLVM choosing to SIMD-vectorize that code pattern even though the scalar path is better. I also think there is an interaction with CPU architecture and OS behavior: I am guessing @guilhermeleobas is running Linux and I'm on OSX. Could it be the alignment of `double arr[size]` versus the SIMD misalignment penalty?
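On the alignment question, NumPy exposes the raw buffer address, so one can at least check what alignment the test array actually gets on each machine (a sketch; the 16- and 32-byte boundaries are assumptions about the SSE/AVX load widths involved):

```python
import numpy as np

x = np.random.random(100000)
addr = x.ctypes.data  # raw address of the float64 buffer
print("16-byte aligned:", addr % 16 == 0)
print("32-byte aligned:", addr % 32 == 0)

# A deliberately misaligned view: the data pointer is offset by one
# 8-byte element, which defeats 16/32-byte-aligned SIMD loads
buf = np.empty(100001)
mis = buf[1:]
print("view offset (bytes):", mis.ctypes.data - buf.ctypes.data)
```

Timing the jitted function on `x` versus `mis` on both platforms would show whether the misalignment penalty is the distinguishing factor.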

For reference, our previously created LLVM bug can be found here now: https://github.com/llvm/llvm-project/issues/31370

I have further narrowed the problem down to multiple calls to the LLVM SimplifyCFG pass. I am asking on the LLVM dev mailing list for the experts to take a look.