numba: Function accepting njitted functions as arguments is slow
- I am using the latest released version of Numba (the most recent is visible in the change log: https://github.com/numba/numba/blob/master/CHANGE_LOG).
- I have included below a minimal working reproducer (if you are unsure how to write one see http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports).
I was trying numba 0.38 and the new support for jitted functions as arguments with this code snippet:
# coding: utf-8
from scipy.optimize import newton
from numba import njit


@njit
def func(x):
    return x**3 - 1


@njit
def fprime(x):
    return 3 * x**2


@njit
def njit_newton(func, x0, fprime):
    for _ in range(50):
        fder = fprime(x0)
        fval = func(x0)
        newton_step = fval / fder
        x = x0 - newton_step
        if abs(x - x0) < 1.48e-8:
            return x
        x0 = x


get_ipython().run_line_magic('timeit', 'newton(func.py_func, 1.5, fprime=fprime.py_func)')
get_ipython().run_line_magic('timeit', 'newton(func, 1.5, fprime=fprime)')
get_ipython().run_line_magic('timeit', 'njit_newton.py_func(func, 1.5, fprime=fprime)')
get_ipython().run_line_magic('timeit', 'njit_newton(func, 1.5, fprime=fprime)')
And I found it surprising that njit_newton is the slowest of all, while njit_newton.py_func is the fastest:
$ ipython test_perf.py
4.76 µs ± 8.52 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.14 µs ± 30.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
3.58 µs ± 26 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
20 µs ± 85.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
(Inspiration: https://github.com/scipy/scipy/blob/607a21e07dad234f8e63fcf03b7994137a3ccd5b/scipy/optimize/zeros.py#L164-L182)
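A common workaround for this kind of overhead is to close over the jitted callbacks in a factory function instead of passing them as arguments, so the calls are resolved when the solver is compiled. This is a minimal sketch (not from the issue; `make_newton` and the fallback decorator are my own names), with a graceful fallback so it also runs without Numba installed:

```python
try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without Numba: a no-op decorator.
    def njit(f):
        return f


@njit
def func(x):
    return x**3 - 1


@njit
def fprime(x):
    return 3 * x**2


def make_newton(func, fprime, tol=1.48e-8, maxiter=50):
    # The returned solver captures func/fprime as compile-time constants,
    # so no function objects cross the Python/Numba boundary per call.
    @njit
    def solver(x0):
        for _ in range(maxiter):
            x = x0 - func(x0) / fprime(x0)
            if abs(x - x0) < tol:
                return x
            x0 = x
        return x0
    return solver


newton_solver = make_newton(func, fprime)
root = newton_solver(1.5)  # converges to the real cube root of 1
```

The trade-off is that each (func, fprime) pair triggers a separate compilation of the solver, which may or may not be acceptable depending on how many callback combinations you need.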
About this issue
- Original URL
- State: open
- Created 6 years ago
- Reactions: 1
- Comments: 21 (15 by maintainers)
I can confirm that this issue exists. However, as mentioned above, it does in fact seem to be caused by the overhead of calling Numba-jitted code from Python.
The difference in performance when comparing the `foo` functions is large; however, since timeit is called from the Python context, these timings are largely affected by Numba invocation costs.

The difference in performance when comparing the `bar` functions is minimal, because now most of the time is actually spent inside the function rather than in interfacing between Numba and Python.

For reference, if the functions do any real work, the differences disappear (and strangely reverse, which I cannot explain).
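To illustrate the amortization point: the actual `bar` functions from the thread are not shown here, so this is a hypothetical stand-in where each call does enough arithmetic that the per-call dispatch cost becomes a small fraction of the total runtime (with a no-op fallback so it runs without Numba):

```python
import timeit

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without Numba: a no-op decorator.
    def njit(f):
        return f


@njit
def bar(x):
    # Enough work per call that invocation overhead is negligible.
    total = 0.0
    for i in range(10_000):
        total += (x + i) ** 0.5
    return total


@njit
def driver(bar, x):
    # Receives a jitted function as an argument, like njit_newton above.
    return bar(x)


# Both paths spend almost all their time inside bar, so the timings
# should be close, unlike the cheap-function case in the issue.
t_direct = timeit.timeit(lambda: bar(1.5), number=100)
t_passed = timeit.timeit(lambda: driver(bar, 1.5), number=100)
```

This matches the observation above: the fixed per-call cost of crossing the Python/Numba boundary dominates only when the function body itself is trivial.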