numba: Significant performance regression for datashader in numba 0.49.x

Using the latest released numba (0.49.1), I’m seeing a significant performance regression compared to numba 0.48 in a simple datashader aggregation example. To ensure no other packages change, I’m downgrading from 0.49.1 to 0.48 with conda install numba=0.48 --no-deps. The test case I’m using is the following:

import timeit

from functools import partial

import datashader as ds
import numba
import numpy as np
import pandas as pd

canvas = ds.Canvas(plot_height=1000, plot_width=1000)

def agg(df):
    canvas.points(df, 'x', 'y', agg=ds.mean('value'))

def test_agg_performance(N, repeats=10):
    df = pd.DataFrame({'x': np.random.randn(N), 'y': np.random.randn(N), 'value': np.random.rand(N)})
    agg(df) # Warm up JIT
    return timeit.timeit(partial(agg, df), number=repeats)/repeats

print(f'Numba version: {numba.__version__}')
[(n, test_agg_performance(int(n))) for n in np.logspace(0, 8, 9)]

Here is a graph of the performance difference by the number of points being aggregated:

[Figure: bokeh plot of the performance difference by number of points aggregated]

And here is the notebook which I used to generate the plot: https://anaconda.org/philippjfr/profiling_numba/notebook
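Because the two numba versions have to be timed in separate environments, it can help to write each run's timings to disk and compute the ratio afterwards. The following is only a minimal sketch of one way to do that, not what the notebook does: the helper names save_timings and slowdown and the timings_<version>.json file name are hypothetical, and it assumes the results are a list of (n, seconds) pairs like the one produced by the last line of the script above.

```python
import json
from pathlib import Path


def save_timings(results, version, directory="."):
    """Write [(n, seconds), ...] to timings_<version>.json.

    The file name pattern is a hypothetical choice for this sketch.
    """
    path = Path(directory) / f"timings_{version}.json"
    # JSON keys must be strings, so store each problem size as str(int(n)).
    payload = {str(int(n)): float(t) for n, t in results}
    path.write_text(json.dumps(payload, indent=2))
    return path


def slowdown(baseline_path, candidate_path):
    """Return {n: candidate_seconds / baseline_seconds}.

    Only problem sizes present in both files are compared; a value
    above 1.0 means the candidate run was slower than the baseline.
    """
    baseline = json.loads(Path(baseline_path).read_text())
    candidate = json.loads(Path(candidate_path).read_text())
    common = sorted(baseline.keys() & candidate.keys(), key=int)
    return {int(n): candidate[n] / baseline[n] for n in common if baseline[n] > 0}
```

In each environment, the list from the final line of the script could be passed to save_timings together with numba.__version__; calling slowdown on the two resulting files then gives the per-size ratio between the two versions.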
