numba: Typed list of arrays much slower than a reflected list

I am using Python 3.6.8 with Numba 0.45.1. The following (very contrived) example demonstrates the problem:

import numpy as np
from numba import njit
from numba.typed import List
import time

def make_array_lists(shapes):

    typed_list = List()
    reflected_list = []

    for shape in shapes:
        typed_list.append(np.zeros(shape, dtype=np.float64))
        reflected_list.append(np.zeros(shape, dtype=np.float64))

    return typed_list, reflected_list

@njit(cache=True)
def mutate_array_list(array_list):

    for j in range(len(array_list)):
        dim0, dim1 = array_list[j].shape
        for row in range(dim0):
            for col in range(dim1):
                array_list[j][row, col] += 1


if __name__ == "__main__":

    array_shapes = [(200,200), (300,300), (400,400)]
    nrun = 1000

    typed_list, reflected_list = make_array_lists(array_shapes)

    t0 = time.time()
    [mutate_array_list(typed_list) for i in range(nrun)]
    average_time = (time.time() - t0)/nrun
    print("Typed list took {:.6f} seconds per run.".format(average_time))

    t0 = time.time()
    [mutate_array_list(reflected_list) for i in range(nrun)]
    average_time = (time.time() - t0)/nrun
    print("Reflected list took {:.6f} seconds per run.".format(average_time))

The output of the above is:

Typed list took 0.014759 seconds per run.
Reflected list took 0.000316 seconds per run.

About this issue

  • Original URL
  • State: open
  • Created 5 years ago
  • Reactions: 3
  • Comments: 38 (25 by maintainers)

Most upvoted comments

I have revisited this issue using the new getitem_unchecked introduced in https://github.com/numba/numba/pull/6278 — this will do a getitem, but it will not check that the index i is within the bounds, so the user will be responsible for ensuring not out of bounds look-ups will be made (else this may well result in a segfault).

import numpy as np
from numba import njit
from numba.typed import List
import time


def make_array_lists(shapes):

    typed_list = List()
    reflected_list = []

    for shape in shapes:
        typed_list.append(np.zeros(shape, dtype=np.float64))
        reflected_list.append(np.zeros(shape, dtype=np.float64))

    return typed_list, reflected_list


@njit
def mutate_array_list_typed(array_list):
    for j in range(len(array_list)):
        dim0, dim1 = array_list.getitem_unchecked(j).shape
        for row in range(dim0):
            for col in range(dim1):
                array_list.getitem_unchecked(j)[row, col] += 1

@njit
def mutate_array_list_reflected(array_list):
    for j in range(len(array_list)):
        dim0, dim1 = array_list[j].shape
        for row in range(dim0):
            for col in range(dim1):
                array_list[j][row, col] += 1


if __name__ == "__main__":

    array_shapes = [(200,200), (300,300), (400,400)]
    nrun = 1000

    typed_list, reflected_list = make_array_lists(array_shapes)

    # executed to compile, so that compilation time does not skew the benchmark
    mutate_array_list_typed(typed_list)
    mutate_array_list_reflected(reflected_list)

    t0 = time.time()
    [mutate_array_list_typed(typed_list) for i in range(nrun)]
    average_time = (time.time() - t0) / nrun
    print("Typed list took {:.6f} seconds per run.".format(average_time))

    t0 = time.time()
    [mutate_array_list_reflected(reflected_list) for i in range(nrun)]
    average_time = (time.time() - t0) / nrun
    print("Reflected list took {:.6f} seconds per run.".format(average_time))

The back-of-bank-statement performance characteristics are:

Typed list took 0.000429 seconds per run.
Reflected list took 0.000398 seconds per run.

In order to circumvent my current issues with the typed list, I resorted to using tuples. However, I then slam headfirst into #4277, #2651 and #2625 when using prange. This unfortunately puts me in a position where I have no option but to resort to reflected lists as I need all of the following:

  • capacity to group an arbitrary number of arrays with equal ndim but differing shapes together.
  • the option to use prange.
  • performance when accessing list elements.