numba: Numba appears ~10X slower running in docker versus native on OSX

TLDR: Running the same Numba code in Docker is roughly 10X slower than running it natively on OSX, per timeit benchmarks.

  • Numba on OSX: 0.678s runtime
  • Numba in docker: 4.900s runtime

(N=10000)

  • I have tried using the latest released version of Numba (the most recent is visible in the change log: https://github.com/numba/numba/blob/main/CHANGE_LOG).
  • I have included a self-contained code sample to reproduce the problem, i.e. it’s possible to run it as ‘python bug.py’.

Hi! I really enjoy Numba, and appreciate all the work that goes into it. I wanted to report a strange phenomenon that seemed noteworthy.

I’m doing numerical simulation using multinomial distributions. Since I’m drawing millions of samples, calculating a logpmf has become a bottleneck, so I rewrote the hot code paths with Numba to see whether any speed-up was possible. However, I noticed that the same code runs much slower in a Docker image than it does natively on OSX. The code eventually needs to run in a Docker container because it will be deployed to a k8s cluster.

Perhaps I’m not writing the code in the most optimal way, but it seems strange that running in Docker could cause such a dramatic change in performance. I might expect a few percentage points of overhead from virtualisation, but not this much. I’ve attached a reproducible example below.

Potential docker considerations:

  • Using an intel mac, not arm.
  • Docker VM has 10 CPUs available vs 12 for the OSX host.
  • Docker VM has 1GB of swap available.
import ctypes
import timeit

import numba
import numpy
from numba import extending
from numpy.typing import NDArray

_PTR = ctypes.POINTER
_dble = ctypes.c_double
_ptr_dble = _PTR(_dble)

gammaln_functype = ctypes.CFUNCTYPE(_dble, _dble)
cython_gammaln = gammaln_functype(
    extending.get_cython_function_address("scipy.special.cython_special", "gammaln")
)

xlogy_functype = ctypes.CFUNCTYPE(_dble, _dble, _dble)
cython_xlogy = xlogy_functype(
    extending.get_cython_function_address("scipy.special.cython_special", "__pyx_fuse_1xlogy")
)


@numba.vectorize([numba.float64(numba.float64)])
@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_gammaln(x):
    return cython_gammaln(x)


@numba.vectorize([numba.float64(numba.float64, numba.float64)])
@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_xlogy(x, y):
    return cython_xlogy(x, y)


@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_logpmf(x: NDArray[numpy.int_], n: int, p: NDArray[numpy.float32]) -> float:
    """Calculate the log probability mass function using vectorised scipy special functions."""
    difference = numpy.sum(numba_xlogy(x, p) - numba_gammaln(x + 1))
    return cython_gammaln(n + 1) + difference


#
# Benchmark
#

N_TESTS = 10000

n = 10
p = numpy.array([0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.83333333])
x = numpy.array([1, 1, 1, 1, 6])

# Trigger JIT if required
numba_logpmf(x, n, p)

print(timeit.timeit(lambda: numba_logpmf(x, n, p), number=N_TESTS))
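For comparison, the same log-PMF formula can be evaluated with SciPy’s ordinary vectorised ufuncs, without Numba or ctypes. This is a baseline sketch I’m adding for reference (it is not part of the original report, and assumes `scipy` is installed); it uses the benchmark inputs above:

```python
import numpy
from scipy.special import gammaln, xlogy


def scipy_logpmf(x, n, p):
    # Same formula as numba_logpmf above: gammaln(n+1) + sum(xlogy(x, p) - gammaln(x+1))
    return gammaln(n + 1) + numpy.sum(xlogy(x, p) - gammaln(x + 1))


n = 10
p = numpy.array([0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.83333333])
x = numpy.array([1, 1, 1, 1, 6])

print(scipy_logpmf(x, n, p))
```

Timing this with the same `timeit` call gives a pure-SciPy reference point, which helps separate "Numba is slow in Docker" from "everything is slow in Docker".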

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (5 by maintainers)

Most upvoted comments

Ok, I found the issue. I switched the base image from python:3.10-slim to python:3.10, and the runtime dropped to 0.069724s. I assume there might be an apt package present in python:3.10 but missing from python:3.10-slim that’s required? Could you guess at what that might be?

I am glad you managed to resolve this issue. I have no idea what package could be missing that would incur such a performance penalty.
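One diagnostic avenue worth mentioning (an assumption on my part, not something confirmed in this thread) is to check which SIMD features the container actually exposes, since the quality of Numba/LLVM code generation depends on the CPU features visible at JIT time. A minimal, Linux-only sketch that reads `/proc/cpuinfo` inside the container:

```python
def simd_flags(path="/proc/cpuinfo"):
    """Return the subset of common x86 SIMD flags the kernel reports.

    Linux-only: parses /proc/cpuinfo; returns an empty set elsewhere
    (e.g. on ARM, or if the file is missing).
    """
    wanted = ("sse4_2", "avx", "avx2", "avx512f")
    try:
        with open(path) as fh:
            for line in fh:
                if line.startswith("flags"):
                    flags = set(line.split(":", 1)[1].split())
                    return {f for f in wanted if f in flags}
    except FileNotFoundError:
        pass
    return set()


print(simd_flags())
```

Running this both natively and in the container (or simply comparing the output of `python -m numba -s`, which also dumps detected CPU features) would show whether the two environments present the same instruction sets to the JIT.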