numba: Numba appears ~10X slower running in docker versus native on OSX

TLDR: Running the same Numba code in Docker is roughly 10X slower than running it natively on OSX, per timeit benchmarks.

  • Numba on OSX: 0.678s runtime
  • Numba in docker: 4.900s runtime

(N=10000)

  • I have tried using the latest released version of Numba (the most recent is visible in the change log: https://github.com/numba/numba/blob/main/CHANGE_LOG).
  • I have included a self-contained code sample to reproduce the problem, i.e. it’s possible to run it as ‘python bug.py’.

Hi! I really enjoy Numba, and appreciate all the work that goes into it. I wanted to report a strange phenomenon that seemed noteworthy.

I’m doing numerical simulation using multinomial distributions. Since I’m drawing millions of samples, calculating a logpmf has become a bottleneck, so I rewrote the hot code paths with Numba to see whether any speed-up was possible. However, I noticed that the same code runs much slower in a Docker image than it does natively on OSX. The code eventually needs to run in a Docker container because it will be deployed to a k8s cluster.

Perhaps I’m not writing the code in the most optimal way, but it seems strange that running in Docker could cause such a dramatic change in performance. I might expect a few percentage points of overhead from virtualisation, but not this much. I’ve attached a reproducible example below.

Potential docker considerations:

  • Using an intel mac, not arm.
  • Docker VM has 10 CPUs available vs 12 for the OSX host.
  • Docker VM has 1GB of swap available.
import ctypes
import timeit

import numba
import numpy
from numba import extending
from numpy.typing import NDArray

_PTR = ctypes.POINTER
_dble = ctypes.c_double
_ptr_dble = _PTR(_dble)

gammaln_functype = ctypes.CFUNCTYPE(_dble, _dble)
cython_gammaln = gammaln_functype(
    extending.get_cython_function_address("scipy.special.cython_special", "gammaln")
)

xlogy_functype = ctypes.CFUNCTYPE(_dble, _dble, _dble)
cython_xlogy = xlogy_functype(
    extending.get_cython_function_address("scipy.special.cython_special", "__pyx_fuse_1xlogy")
)


@numba.vectorize([numba.float64(numba.float64)])
@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_gammaln(x):
    return cython_gammaln(x)


@numba.vectorize([numba.float64(numba.float64, numba.float64)])
@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_xlogy(x, y):
    return cython_xlogy(x, y)


@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_logpmf(x: NDArray[numpy.int_], n: int, p: NDArray[numpy.float32]) -> float:
    """Calculate the log probability mass function using vectorised scipy special functions."""
    difference = numpy.sum(numba_xlogy(x, p) - numba_gammaln(x + 1))
    return cython_gammaln(n + 1) + difference


#
# Benchmark
#

N_TESTS = 10000

n = 10
p = numpy.array([0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.83333333])
x = numpy.array([1, 1, 1, 1, 6])

# Trigger JIT if required
numba_logpmf(x, n, p)

print(timeit.timeit(lambda: numba_logpmf(x, n, p), number=N_TESTS))
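For comparison, the same log-PMF formula can be evaluated with SciPy’s ordinary vectorised ufuncs, without Numba or ctypes. This is a baseline sketch I’m adding for reference (it is not part of the original report, and assumes `scipy` is installed); it uses the benchmark inputs above:

```python
import numpy
from scipy.special import gammaln, xlogy


def scipy_logpmf(x, n, p):
    # Same formula as numba_logpmf above: gammaln(n+1) + sum(xlogy(x, p) - gammaln(x+1))
    return gammaln(n + 1) + numpy.sum(xlogy(x, p) - gammaln(x + 1))


n = 10
p = numpy.array([0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.83333333])
x = numpy.array([1, 1, 1, 1, 6])

print(scipy_logpmf(x, n, p))
```

Timing this with the same `timeit` call gives a pure-SciPy reference point, which helps separate "Numba is slow in Docker" from "everything is slow in Docker".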

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (5 by maintainers)

Most upvoted comments

Ok, I found the issue. I switched the base image from python:3.10-slim to python:3.10, and the runtime dropped to 0.069724s. I assume there might be an apt package present in python:3.10 but missing from python:3.10-slim that’s required? Could you guess at what that might be?

I am glad you managed to resolve this issue. I have no idea what package could be missing that would incur such a performance penalty.
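One diagnostic avenue worth mentioning (an assumption on my part, not something confirmed in this thread) is to check which SIMD features the container actually exposes, since the quality of Numba/LLVM code generation depends on the CPU features visible at JIT time. A minimal, Linux-only sketch that reads `/proc/cpuinfo` inside the container:

```python
def simd_flags(path="/proc/cpuinfo"):
    """Return the subset of common x86 SIMD flags the kernel reports.

    Linux-only: parses /proc/cpuinfo; returns an empty set elsewhere
    (e.g. on ARM, or if the file is missing).
    """
    wanted = ("sse4_2", "avx", "avx2", "avx512f")
    try:
        with open(path) as fh:
            for line in fh:
                if line.startswith("flags"):
                    flags = set(line.split(":", 1)[1].split())
                    return {f for f in wanted if f in flags}
    except FileNotFoundError:
        pass
    return set()


print(simd_flags())
```

Running this both natively and in the container (or simply comparing the output of `python -m numba -s`, which also dumps detected CPU features) would show whether the two environments present the same instruction sets to the JIT.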