numba: `np.cos`, `np.sin`, `np.sqrt` returning incorrect values in very specific scenarios

Reporting a bug

  • I have tried using the latest released version of Numba (the most recent is visible in the change log: https://github.com/numba/numba/blob/main/CHANGE_LOG).
  • I have included a self-contained code sample to reproduce the problem, i.e. it’s possible to run as ‘python bug.py’.

Consider this innocuous-looking snippet:

import numpy as np
import numba as nb

@nb.njit
def f(x):
    return np.array([
        x[0],
        np.cos(x[0]) * x[0],
        -np.cos(x[0]) * np.cos(x[1])
    ])

print(f(np.zeros(2))) # [ 0.         -0.         -0.99254141]

On Windows, with Python 3.9.5 and Python 3.8.2, I can confirm that incorrect results appear from numba==0.54.0 onward, including numba==0.56.4; numba==0.53.1 and earlier are not affected. Diffing the LLVM dumps of a passing and a failing version suggests this is related to the use of the SVML library, but I can’t say for sure, because any tiny change to the code can make the issue go away. For example, the following snippet has no issue:

import numpy as np
import numba as nb

@nb.njit
def f(x):
    return np.array([
        x[1],
        np.cos(x[0]) * x[0],
        -np.cos(x[0]) * np.cos(x[1])
    ])

print(f(np.zeros(2))) # [ 0.  0. -1.]
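For reference, the mathematically expected result for x = [0, 0] is [0, 0, -1], since cos(0) = 1. Below is a minimal pure-Python sketch that computes this reference value without Numba and flags which element of a reported output deviates. The helper names `reference` and `mismatches` are hypothetical and only for illustration; they are not part of Numba or NumPy.

```python
import math


def reference(x):
    """Same expression as f(x) in the first snippet, using only the stdlib."""
    return [
        x[0],
        math.cos(x[0]) * x[0],
        -math.cos(x[0]) * math.cos(x[1]),
    ]


def mismatches(actual, expected, tol=1e-12):
    """Return (index, actual, expected) for every element outside tolerance."""
    return [
        (i, a, e)
        for i, (a, e) in enumerate(zip(actual, expected))
        if not math.isclose(a, e, rel_tol=tol, abs_tol=tol)
    ]


# Values copied from the failing output reported above.
reported = [0.0, -0.0, -0.99254141]
print(mismatches(reported, reference([0.0, 0.0])))
# [(2, -0.99254141, -1.0)]
```

Note that `math.isclose` treats -0.0 and 0.0 as equal, so the sign difference in the second element is not flagged here; only the third element is reported as wrong.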

Here is my environment:

--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time)                   : ...
UTC start time                                : ...
Running time (s)                              : 1.805322

__Hardware Information__
Machine                                       : AMD64
CPU Name                                      : skylake
CPU Count                                     : 8
Number of accessible CPUs                     : 8
List of accessible CPUs cores                 : 0 1 2 3 4 5 6 7
CFS Restrictions (CPUs worth of runtime)      : None

CPU Features                                  : 64bit adx aes avx avx2 bmi bmi2  
                                                clflushopt cmov cx16 cx8 f16c fma
                                                fsgsbase fxsr invpcid lzcnt mmx  
                                                movbe pclmul popcnt prfchw rdrnd 
                                                rdseed rtm sahf sse sse2 sse3    
                                                sse4.1 sse4.2 ssse3 xsave xsavec 
                                                xsaveopt xsaves

Memory Total (MB)                             : 32708
Memory Available (MB)                         : 16987

__OS Information__
Platform Name                                 : Windows-10-10.0.19045-SP0        
Platform Release                              : 10
OS Name                                       : Windows
OS Version                                    : 10.0.19045
OS Specific Version                           : 10 10.0.19045 SP0 Multiprocessor Free
Libc Version                                  : ?

__Python Information__
Python Compiler                               : MSC v.1928 64 bit (AMD64)
Python Implementation                         : CPython
Python Version                                : 3.9.5
Python Locale                                 : ...

__Numba Toolchain Versions__
Numba Version                                 : 0.56.4
llvmlite Version                              : 0.39.1

__LLVM Information__
LLVM Version                                  : 11.1.0

__CUDA Information__
CUDA Device Initialized                       : True
CUDA Driver Version                           : 11.7
CUDA Runtime Version                          : 9.0
CUDA NVIDIA Bindings Available                : False
CUDA NVIDIA Bindings In Use                   : False
CUDA Detect Output:
Found 1 CUDA devices
id 0    b'NVIDIA GeForce GTX 1070'                              [SUPPORTED]
                      Compute Capability: 6.1
                           PCI Device ID: 0
                              PCI Bus ID: 1
                                    UUID: GPU-a52653e9-3c51-fa78-7d15-673f8058ea38
                                Watchdog: Enabled
                            Compute Mode: WDDM
             FP32/FP64 Performance Ratio: 32
Summary:
        1/1 devices are supported

CUDA Libraries Test Output:
Finding nvvm from CUDA_HOME
        named  nvvm64_32_0.dll
        trying to open library...       ok
Finding cudart from CUDA_HOME
        named  cudart64_90.dll
        trying to open library...       ok
Finding cudadevrt from CUDA_HOME
        named  cudadevrt.lib
        ERROR: failed to find cudadevrt:
cudadevrt.lib not found
Finding libdevice from CUDA_HOME
        trying to open library...       ok


__NumPy Information__
NumPy Version                                 : 1.20.3
NumPy Supported SIMD features                 : ('MMX', 'SSE', 'SSE2', 'SSE3', 'SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2')
NumPy Supported SIMD dispatch                 : ('SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2', 'AVX512F', 'AVX512CD', 'AVX512_SKX', 'AVX512_CLX', 'AVX512_CNL')
NumPy Supported SIMD baseline                 : ('SSE', 'SSE2', 'SSE3')
NumPy AVX512_SKX support detected             : False

__SVML Information__
SVML State, config.USING_SVML                 : True
SVML Library Loaded                           : True
llvmlite Using SVML Patched LLVM              : True
SVML Operational                              : True

__Threading Layer Information__
TBB Threading Layer Available                 : False
+--> Disabled due to Unknown import problem.
OpenMP Threading Layer Available              : True
+-->Vendor: MS
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

__Numba Environment Variable Information__
None found.

__Conda Information__
Conda not available.

__Installed Packages__
Package                        Version
------------------------------ --------------
llvmlite                       0.39.1
numba                          0.56.4
numpy                          1.20.3
scipy                          1.8.1
...

No errors reported.


__Warning log__
Warning: Conda not available.
 Error was [WinError 2] The system cannot find the file specified

--------------------------------------------------------------------------------

About this issue

  • State: open
  • Created a year ago
  • Comments: 19 (11 by maintainers)

Most upvoted comments

Since this issue is no longer reproducible on LLVM14, I think we can close it.

As for the new SVML patch (https://github.com/numba/llvmlite/pull/947), we don’t have any way to test it.

I can reproduce similar behaviour on Windows:

> python.exe .\repro.py
[ 0.         -0.         -0.99254141]

> $env:NUMBA_DISABLE_INTEL_SVML = 1

> python.exe .\repro.py
[ 0.  0. -1.]
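The same comparison can be scripted so that both runs happen from one command. This is only a sketch under a few assumptions: the reproducer is saved as `repro.py` (as in the session above), and each run uses a fresh interpreter because, as far as I understand, Numba picks up `NUMBA_DISABLE_INTEL_SVML` when it is imported. The helper names `build_env` and `run_repro` are made up for illustration.

```python
import os
import subprocess
import sys


def build_env(disable_svml):
    """Return a copy of the current environment with the SVML switch set or cleared."""
    env = dict(os.environ)
    if disable_svml:
        # Same variable as set in the PowerShell session above.
        env["NUMBA_DISABLE_INTEL_SVML"] = "1"
    else:
        env.pop("NUMBA_DISABLE_INTEL_SVML", None)
    return env


def run_repro(script, disable_svml):
    """Run `script` in a fresh interpreter and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, script],
        env=build_env(disable_svml),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


SCRIPT = "repro.py"  # assumed file name, as in the session above
if os.path.exists(SCRIPT):
    for disable in (False, True):
        label = "1" if disable else "unset"
        print(f"NUMBA_DISABLE_INTEL_SVML={label}:", run_repro(SCRIPT, disable))
```

If the two printed arrays differ on an affected setup, that points at the SVML code path rather than at the script itself.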

I’m also using a Skylake CPU:

System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time)                   : 2023-01-12 14:10:00.832167
UTC start time                                : 2023-01-12 14:10:00.832167
Running time (s)                              : 1.175411

__Hardware Information__
Machine                                       : AMD64
CPU Name                                      : skylake
CPU Count                                     : 8
Number of accessible CPUs                     : ?
List of accessible CPUs cores                 : ?
CFS Restrictions (CPUs worth of runtime)      : None

CPU Features                                  : 64bit adx aes avx avx2 bmi bmi2
                                                clflushopt cmov cx16 cx8 f16c fma
                                                fsgsbase fxsr invpcid lzcnt mmx
                                                movbe pclmul popcnt prfchw rdrnd
                                                rdseed sahf sgx sse sse2 sse3
                                                sse4.1 sse4.2 ssse3 xsave xsavec
                                                xsaveopt xsaves

Memory Total (MB)                             : 43772
Memory Available (MB)                         : 21288

__OS Information__
Platform Name                                 : Windows-10-10.0.22000-SP0
Platform Release                              : 10
OS Name                                       : Windows
OS Version                                    : 10.0.22000
OS Specific Version                           : 10 10.0.22000 SP0 Multiprocessor Free
Libc Version                                  : ?

__Python Information__
Python Compiler                               : MSC v.1916 64 bit (AMD64)
Python Implementation                         : CPython
Python Version                                : 3.9.5
Python Locale                                 : en_GB.cp1252

__Numba Toolchain Versions__
Numba Version                                 : 0.56.4
llvmlite Version                              : 0.39.1

__LLVM Information__
LLVM Version                                  : 11.1.0

__CUDA Information__
CUDA Device Initialized                       : False
CUDA Driver Version                           : ?
CUDA Runtime Version                          : ?
CUDA NVIDIA Bindings Available                : ?
CUDA NVIDIA Bindings In Use                   : ?
CUDA Detect Output:
None
CUDA Libraries Test Output:
None

__NumPy Information__
NumPy Version                                 : 1.23.5
NumPy Supported SIMD features                 : ('MMX', 'SSE', 'SSE2', 'SSE3', 'SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2')
NumPy Supported SIMD dispatch                 : ('SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2', 'AVX512F', 'AVX512CD', 'AVX512_SKX', 'AVX512_CLX', 'AVX512_CNL')
NumPy Supported SIMD baseline                 : ('SSE', 'SSE2', 'SSE3')
NumPy AVX512_SKX support detected             : False

__SVML Information__
SVML State, config.USING_SVML                 : False
SVML Library Loaded                           : True
llvmlite Using SVML Patched LLVM              : True
SVML Operational                              : False

__Threading Layer Information__
TBB Threading Layer Available                 : True
+-->TBB imported successfully.
OpenMP Threading Layer Available              : True
+-->Vendor: MS
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

__Numba Environment Variable Information__
NUMBA_DISABLE_INTEL_SVML                      : 1
NUMBA_USING_SVML                              : 0

__Conda Information__
Conda not available.

__Installed Packages__
Package      Version
------------ ---------
certifi      2022.12.7
llvmlite     0.39.1
mkl-fft      1.3.1
mkl-random   1.2.2
mkl-service  2.4.0
numba        0.56.4
numpy        1.23.5
pip          22.3.1
setuptools   65.6.3
six          1.16.0
wheel        0.37.1
wincertstore 0.2

No errors reported.