numba: `np.cos`, `np.sin`, `np.sqrt` returning incorrect values in very specific scenarios
Reporting a bug
- I have tried using the latest released version of Numba (the most recent is visible in the change log: https://github.com/numba/numba/blob/main/CHANGE_LOG).
- I have included a self-contained code sample to reproduce the problem, i.e. it’s possible to run as ‘python bug.py’.
Consider this innocuous-looking snippet:
import numpy as np
import numba as nb

@nb.njit
def f(x):
    return np.array([
        x[0],
        np.cos(x[0]) * x[0],
        -np.cos(x[0]) * np.cos(x[1])
    ])

print(f(np.zeros(2)))  # [ 0. -0. -0.99254141]
On Windows with Python 3.9.5 and Python 3.8.2, I can confirm that the incorrect results begin with numba==0.54.0 and persist in later versions, including numba==0.56.4. The problem does not occur with numba==0.53.1 or earlier.
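For reference, the values that `f` should return can be computed without Numba at all. The following is a minimal sketch using only the standard library; `f_reference` and `matches_reference` are hypothetical helper names introduced here for illustration, and the tolerance is an arbitrary choice.

```python
import math


def f_reference(x):
    """Pure-Python equivalent of f from the report (no Numba, no NumPy)."""
    return [
        x[0],
        math.cos(x[0]) * x[0],
        -math.cos(x[0]) * math.cos(x[1]),
    ]


def matches_reference(result, x, tol=1e-12):
    """Return True if every element of result is close to the reference value."""
    expected = f_reference(x)
    return len(result) == len(expected) and all(
        math.isclose(r, e, abs_tol=tol) for r, e in zip(result, expected)
    )


print(f_reference([0.0, 0.0]))                                  # [0.0, 0.0, -1.0]
print(matches_reference([0.0, -0.0, -0.99254141], [0.0, 0.0]))  # False
print(matches_reference([0.0, 0.0, -1.0], [0.0, 0.0]))          # True
```

With Numba installed, a similar comparison should also be possible by calling `f.py_func(np.zeros(2))`, which runs the undecorated Python function instead of the compiled one.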
Diffing the LLVM dumps from a passing and a failing version suggests the problem is related to the use of the SVML library, but I can’t say for sure, because any tiny change to the code can make the issue go away. For example, the following snippet has no issue:
import numpy as np
import numba as nb

@nb.njit
def f(x):
    return np.array([
        x[1],
        np.cos(x[0]) * x[0],
        -np.cos(x[0]) * np.cos(x[1])
    ])

print(f(np.zeros(2)))  # [ 0. 0. -1.]
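One way to probe the SVML hypothesis is to run the failing snippet twice in fresh interpreters, once as-is and once with Numba's `NUMBA_DISABLE_INTEL_SVML` environment variable set, and compare the results. The sketch below is only a suggestion, not something taken from the original report: `child_env` and `run_repro` are hypothetical helpers, and the check for `__svml` in the generated assembly assumes that SVML calls appear under symbol names containing that prefix.

```python
import os
import subprocess
import sys

# The failing snippet from the report, run in a child process so that the
# environment variable is in place before Numba is imported.
REPRO = """
import numpy as np
import numba as nb

@nb.njit
def f(x):
    return np.array([
        x[0],
        np.cos(x[0]) * x[0],
        -np.cos(x[0]) * np.cos(x[1])
    ])

print(f(np.zeros(2)).tolist())
# Assumption: SVML calls show up in the assembly as "__svml" symbols.
print(any("__svml" in asm for asm in f.inspect_asm().values()))
"""


def child_env(disable_svml):
    """Copy the current environment, toggling NUMBA_DISABLE_INTEL_SVML."""
    env = dict(os.environ)
    if disable_svml:
        env["NUMBA_DISABLE_INTEL_SVML"] = "1"
    else:
        env.pop("NUMBA_DISABLE_INTEL_SVML", None)
    return env


def run_repro(disable_svml):
    """Run REPRO in a fresh interpreter; return its output or its error text."""
    proc = subprocess.run(
        [sys.executable, "-c", REPRO],
        env=child_env(disable_svml),
        capture_output=True,
        text=True,
    )
    return proc.stdout.strip() if proc.returncode == 0 else proc.stderr.strip()


if __name__ == "__main__":
    for disable in (False, True):
        print(f"SVML disabled = {disable}:")
        print(run_repro(disable))
```

If the wrong value disappears when SVML is disabled, that would support the hypothesis, although given how sensitive the bug is to small code changes it would not be conclusive on its own.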
Here is my environment:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time) : ...
UTC start time : ...
Running time (s) : 1.805322
__Hardware Information__
Machine : AMD64
CPU Name : skylake
CPU Count : 8
Number of accessible CPUs : 8
List of accessible CPUs cores : 0 1 2 3 4 5 6 7
CFS Restrictions (CPUs worth of runtime) : None
CPU Features : 64bit adx aes avx avx2 bmi bmi2
clflushopt cmov cx16 cx8 f16c fma
fsgsbase fxsr invpcid lzcnt mmx
movbe pclmul popcnt prfchw rdrnd
rdseed rtm sahf sse sse2 sse3
sse4.1 sse4.2 ssse3 xsave xsavec
xsaveopt xsaves
Memory Total (MB) : 32708
Memory Available (MB) : 16987
__OS Information__
Platform Name : Windows-10-10.0.19045-SP0
Platform Release : 10
OS Name : Windows
OS Version : 10.0.19045
OS Specific Version : 10 10.0.19045 SP0 Multiprocessor Free
Libc Version : ?
__Python Information__
Python Compiler : MSC v.1928 64 bit (AMD64)
Python Implementation : CPython
Python Version : 3.9.5
Python Locale : ...
__Numba Toolchain Versions__
Numba Version : 0.56.4
llvmlite Version : 0.39.1
__LLVM Information__
LLVM Version : 11.1.0
__CUDA Information__
CUDA Device Initialized : True
CUDA Driver Version : 11.7
CUDA Runtime Version : 9.0
CUDA NVIDIA Bindings Available : False
CUDA NVIDIA Bindings In Use : False
CUDA Detect Output:
Found 1 CUDA devices
id 0 b'NVIDIA GeForce GTX 1070' [SUPPORTED]
Compute Capability: 6.1
PCI Device ID: 0
PCI Bus ID: 1
UUID: GPU-a52653e9-3c51-fa78-7d15-673f8058ea38
Watchdog: Enabled
Compute Mode: WDDM
FP32/FP64 Performance Ratio: 32
Summary:
1/1 devices are supported
CUDA Libraries Test Output:
Finding nvvm from CUDA_HOME
named nvvm64_32_0.dll
trying to open library... ok
Finding cudart from CUDA_HOME
named cudart64_90.dll
trying to open library... ok
Finding cudadevrt from CUDA_HOME
named cudadevrt.lib
ERROR: failed to find cudadevrt:
cudadevrt.lib not found
Finding libdevice from CUDA_HOME
trying to open library... ok
__NumPy Information__
NumPy Version : 1.20.3
NumPy Supported SIMD features : ('MMX', 'SSE', 'SSE2', 'SSE3', 'SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2')
NumPy Supported SIMD dispatch : ('SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2', 'AVX512F', 'AVX512CD', 'AVX512_SKX', 'AVX512_CLX', 'AVX512_CNL')
NumPy Supported SIMD baseline : ('SSE', 'SSE2', 'SSE3')
NumPy AVX512_SKX support detected : False
__SVML Information__
SVML State, config.USING_SVML : True
SVML Library Loaded : True
llvmlite Using SVML Patched LLVM : True
SVML Operational : True
__Threading Layer Information__
TBB Threading Layer Available : False
+--> Disabled due to Unknown import problem.
OpenMP Threading Layer Available : True
+-->Vendor: MS
Workqueue Threading Layer Available : True
+-->Workqueue imported successfully.
__Numba Environment Variable Information__
None found.
__Conda Information__
Conda not available.
__Installed Packages__
Package Version
------------------------------ --------------
llvmlite 0.39.1
numba 0.56.4
numpy 1.20.3
scipy 1.8.1
...
No errors reported.
__Warning log__
Warning: Conda not available.
Error was [WinError 2] The system cannot find the file specified
--------------------------------------------------------------------------------
About this issue
- State: open
- Created a year ago
- Comments: 19 (11 by maintainers)
Since this issue is no longer reproducible on LLVM14, I think we can close it.
As for the new SVML patch (https://github.com/numba/llvmlite/pull/947), we don’t have any way to test it.
I can reproduce similar behaviour on Windows:
I’m also using a skylake CPU: