OpenBLAS: DGEMM regression on SkylakeX
It looks like commit https://github.com/xianyi/OpenBLAS/commit/45fe8cb0c5d06f890913e86078cb48ac379c65dc has introduced a regression in Julia’s pinv() calculations on SkylakeX. In particular, creating a Hilbert matrix of size 1000 x 100 and asking for its pseudo-inverse now returns the wrong result:
using LinearAlgebra

function hilb(T::Type, m::Integer, n::Integer)
    a = Matrix{T}(undef, m, n)
    for i = 1:n
        for j = 1:m
            a[j, i] = one(T) / (i + j - one(T))
        end
    end
    return a
end
hilb(m::Integer, n::Integer) = hilb(Float64, m, n)

a = hilb(1000, 100)
apinv = pinv(a)
Including the SkylakeX kernel gives the following answer:
100×1000 Array{Float64,2}:
2.57526e6 -2.33247e6 2.21848e6 2.19307e6 -4.13046e6 … -4.71439e5 -6.80621e5 -6.56864e5 -8.6676e5 -3.86363e5
-1.22338e11 -2.20992e11 2.36372e11 -1.14835e11 -9.1049e10 1.13475e10 8.51702e9 3.51379e9 1.54455e10 2.8167e8
2.45922e11 3.06366e11 -3.45368e11 6.99788e10 1.34305e11 -2.0333e10 -1.72032e10 -6.35715e9 -3.15537e10 3.42079e9
-1.98151e10 -5.04668e10 6.22131e9 -4.37235e10 -3.29137e9 2.4302e9 3.26304e9 2.4001e9 5.31362e9 1.42072e8
-3.96966e11 1.59586e10 -1.05208e11 -3.27214e11 3.74498e11 5.9171e10 8.17795e10 7.30775e10 1.11105e11 3.58937e10
-1.15417e11 -1.8089e10 3.02927e10 -7.71434e10 6.99771e10 … 1.55784e10 1.9823e10 1.69518e10 2.74274e10 8.50587e9
-7.91383e11 3.19284e11 -5.46209e11 -6.27492e11 1.00493e12 1.26433e11 1.8369e11 1.69215e11 2.44706e11 8.48811e10
-1.12133e11 -3.60367e10 -1.90203e8 -9.61073e10 7.40389e10 1.55288e10 2.07537e10 1.779e10 2.90644e10 7.91371e9
4.19346e11 -2.37896e11 3.70455e11 3.44512e11 -6.0224e11 -6.9822e10 -1.03364e11 -9.66593e10 -1.36339e11 -4.94829e10
-9.51913e9 -1.29776e11 1.82371e11 1.45922e10 -1.32419e11 -3.82507e9 -1.03485e10 -1.22032e10 -1.15361e10 -6.94914e9
2.81647e12 -3.70604e11 9.33114e11 2.35652e12 -2.88098e12 … -4.29963e11 -5.98574e11 -5.4097e11 -8.06547e11 -2.73127e11
2.23042e11 -1.50478e11 1.87991e11 1.81127e11 -3.31279e11 -3.78415e10 -5.5537e10 -5.24234e10 -7.25081e10 -2.8155e10
8.06723e11 -3.62461e11 6.16052e11 6.57412e11 -1.0719e12 -1.30794e11 -1.91569e11 -1.77425e11 -2.54474e11 -8.93792e10
5.53103e10 -1.38396e10 -1.52335e10 3.90026e10 -4.70444e10 -8.49163e9 -1.06612e10 -9.72492e9 -1.3894e10 -6.12903e9
9.43701e11 -5.29597e11 7.95135e11 7.22863e11 -1.31683e12 -1.54868e11 -2.28251e11 -2.12445e11 -3.0164e11 -1.08025e11
-4.21561e11 1.74302e10 -1.05198e11 -3.6234e11 4.02288e11 … 6.38291e10 8.7825e10 7.88592e10 1.18726e11 3.97235e10
⋮ ⋱ ⋮
6.68906e10 -6.53051e9 3.21447e10 4.33472e10 -6.42927e10 -9.61282e9 -1.3713e10 -1.20454e10 -1.88794e10 -5.21562e9
-4.3416e10 9.89228e9 -1.06994e10 -3.73798e10 4.63849e10 … 6.77796e9 9.31398e9 8.54109e9 1.23867e10 4.64291e9
-3.36704e10 -2.19847e10 3.36634e10 -2.54336e10 4.56403e9 4.11065e9 4.50039e9 3.55012e9 6.46187e9 1.91442e9
1.00752e11 -9.78859e10 1.5892e11 7.5809e10 -1.8612e11 -1.78414e10 -2.82659e10 -2.68784e10 -3.70109e10 -1.31011e10
-6.24935e11 1.72017e11 -3.18826e11 -5.14607e11 7.21608e11 9.79871e10 1.39412e11 1.27482e11 1.86508e11 6.45552e10
-3.56138e11 4.75922e10 -1.07088e11 -2.96353e11 3.60894e11 5.43011e10 7.52645e10 6.80322e10 1.01325e11 3.46671e10
1.41477e11 -1.55157e11 2.42801e11 1.08982e11 -2.78805e11 … -2.57232e10 -4.11459e10 -3.94458e10 -5.35713e10 -1.95093e10
-1.71703e11 6.41864e9 -2.29926e10 -1.45472e11 1.56983e11 2.57486e10 3.4882e10 3.13021e10 4.71032e10 1.62219e10
-1.57418e11 -2.50531e10 1.42196e10 -1.16675e11 1.07792e11 2.18507e10 2.86411e10 2.47383e10 3.9572e10 1.19948e10
-5.09849e11 1.45783e11 -2.73511e11 -3.97073e11 5.85682e11 7.9147e10 1.13045e11 1.02973e11 1.5164e11 5.11554e10
-2.01401e11 9.45476e10 -1.55585e11 -1.48556e11 2.6349e11 3.21347e10 4.71454e10 4.34308e10 6.28048e10 2.14634e10
6.55703e11 -1.91764e11 3.65181e11 5.46763e11 -7.75768e11 … -1.03493e11 -1.481e11 -1.35717e11 -1.97988e11 -6.84949e10
4.39019e11 -7.20111e10 1.66746e11 3.76386e11 -4.6781e11 -6.78667e10 -9.50602e10 -8.63599e10 -1.27719e11 -4.38774e10
-1.35314e11 1.43088e11 -2.17464e11 -8.70376e10 2.51535e11 2.36837e10 3.76675e10 3.57591e10 4.93057e10 1.72898e10
2.95696e10 5.86712e10 -7.47343e10 2.73326e10 3.03537e10 -2.52465e9 -1.2501e9 -4.13464e7 -2.64329e9 3.99575e7
-7.70318e10 -3.93945e10 4.41432e10 -8.97948e10 4.01424e10 1.12653e10 1.37315e10 1.20607e10 1.87392e10 7.00317e9
Excluding the SkylakeX kernel (e.g. reverting to 544b069e85254d41699afde16e2e81c123cb5f28) gives the result:
100×1000 Array{Float64,2}:
112.527 -6192.3 1.06925e5 -8.28373e5 3.21394e6 -6.01292e6 … -2.99287e5 -3.02032e5 -3.04795e5 -3.07576e5
-6305.8 4.64899e5 -9.07773e6 7.54681e7 -3.07426e8 5.99356e8 3.28027e7 3.31027e7 3.34047e7 3.37085e7
1.1309e5 -9.42656e6 1.9735e8 -1.71896e9 7.25526e9 -1.46068e10 -8.71604e8 -8.79551e8 -8.8755e8 -8.95596e8
-9.32272e5 8.33527e7 -1.82785e9 1.64741e10 -7.1497e10 1.47819e11 9.57181e9 9.65882e9 9.74639e9 9.83447e9
3.98657e6 -3.73868e8 8.48896e9 -7.86389e10 3.49436e11 -7.39605e11 -5.19571e10 -5.24279e10 -5.29016e10 -5.33781e10
-8.8007e6 8.57783e8 -2.00715e10 1.90643e11 -8.66324e11 1.8764e12 … 1.44167e11 1.45468e11 1.46778e11 1.48094e11
7.90418e6 -8.06081e8 1.95704e10 -1.91875e11 8.97794e11 -2.00535e12 -1.75621e11 -1.77197e11 -1.78783e11 -1.80377e11
2.40961e6 -2.157e8 4.61896e9 -3.98835e10 1.62513e11 -3.0448e11 -1.76662e9 -1.79326e9 -1.82037e9 -1.84804e9
-5.54279e6 5.75778e8 -1.41936e10 1.40992e11 -6.67618e11 1.50955e12 1.37796e11 1.39031e11 1.40274e11 1.41523e11
-4.00904e6 3.9166e8 -9.13369e9 8.60798e10 -3.86441e11 8.21383e11 5.04267e10 5.08898e10 5.13561e10 5.18251e10
1.63339e6 -1.8959e8 5.10176e9 -5.45604e10 2.76223e11 -6.69193e11 … -8.18735e10 -8.25983e10 -8.33273e10 -8.40596e10
4.57405e6 -4.75446e8 1.17311e10 -1.16639e11 5.52734e11 -1.25049e12 -1.12594e11 -1.13605e11 -1.14622e11 -1.15644e11
3.29825e6 -3.27272e8 7.73331e9 -7.3721e10 3.34416e11 -7.18287e11 -4.43866e10 -4.47954e10 -4.52068e10 -4.56208e10
-37289.1 2.26601e7 -9.77279e8 1.3636e10 -8.32615e10 2.3651e11 4.70926e10 4.75026e10 4.79148e10 4.83286e10
-2.81123e6 3.03789e8 -7.75788e9 7.96192e10 -3.89208e11 9.11377e11 9.80715e10 9.89445e10 9.98226e10 1.00705e11
-3.65969e6 3.80616e8 -9.39927e9 9.35385e10 -4.43642e11 1.00441e12 … 8.90641e10 8.98655e10 9.06717e10 9.14823e10
⋮ ⋮ ⋱
-5.52909e5 6.12481e7 -1.61047e9 1.70918e10 -8.69055e10 2.14183e11 3.86899e10 3.90212e10 3.93541e10 3.96884e10
-992608.0 1.08145e8 -2.79998e9 2.92745e10 -1.46579e11 3.54876e11 … 5.63476e10 5.68342e10 5.73232e10 5.78142e10
-1.35946e6 1.47079e8 -3.78272e9 3.92898e10 -1.95373e11 4.69163e11 6.97768e10 7.03821e10 7.09905e10 7.16015e10
-1.59876e6 1.72258e8 -4.41282e9 4.56535e10 -2.26067e11 5.40144e11 7.69398e10 7.76094e10 7.82824e10 7.89583e10
-1.68423e6 1.80867e8 -4.61853e9 4.76281e10 -2.35036e11 5.59247e11 7.6759e10 7.74288e10 7.81023e10 7.87786e10
-1.5861e6 1.69768e8 -4.32122e9 4.44178e10 -2.18431e11 5.17537e11 6.82601e10 6.88577e10 6.94585e10 7.0062e10
-1.28093e6 1.36538e8 -3.46127e9 3.54304e10 -1.73446e11 4.08663e11 … 5.09162e10 5.13641e10 5.18145e10 5.2267e10
-7.85551e5 8.29992e7 -2.08563e9 2.1155e10 -1.02525e11 2.38529e11 2.56914e10 2.59206e10 2.61511e10 2.63827e10
-1.22319e5 1.1641e7 -2.59973e8 2.29043e9 -9.22435e9 1.59028e10 -5.87688e9 -5.92236e9 -5.96796e9 -6.01363e9
6.34345e5 -6.95395e7 1.8113e9 -1.90533e10 9.60279e10 -2.3435e11 -4.02598e10 -4.06052e10 -4.09522e10 -4.13007e10
1.37734e6 -1.49026e8 3.83373e9 -3.98355e10 1.98204e11 -4.76404e11 -7.24162e10 -7.30427e10 -7.36725e10 -7.43049e10
1.94231e6 -2.09213e8 5.35875e9 -5.54398e10 2.74569e11 -6.56287e11 … -9.49885e10 -9.58134e10 -9.66426e10 -9.74753e10
2.11244e6 -2.26877e8 5.79482e9 -5.97807e10 2.95167e11 -7.02918e11 -9.83543e10 -9.92107e10 -1.00072e11 -1.00936e11
1.59804e6 -1.71043e8 4.3541e9 -4.47652e10 2.20217e11 -5.22078e11 -6.99398e10 -7.0551e10 -7.11654e10 -7.17825e10
23549.4 -1.60268e6 1.77704e7 5.98452e7 -1.59272e9 7.57578e9 6.25708e9 6.3078e9 6.35871e9 6.40975e9
-3.07648e6 3.31117e8 -8.47524e9 8.76252e10 -4.33698e11 1.03595e12 1.49876e11 1.51177e11 1.52485e11 1.53798e11
Note that pinv() uses an SVD internally, so this turns into an LAPACK.gesdd() call, which is itself giving very different answers. This should therefore be easy to reproduce locally by passing a Hilbert matrix of the above dimensions to dgesdd through whichever interface you prefer.
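Because the Hilbert matrix is extremely ill-conditioned, comparing two pinv() outputs entry by entry is a noisy test. A less fragile check, sketched below, looks at quantities that a correctly working SVD should keep near machine precision regardless of conditioning: the reconstruction residual and the orthogonality of the singular vectors. It also compares one BLAS matrix product against a plain-loop reference. The helper names `svd_residuals` and `naive_mul` are hypothetical, and the tolerances you compare against are a judgment call. The report does not confirm that this particular product reaches the faulty kernel, so treat a clean result as inconclusive.

```julia
using LinearAlgebra

# Same Hilbert-matrix entries as in the report: a[i, j] = 1 / (i + j - 1).
hilb(m::Integer, n::Integer) = [1.0 / (i + j - 1) for i in 1:m, j in 1:n]

# Relative reconstruction error of the SVD and loss of orthogonality of U and V.
# (`svd_residuals` is a hypothetical helper, not part of LinearAlgebra.)
function svd_residuals(a::AbstractMatrix)
    F = svd(a)  # for dense Float64 input this uses the divide-and-conquer driver
    recon  = norm(F.U * Diagonal(F.S) * F.Vt - a) / norm(a)
    orth_u = norm(F.U' * F.U - I)
    orth_v = norm(F.Vt * F.Vt' - I)
    return (recon = recon, orth_u = orth_u, orth_v = orth_v)
end

# Reference product computed with plain loops, bypassing BLAS entirely.
# (`naive_mul` is a hypothetical helper.)
function naive_mul(x::Matrix{Float64}, y::Matrix{Float64})
    m, k = size(x)
    n = size(y, 2)
    z = zeros(m, n)
    for j in 1:n, l in 1:k, i in 1:m
        z[i, j] += x[i, l] * y[l, j]
    end
    return z
end

a = hilb(1000, 100)
b = copy(a)  # a distinct array, so a' * b is a general product rather than a' * a

r = svd_residuals(a)
gemm_err = norm(a' * b - naive_mul(Matrix(a'), b)) / norm(a)^2

println(r)
println("relative discrepancy between BLAS and loop product: ", gemm_err)
```

On a healthy build all four numbers should be many orders of magnitude below 1; values anywhere near 1 point at the BLAS/LAPACK layer rather than at pinv() itself.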
About this issue
- State: closed
- Created 5 years ago
- Comments: 85 (50 by maintainers)
Commits related to this issue
- Add patch to work around OpenBLAS v0.3.5 SkylakeX problems X-ref: https://github.com/JuliaLang/julia/pull/30583 X-ref: https://github.com/xianyi/OpenBLAS/issues/1955 — committed to JuliaLang/julia by staticfloat 5 years ago
- Add patch to work around OpenBLAS v0.3.5 SkylakeX problems (#30661) X-ref: https://github.com/JuliaLang/julia/pull/30583 X-ref: https://github.com/xianyi/OpenBLAS/issues/1955 — committed to JuliaLang/julia by staticfloat 5 years ago
- openblas: turn off AVX512 optimizations The AVX512 optimization has a bug in openblas-0.3.5 and can also lead to slower code on Xeon Silver CPUs. See https://github.com/xianyi/OpenBLAS/issues/1955 a... — committed to markuskowa/nixpkgs by markuskowa 5 years ago
- openblas: turn off AVX512 optimizations The AVX512 optimization has a bug in openblas-0.3.5 and can also lead to slower code on Xeon Silver CPUs. See https://github.com/xianyi/OpenBLAS/issues/1955 a... — committed to NixOS/nixpkgs by markuskowa 5 years ago
- rebase? (#1) * With the Intel compiler on Linux, prefer ifort for the final link step icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956 * Rename op... — committed to TiborGY/OpenBLAS by TiborGY 5 years ago
Fixed now by wjc404’s new AVX512 DGEMM kernel from #2286
Not surprisingly, that hacked version performs markedly worse in the DGEMM benchmark than the regular Haswell microkernel, now that I have the hardware to test this.
I work for Intel… in my cube alone I have 3 skylakeX’s 😉
I just need code that shows it going wrong so that I can debug and fix it.