OpenBLAS: DGEMM regression on SkylakeX

It looks like commit https://github.com/xianyi/OpenBLAS/commit/45fe8cb0c5d06f890913e86078cb48ac379c65dc has introduced a regression in Julia’s pinv() calculations on SkylakeX. In particular, building a 1000 x 100 Hilbert matrix and asking for its pseudo-inverse now returns the wrong result:

using LinearAlgebra

function hilb(T::Type, m::Integer, n::Integer)
    a = Matrix{T}(undef, m, n)
    for i=1:n
        for j=1:m
            a[j,i]=one(T)/(i+j-one(T))
        end
    end
    return a
end
hilb(m::Integer, n::Integer) = hilb(Float64,m,n)

a = hilb(1000, 100)
apinv = pinv(a)

Including the SkylakeX kernel gives the following answer:

100×1000 Array{Float64,2}:
  2.57526e6   -2.33247e6    2.21848e6    2.19307e6   -4.13046e6   …  -4.71439e5   -6.80621e5   -6.56864e5   -8.6676e5    -3.86363e5
 -1.22338e11  -2.20992e11   2.36372e11  -1.14835e11  -9.1049e10       1.13475e10   8.51702e9    3.51379e9    1.54455e10   2.8167e8
  2.45922e11   3.06366e11  -3.45368e11   6.99788e10   1.34305e11     -2.0333e10   -1.72032e10  -6.35715e9   -3.15537e10   3.42079e9
 -1.98151e10  -5.04668e10   6.22131e9   -4.37235e10  -3.29137e9       2.4302e9     3.26304e9    2.4001e9     5.31362e9    1.42072e8
 -3.96966e11   1.59586e10  -1.05208e11  -3.27214e11   3.74498e11      5.9171e10    8.17795e10   7.30775e10   1.11105e11   3.58937e10
 -1.15417e11  -1.8089e10    3.02927e10  -7.71434e10   6.99771e10  …   1.55784e10   1.9823e10    1.69518e10   2.74274e10   8.50587e9
 -7.91383e11   3.19284e11  -5.46209e11  -6.27492e11   1.00493e12      1.26433e11   1.8369e11    1.69215e11   2.44706e11   8.48811e10
 -1.12133e11  -3.60367e10  -1.90203e8   -9.61073e10   7.40389e10      1.55288e10   2.07537e10   1.779e10     2.90644e10   7.91371e9
  4.19346e11  -2.37896e11   3.70455e11   3.44512e11  -6.0224e11      -6.9822e10   -1.03364e11  -9.66593e10  -1.36339e11  -4.94829e10
 -9.51913e9   -1.29776e11   1.82371e11   1.45922e10  -1.32419e11     -3.82507e9   -1.03485e10  -1.22032e10  -1.15361e10  -6.94914e9
  2.81647e12  -3.70604e11   9.33114e11   2.35652e12  -2.88098e12  …  -4.29963e11  -5.98574e11  -5.4097e11   -8.06547e11  -2.73127e11
  2.23042e11  -1.50478e11   1.87991e11   1.81127e11  -3.31279e11     -3.78415e10  -5.5537e10   -5.24234e10  -7.25081e10  -2.8155e10
  8.06723e11  -3.62461e11   6.16052e11   6.57412e11  -1.0719e12      -1.30794e11  -1.91569e11  -1.77425e11  -2.54474e11  -8.93792e10
  5.53103e10  -1.38396e10  -1.52335e10   3.90026e10  -4.70444e10     -8.49163e9   -1.06612e10  -9.72492e9   -1.3894e10   -6.12903e9
  9.43701e11  -5.29597e11   7.95135e11   7.22863e11  -1.31683e12     -1.54868e11  -2.28251e11  -2.12445e11  -3.0164e11   -1.08025e11
 -4.21561e11   1.74302e10  -1.05198e11  -3.6234e11    4.02288e11  …   6.38291e10   8.7825e10    7.88592e10   1.18726e11   3.97235e10
  ⋮                                                               ⋱   ⋮
  6.68906e10  -6.53051e9    3.21447e10   4.33472e10  -6.42927e10     -9.61282e9   -1.3713e10   -1.20454e10  -1.88794e10  -5.21562e9
 -4.3416e10    9.89228e9   -1.06994e10  -3.73798e10   4.63849e10  …   6.77796e9    9.31398e9    8.54109e9    1.23867e10   4.64291e9
 -3.36704e10  -2.19847e10   3.36634e10  -2.54336e10   4.56403e9       4.11065e9    4.50039e9    3.55012e9    6.46187e9    1.91442e9
  1.00752e11  -9.78859e10   1.5892e11    7.5809e10   -1.8612e11      -1.78414e10  -2.82659e10  -2.68784e10  -3.70109e10  -1.31011e10
 -6.24935e11   1.72017e11  -3.18826e11  -5.14607e11   7.21608e11      9.79871e10   1.39412e11   1.27482e11   1.86508e11   6.45552e10
 -3.56138e11   4.75922e10  -1.07088e11  -2.96353e11   3.60894e11      5.43011e10   7.52645e10   6.80322e10   1.01325e11   3.46671e10
  1.41477e11  -1.55157e11   2.42801e11   1.08982e11  -2.78805e11  …  -2.57232e10  -4.11459e10  -3.94458e10  -5.35713e10  -1.95093e10
 -1.71703e11   6.41864e9   -2.29926e10  -1.45472e11   1.56983e11      2.57486e10   3.4882e10    3.13021e10   4.71032e10   1.62219e10
 -1.57418e11  -2.50531e10   1.42196e10  -1.16675e11   1.07792e11      2.18507e10   2.86411e10   2.47383e10   3.9572e10    1.19948e10
 -5.09849e11   1.45783e11  -2.73511e11  -3.97073e11   5.85682e11      7.9147e10    1.13045e11   1.02973e11   1.5164e11    5.11554e10
 -2.01401e11   9.45476e10  -1.55585e11  -1.48556e11   2.6349e11       3.21347e10   4.71454e10   4.34308e10   6.28048e10   2.14634e10
  6.55703e11  -1.91764e11   3.65181e11   5.46763e11  -7.75768e11  …  -1.03493e11  -1.481e11    -1.35717e11  -1.97988e11  -6.84949e10
  4.39019e11  -7.20111e10   1.66746e11   3.76386e11  -4.6781e11      -6.78667e10  -9.50602e10  -8.63599e10  -1.27719e11  -4.38774e10
 -1.35314e11   1.43088e11  -2.17464e11  -8.70376e10   2.51535e11      2.36837e10   3.76675e10   3.57591e10   4.93057e10   1.72898e10
  2.95696e10   5.86712e10  -7.47343e10   2.73326e10   3.03537e10     -2.52465e9   -1.2501e9    -4.13464e7   -2.64329e9    3.99575e7
 -7.70318e10  -3.93945e10   4.41432e10  -8.97948e10   4.01424e10      1.12653e10   1.37315e10   1.20607e10   1.87392e10   7.00317e9

Excluding the SkylakeX kernel (e.g. reverting to 544b069e85254d41699afde16e2e81c123cb5f28) gives the result:

100×1000 Array{Float64,2}:
     112.527      -6192.3         1.06925e5   -8.28373e5    3.21394e6   -6.01292e6   …  -2.99287e5   -3.02032e5   -3.04795e5   -3.07576e5
   -6305.8            4.64899e5  -9.07773e6    7.54681e7   -3.07426e8    5.99356e8       3.28027e7    3.31027e7    3.34047e7    3.37085e7
       1.1309e5      -9.42656e6   1.9735e8    -1.71896e9    7.25526e9   -1.46068e10     -8.71604e8   -8.79551e8   -8.8755e8    -8.95596e8
      -9.32272e5      8.33527e7  -1.82785e9    1.64741e10  -7.1497e10    1.47819e11      9.57181e9    9.65882e9    9.74639e9    9.83447e9
       3.98657e6     -3.73868e8   8.48896e9   -7.86389e10   3.49436e11  -7.39605e11     -5.19571e10  -5.24279e10  -5.29016e10  -5.33781e10
      -8.8007e6       8.57783e8  -2.00715e10   1.90643e11  -8.66324e11   1.8764e12   …   1.44167e11   1.45468e11   1.46778e11   1.48094e11
       7.90418e6     -8.06081e8   1.95704e10  -1.91875e11   8.97794e11  -2.00535e12     -1.75621e11  -1.77197e11  -1.78783e11  -1.80377e11
       2.40961e6     -2.157e8     4.61896e9   -3.98835e10   1.62513e11  -3.0448e11      -1.76662e9   -1.79326e9   -1.82037e9   -1.84804e9
      -5.54279e6      5.75778e8  -1.41936e10   1.40992e11  -6.67618e11   1.50955e12      1.37796e11   1.39031e11   1.40274e11   1.41523e11
      -4.00904e6      3.9166e8   -9.13369e9    8.60798e10  -3.86441e11   8.21383e11      5.04267e10   5.08898e10   5.13561e10   5.18251e10
       1.63339e6     -1.8959e8    5.10176e9   -5.45604e10   2.76223e11  -6.69193e11  …  -8.18735e10  -8.25983e10  -8.33273e10  -8.40596e10
       4.57405e6     -4.75446e8   1.17311e10  -1.16639e11   5.52734e11  -1.25049e12     -1.12594e11  -1.13605e11  -1.14622e11  -1.15644e11
       3.29825e6     -3.27272e8   7.73331e9   -7.3721e10    3.34416e11  -7.18287e11     -4.43866e10  -4.47954e10  -4.52068e10  -4.56208e10
  -37289.1            2.26601e7  -9.77279e8    1.3636e10   -8.32615e10   2.3651e11       4.70926e10   4.75026e10   4.79148e10   4.83286e10
      -2.81123e6      3.03789e8  -7.75788e9    7.96192e10  -3.89208e11   9.11377e11      9.80715e10   9.89445e10   9.98226e10   1.00705e11
      -3.65969e6      3.80616e8  -9.39927e9    9.35385e10  -4.43642e11   1.00441e12  …   8.90641e10   8.98655e10   9.06717e10   9.14823e10
       ⋮                                                                 ⋮           ⋱
      -5.52909e5      6.12481e7  -1.61047e9    1.70918e10  -8.69055e10   2.14183e11      3.86899e10   3.90212e10   3.93541e10   3.96884e10
 -992608.0            1.08145e8  -2.79998e9    2.92745e10  -1.46579e11   3.54876e11  …   5.63476e10   5.68342e10   5.73232e10   5.78142e10
      -1.35946e6      1.47079e8  -3.78272e9    3.92898e10  -1.95373e11   4.69163e11      6.97768e10   7.03821e10   7.09905e10   7.16015e10
      -1.59876e6      1.72258e8  -4.41282e9    4.56535e10  -2.26067e11   5.40144e11      7.69398e10   7.76094e10   7.82824e10   7.89583e10
      -1.68423e6      1.80867e8  -4.61853e9    4.76281e10  -2.35036e11   5.59247e11      7.6759e10    7.74288e10   7.81023e10   7.87786e10
      -1.5861e6       1.69768e8  -4.32122e9    4.44178e10  -2.18431e11   5.17537e11      6.82601e10   6.88577e10   6.94585e10   7.0062e10
      -1.28093e6      1.36538e8  -3.46127e9    3.54304e10  -1.73446e11   4.08663e11  …   5.09162e10   5.13641e10   5.18145e10   5.2267e10
      -7.85551e5      8.29992e7  -2.08563e9    2.1155e10   -1.02525e11   2.38529e11      2.56914e10   2.59206e10   2.61511e10   2.63827e10
      -1.22319e5      1.1641e7   -2.59973e8    2.29043e9   -9.22435e9    1.59028e10     -5.87688e9   -5.92236e9   -5.96796e9   -6.01363e9
       6.34345e5     -6.95395e7   1.8113e9    -1.90533e10   9.60279e10  -2.3435e11      -4.02598e10  -4.06052e10  -4.09522e10  -4.13007e10
       1.37734e6     -1.49026e8   3.83373e9   -3.98355e10   1.98204e11  -4.76404e11     -7.24162e10  -7.30427e10  -7.36725e10  -7.43049e10
       1.94231e6     -2.09213e8   5.35875e9   -5.54398e10   2.74569e11  -6.56287e11  …  -9.49885e10  -9.58134e10  -9.66426e10  -9.74753e10
       2.11244e6     -2.26877e8   5.79482e9   -5.97807e10   2.95167e11  -7.02918e11     -9.83543e10  -9.92107e10  -1.00072e11  -1.00936e11
       1.59804e6     -1.71043e8   4.3541e9    -4.47652e10   2.20217e11  -5.22078e11     -6.99398e10  -7.0551e10   -7.11654e10  -7.17825e10
   23549.4           -1.60268e6   1.77704e7    5.98452e7   -1.59272e9    7.57578e9       6.25708e9    6.3078e9     6.35871e9    6.40975e9
      -3.07648e6      3.31117e8  -8.47524e9    8.76252e10  -4.33698e11   1.03595e12      1.49876e11   1.51177e11   1.52485e11   1.53798e11

Note that pinv() uses an SVD internally, so this ends up as a LAPACK.gesdd!() call, which is itself giving very different answers. It should therefore be easy to reproduce locally by passing a Hilbert matrix of the above dimensions to dgesdd through whichever interface you wish.
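
As a rough sketch of that reproduction, the snippet below calls LAPACK.gesdd!() directly on the same 1000 x 100 Hilbert matrix and prints a few self-consistency residuals, so no known-good reference output is needed. The helper name svd_residuals and the choice of checks are assumptions made for illustration, not something from the original report. The residuals are themselves computed with BLAS-backed products, so a large value only indicates that something in the stack is off, not which kernel is at fault.

```julia
using LinearAlgebra

# Same Hilbert-matrix entries as in the report, specialised to Float64.
hilb(m::Integer, n::Integer) = [1.0 / (i + j - 1) for j in 1:m, i in 1:n]

# Hypothetical helper: run the divide-and-conquer SVD and measure how well
# the factors reproduce the input and how orthonormal they are.
function svd_residuals(a::Matrix{Float64})
    # gesdd! overwrites its argument, so work on a copy.
    # Job 'S' returns the thin factors U (m×n), S (n) and Vt (n×n).
    U, S, Vt = LAPACK.gesdd!('S', copy(a))
    recon  = norm(U * Diagonal(S) * Vt - a) / norm(a)  # relative reconstruction error
    orth_u = norm(U' * U - I)                          # departure of U from orthonormal columns
    orth_v = norm(Vt * Vt' - I)                        # departure of Vt from orthonormal rows
    return (recon = recon, orth_u = orth_u, orth_v = orth_v,
            sorted = issorted(S, rev = true))
end

res = svd_residuals(hilb(1000, 100))
println(res)
```

On a healthy build all three residuals should sit within a few orders of magnitude of machine epsilon, and the singular values should come back in descending order. Any pass/fail threshold applied on top of this is a judgment call.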

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 85 (50 by maintainers)

Most upvoted comments

Fixed now by wjc404’s new AVX512 DGEMM kernel from #2286
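
One way to check a given build at the DGEMM level, sketched here under assumptions: compare BLAS.gemm against a plain triple loop that never calls into BLAS. The function names and the matrix shapes are hypothetical. The thread does not say which shapes hit the faulty kernel path, so a clean result on these sizes does not rule the problem out.

```julia
using LinearAlgebra

# Reference product written with plain loops, so it never calls into BLAS.
function naive_matmul(A::Matrix{Float64}, B::Matrix{Float64})
    m, k = size(A)
    k2, n = size(B)
    k == k2 || throw(DimensionMismatch("inner dimensions must match"))
    C = zeros(m, n)
    for j in 1:n, p in 1:k, i in 1:m
        @inbounds C[i, j] += A[i, p] * B[p, j]
    end
    return C
end

# Largest elementwise gap between BLAS.gemm and the loop reference,
# scaled by the largest entry of the reference product.
function gemm_max_rel_error(m::Integer, k::Integer, n::Integer)
    A = rand(m, k)
    B = rand(k, n)
    C_blas = BLAS.gemm('N', 'N', 1.0, A, B)  # 1.0 * A * B via the BLAS dgemm routine
    C_ref  = naive_matmul(A, B)
    return maximum(abs.(C_blas .- C_ref)) / maximum(abs.(C_ref))
end

# Shapes are arbitrary illustrations, not taken from the report.
for dims in ((100, 1000, 100), (1000, 100, 1000), (37, 53, 29))
    println(dims, " => ", gemm_max_rel_error(dims...))
end
```

With random inputs in [0, 1), differences far above rounding level would point at the GEMM kernel itself rather than at LAPACK.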

Now that I have the hardware to test this: not surprisingly, that hacked version performs markedly worse in the DGEMM benchmark than the regular Haswell microkernel.

I work for Intel… in my cube alone I have 3 SkylakeX machines 😉

I just need code that shows it going wrong so that I can debug and fix it.

On Thu, Jan 24, 2019 at 1:43 PM Elliot Saba notifications@github.com wrote:

@fenrus75 do you need access to a SkylakeX machine? I can get you SSH access on a Linux SkylakeX machine if that would help.