OpenBLAS: Wrong results for small DTRMV on x86-64

Usually I use the POWER8 platform but today I found a bad error in the x86-64 code for the DTRMV routine.

I want to compute x <- A*x, where

  • A is nonunit upper triangular,
  • N = 1…64,
  • LDA = 64,
  • and INCX=1

and get wrong results on a Haswell (16 cores, Xeon CPU E5-2640 v3) based system using gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) if OpenBLAS is compiled via

make USE_OPENMP=1 

I compared the results with the Netlib implementation and obtained:

Correct RESULT: DTRMV(U,N,N,    1, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,    2, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,    3, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,    4, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,    5, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,    6, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,    7, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,    8, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,    9, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,   10, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,   11, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,   12, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,   13, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,   14, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,   15, A,   64, X,     1)  MAXERR = .000000000000000D+00
Correct RESULT: DTRMV(U,N,N,   16, A,   64, X,     1)  MAXERR = .000000000000000D+00
Wrong RESULT  : DTRMV(U,N,N,   17, A,   64, X,     1)  MAXERR = .568434188608080D-13
Wrong RESULT  : DTRMV(U,N,N,   18, A,   64, X,     1)  MAXERR = .113686837721616D-12
Wrong RESULT  : DTRMV(U,N,N,   19, A,   64, X,     1)  MAXERR = .113686837721616D-12
Wrong RESULT  : DTRMV(U,N,N,   20, A,   64, X,     1)  MAXERR = .227373675443232D-12
Wrong RESULT  : DTRMV(U,N,N,   21, A,   64, X,     1)  MAXERR = .227373675443232D-12
Wrong RESULT  : DTRMV(U,N,N,   22, A,   64, X,     1)  MAXERR = .454747350886464D-12
Wrong RESULT  : DTRMV(U,N,N,   23, A,   64, X,     1)  MAXERR = .909494701772928D-12
Wrong RESULT  : DTRMV(U,N,N,   24, A,   64, X,     1)  MAXERR = .909494701772928D-12
Wrong RESULT  : DTRMV(U,N,N,   25, A,   64, X,     1)  MAXERR = .181898940354586D-11
Wrong RESULT  : DTRMV(U,N,N,   26, A,   64, X,     1)  MAXERR = .363797880709171D-11
Wrong RESULT  : DTRMV(U,N,N,   27, A,   64, X,     1)  MAXERR = .727595761418343D-11
Wrong RESULT  : DTRMV(U,N,N,   28, A,   64, X,     1)  MAXERR = .363797880709171D-11
Wrong RESULT  : DTRMV(U,N,N,   29, A,   64, X,     1)  MAXERR = .291038304567337D-10
Wrong RESULT  : DTRMV(U,N,N,   30, A,   64, X,     1)  MAXERR = .582076609134674D-10
Wrong RESULT  : DTRMV(U,N,N,   31, A,   64, X,     1)  MAXERR = .145519152283669D-10
Wrong RESULT  : DTRMV(U,N,N,   32, A,   64, X,     1)  MAXERR = .582076609134674D-10
Wrong RESULT  : DTRMV(U,N,N,   33, A,   64, X,     1)  MAXERR = .596598620663450D+06
Wrong RESULT  : DTRMV(U,N,N,   34, A,   64, X,     1)  MAXERR = .605368209769108D+06
Wrong RESULT  : DTRMV(U,N,N,   35, A,   64, X,     1)  MAXERR = .114251427897636D+07
Wrong RESULT  : DTRMV(U,N,N,   36, A,   64, X,     1)  MAXERR = .176011411015678D+07
Wrong RESULT  : DTRMV(U,N,N,   37, A,   64, X,     1)  MAXERR = .273520848158376D+07
Wrong RESULT  : DTRMV(U,N,N,   38, A,   64, X,     1)  MAXERR = .400535392194670D+07
Wrong RESULT  : DTRMV(U,N,N,   39, A,   64, X,     1)  MAXERR = .583437738276019D+07
Wrong RESULT  : DTRMV(U,N,N,   40, A,   64, X,     1)  MAXERR = .850833665131698D+07
Wrong RESULT  : DTRMV(U,N,N,   41, A,   64, X,     1)  MAXERR = .123493561359394D+08
Wrong RESULT  : DTRMV(U,N,N,   42, A,   64, X,     1)  MAXERR = .173391397036582D+08
Wrong RESULT  : DTRMV(U,N,N,   43, A,   64, X,     1)  MAXERR = .260484477387416D+08
Wrong RESULT  : DTRMV(U,N,N,   44, A,   64, X,     1)  MAXERR = .376086864422978D+08
Wrong RESULT  : DTRMV(U,N,N,   45, A,   64, X,     1)  MAXERR = .562860046618746D+08
Wrong RESULT  : DTRMV(U,N,N,   46, A,   64, X,     1)  MAXERR = .722429815166366D+08
Wrong RESULT  : DTRMV(U,N,N,   47, A,   64, X,     1)  MAXERR = .129301313570969D+09
Wrong RESULT  : DTRMV(U,N,N,   48, A,   64, X,     1)  MAXERR = .193137989170661D+09
Wrong RESULT  : DTRMV(U,N,N,   49, A,   64, X,     1)  MAXERR = .279222204411076D+09
Wrong RESULT  : DTRMV(U,N,N,   50, A,   64, X,     1)  MAXERR = .678662007793433D+09
Wrong RESULT  : DTRMV(U,N,N,   51, A,   64, X,     1)  MAXERR = .121269930403095D+10
Wrong RESULT  : DTRMV(U,N,N,   52, A,   64, X,     1)  MAXERR = .106155723183043D+10
Wrong RESULT  : DTRMV(U,N,N,   53, A,   64, X,     1)  MAXERR = .306408230139373D+10
Wrong RESULT  : DTRMV(U,N,N,   54, A,   64, X,     1)  MAXERR = .470622143023516D+10
Wrong RESULT  : DTRMV(U,N,N,   55, A,   64, X,     1)  MAXERR = .716298126556327D+10
Wrong RESULT  : DTRMV(U,N,N,   56, A,   64, X,     1)  MAXERR = .103315607770989D+11
Wrong RESULT  : DTRMV(U,N,N,   57, A,   64, X,     1)  MAXERR = .837779499609675D+10
Wrong RESULT  : DTRMV(U,N,N,   58, A,   64, X,     1)  MAXERR = .255092826195343D+11
Wrong RESULT  : DTRMV(U,N,N,   59, A,   64, X,     1)  MAXERR = .388799116737581D+11
Wrong RESULT  : DTRMV(U,N,N,   60, A,   64, X,     1)  MAXERR = .612787305842372D+11
Wrong RESULT  : DTRMV(U,N,N,   61, A,   64, X,     1)  MAXERR = .421813369205623D+11
Wrong RESULT  : DTRMV(U,N,N,   62, A,   64, X,     1)  MAXERR = .531064994597937D+11
Wrong RESULT  : DTRMV(U,N,N,   63, A,   64, X,     1)  MAXERR = .218143907859646D+12
Wrong RESULT  : DTRMV(U,N,N,   64, A,   64, X,     1)  MAXERR = .160600130355114D+12

The demonstration code is here: https://gist.github.com/grisuthedragon/32d8ad11e1d722414f921509b97d5507

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 68 (28 by maintainers)

Commits related to this issue

Most upvoted comments

I would suggest to deactivate the threading for this routine. It seems that nearly nobody uses it for large scale stuff where the threading gets important. Otherwise the error would be detected earlier and did not stay 10 years inside Goto/OpenBLAS.

I check how often it is used and LAPACK and there are only 12 routines calling it and in most cases N <= 64 and so we only have a small benefit from a multithreaded version.