OpenBLAS: sdot yields wrong results for odd sizes > 2^24 and all sizes > 2^29

When I compute a dot product of a vector of all ones with itself, the result should be the size of the vector. This starts to no longer hold at size 2^24 + 1 as the following program shows:

// file tmp.c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include "cblas.h"

int main(int argc, char **argv) {
    int exp = atoi(argv[1]);
    int offset = atoi(argv[2]);
    size_t size = (1 << exp) + offset;
    float *a = (float *) malloc(size * sizeof(float));
    assert(a != NULL);
    for (size_t i=0; i < size; ++i)
        a[i] = 1;

    float dot = cblas_sdot(size, a, 1, a, 1);
    printf("dot = %f\n", dot);
    printf("size = %zu = (1 << %i) + (%i)\n", size, exp, offset);

    return 0;
}

I get

$ ./tmp 24 0
dot = 16777216.000000
size = 16777216 = (1 << 24) + (0)

$ ./tmp 24 1
dot = 16777216.000000
size = 16777217 = (1 << 24) + (1)

$ ./tmp 24 2
dot = 16777218.000000
size = 16777218 = (1 << 24) + (2)

$ ./tmp 24 3
dot = 16777220.000000
size = 16777219 = (1 << 24) + (3)

$ ./tmp 24 4
dot = 16777220.000000
size = 16777220 = (1 << 24) + (4)

$ ./tmp 24 3
dot = 16777220.000000
size = 16777221 = (1 << 24) + (5)

and then, for sizes larger than 2^29 the result just stays constant at 2^29:

$ ./tmp 29 1
dot = 536870912.000000
size = 536870913 = (1 << 29) + (1)

$ ./tmp 29 2
dot = 536870912.000000
size = 536870914 = (1 << 29) + (2)

$ ./tmp 29 4
dot = 536870912.000000
size = 536870916 = (1 << 29) + (4)

$ ./tmp 30 -1
dot = 536870912.000000
size = 1073741823 = (1 << 30) + (-1)

$ ./tmp 30 0
dot = 536870912.000000
size = 1073741824 = (1 << 30) + (0)

I’m using the latest ~~master~~ develop version of OpenBLAS, and I statically link the library to not accidentally dynamically link an older version:

$ gcc -o tmp tmp.c [...]/OpenBLAS/build/lib/libopenblas.a -I[...]/OpenBLAS/build/include -std=c99
$ ldd tmp
        linux-vdso.so.1 (0x00007ffe2ed79000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f58b2fdc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f58b338c000)

Some more info:

$ uname -a
Linux sabayon 4.11.0-sabayon #1 SMP Wed Jul 26 12:57:06 UTC 2017 x86_64 Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz GenuineIntel GNU/Linux

About this issue

Original URL
State: closed
Created 7 years ago
Comments: 65 (24 by maintainers)

Most upvoted comments

Seems to me that “float input, double accumulation and result” is what dsdot (cblas_dsdot in your case) was written for, perhaps worth a try.

martin-frbg on Oct 15, 2017

that yields parity difference a compiler loop vectorisation property. Ill try to get to clang-vs-gcc at least.

brada4 on Oct 12, 2017