cudf: [BUG] test_reductions.py test case failures

Describe the bug test_reductions.py test case failures in cuDF Build with commit level: 2cf8a535831dc555020f9668f3b8fd1dc8fb4dcb.

Steps/Code to reproduce bug

site-packages/cudf/tests$ pytest -v test_reductions.py

FAILED tests: 
test_reductions.py::test_sum[int8-2] FAILED
test_reductions.py::test_sum[int8-200] FAILED                                                                                                         [ 18%]
test_reductions.py::test_sum[int8-10000] FAILED                                                                                                       [ 19%]

test_reductions.py::test_min[int8-2] FAILED                                                                                                           [ 74%]
test_reductions.py::test_min[int8-3] FAILED                                                                                                           [ 74%]
test_reductions.py::test_min[int8-127] FAILED                                                                                                         [ 75%]
test_reductions.py::test_min[int8-128] FAILED                                                                                                         [ 75%]
test_reductions.py::test_min[int8-129] FAILED                                                                                                         [ 76%]
test_reductions.py::test_min[int8-200] FAILED                                                                                                         [ 76%]
test_reductions.py::test_min[int8-10000] FAILED

Expected behavior test_reductions.py tests cases should work fine.

Environment details (please complete the following information): cuDF build on commit:2cf8a535831dc555020f9668f3b8fd1dc8fb4dcb. using GCC 7.3.0 conda tool chain on Ubuntu 18.04 (ppc64le)

About this issue

Original URL
State: closed
Created 5 years ago
Comments: 15 (15 by maintainers)

Most upvoted comments

Aha! Good detective work @pradghos. Feel free to open a PR with your fix and link this issue.

jrhemstad on May 28, 2019

I think I am able to find the fix for the problem -

file - reduce.pyx in function apply_reduce is basically handling all the reduce operations-

    with nogil:
        c_result = reduction(
            <gdf_column*>c_col,
            c_op,
            c_out_dtype
            )

    free(c_col)
    result = get_scalar_value(c_result)

get_scalar_value() is returning GDF_INT8 scalar data which is nothing but char in ctypedef union gdf_data.

cdef get_scalar_value(gdf_scalar scalar):
    """
    Returns typed value from a gdf_scalar
    0-dim array is retuned if dtype is date32/64, timestamp
    """
    print("get_scalar_value called")
    return {
        GDF_FLOAT64: scalar.data.fp64,
        GDF_FLOAT32: scalar.data.fp32,
        GDF_INT64:   scalar.data.si64,
        GDF_INT32:   scalar.data.si32,
        GDF_INT16:   scalar.data.si16,
        GDF_INT8:    scalar.data.si08,    ===============>>> its `char`
        GDF_BOOL8:   np.array(scalar.data.b08).astype(np.bool_),
        GDF_DATE32:  np.array(scalar.data.dt32).astype('datetime64[D]'),
        GDF_DATE64:  np.array(scalar.data.dt64).astype('datetime64[ms]'),
        GDF_TIMESTAMP: np.array(scalar.data.tmst).astype('datetime64[ns]'),
    }[scalar.dtype]

In POWER where char is interpreted as unsigned char. below fixes worked fine -

diff --git a/python/cudf/bindings/cudf_cpp.pxd b/python/cudf/bindings/cudf_cpp.pxd
index a5c4970..a4631d4 100644
--- a/python/cudf/bindings/cudf_cpp.pxd
+++ b/python/cudf/bindings/cudf_cpp.pxd
@@ -172,7 +172,7 @@ cdef extern from "cudf.h" nogil:
         GDF_WINDOW_VA

     ctypedef union gdf_data:
-        char          si08
+        signed char   si08
         short         si16
         int           si32
         long          si64

I saw similar problem in libcudf also -

diff --git a/cpp/include/cudf/types.h b/cpp/include/cudf/types.h
index 22f118b..c1a2df0 100644
--- a/cpp/include/cudf/types.h
+++ b/cpp/include/cudf/types.h
@@ -107,7 +107,7 @@ typedef struct {
  */
 // TODO: #1119 Use traits to set `gdf_data` elements
 typedef union {
-  char          si08;  /**< GDF_INT8      */
+  signed char   si08;  /**< GDF_INT8      */
   short         si16;  /**< GDF_INT16     */
   int           si32;  /**< GDF_INT32     */
   long          si64;  /**< GDF_INT64     */

@jrhemstad @harrism @kkraus14 : Pls let me know if I can create PR for the same.

pradghos on May 28, 2019

I see that all the failures are on the numpy.int8 type. That sounds suspiciously similar to the narrowing warnings we saw in https://github.com/rapidsai/cudf/issues/1544 caused by the difference in how x86 vs POWER interpret char.

I’d double check what gdf_dtype is being passed into libcudf for the numpy.int8 column.

E       assert -32 == 224
E         --32
E         +224

224 == 11100000 (unsigned)
-32 == 11100000 (signed twos-complement)

This looks to me like the “expected” result is being interpreted as a signed integer, and the “actual” result is being interpreted as unsigned, because they have the same binary representation.

jrhemstad on May 17, 2019

I’m looking the issue.

kovaltan on May 13, 2019