cudf: [BUG] Length of Validity mask should be divisible by 8 bytes.

Found by @VibhuJawa in https://github.com/dmlc/xgboost/issues/4911 :

import cudf
import numpy as np

def minimal():
    X = cudf.DataFrame({'x': cudf.Series([0, 1, 2, None], dtype=np.int32)})
    print(X['x'].__cuda_array_interface__['mask'])


if __name__ == '__main__':
    minimal()

Here the mask object in the interface has a length of 4 bits (one bit per row) instead of being padded to the 8 bytes required by https://arrow.apache.org/docs/format/Layout.html#null-bitmaps
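
To make the expected size concrete, here is a minimal sketch of the padding arithmetic, assuming one validity bit per row and a buffer length rounded up to a multiple of 8 bytes. The helper name `bitmask_nbytes` and its `pad_to` parameter are hypothetical, not part of cudf or Arrow.

```python
def bitmask_nbytes(num_rows: int, pad_to: int = 8) -> int:
    """Bytes for a 1-bit-per-row validity bitmap, rounded up to a multiple of pad_to.

    Hypothetical helper for illustration only; not a cudf or Arrow API.
    """
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    raw_bytes = (num_rows + 7) // 8  # ceil(num_rows / 8)
    return ((raw_bytes + pad_to - 1) // pad_to) * pad_to  # round up to pad_to


print(bitmask_nbytes(4))             # 4 rows fit in 1 byte, padded to 8
print(bitmask_nbytes(65))            # 65 rows need 9 bytes, padded to 16
print(bitmask_nbytes(4, pad_to=64))  # same 4 rows with 64-byte padding
```

For the 4-row series in the reproducer, this gives 8 bytes rather than a 4-bit mask.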

cudf version: 0.9.0

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 16 (11 by maintainers)

Most upvoted comments

always at least 256B aligned

Building off what @harrism said, bitmask allocations in libcudf are always 256B aligned, and the allocation is padded to a multiple of 64B.
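
As a rough host-side illustration of the 64B padding described above, the sketch below packs validity flags into a bitmap with NumPy and zero-pads it to a multiple of 64 bytes. It is not libcudf's implementation: `build_padded_bitmask` is a hypothetical name, the least-significant-bit-first ordering is assumed from the Arrow layout, and the 256B address alignment of the device allocation is not modeled.

```python
import numpy as np


def build_padded_bitmask(valid, pad_to=64):
    """Pack booleans into an LSB-first bitmap, zero-padded to a multiple of pad_to bytes.

    Hypothetical helper for illustration; not part of cudf or libcudf.
    """
    bits = np.asarray(valid, dtype=bool)
    # bitorder="little" puts row 0 in the least significant bit of byte 0.
    packed = np.packbits(bits, bitorder="little")
    padded_len = -(-packed.size // pad_to) * pad_to  # ceil to a multiple of pad_to
    out = np.zeros(padded_len, dtype=np.uint8)       # padding bits stay unset
    out[:packed.size] = packed
    return out


# Same validity pattern as the reproducer: [0, 1, 2, None]
mask = build_padded_bitmask([True, True, True, False])
print(mask.size)      # total bytes after padding
print(bin(mask[0]))   # first byte holds the three valid bits
```

With these assumptions, the four-row column occupies a single meaningful byte followed by 63 bytes of zero padding.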