awkward: ak.Array wrongly deduces input type

Version of Awkward Array

1.10.1

Description and code to reproduce

import numpy as np
import awkward as ak

a = np.array([1, 2, 3], dtype=np.int32)
ak.Array(a)     # <Array [1, 2, 3] type='3 * int32'> OK
ak.Array([a])   # <Array [[1, 2, 3]] type='1 * var * int64'> BAD

As you can see, in the variable-length context, ak.Array deduces int64 although the underlying type is int32. It is important that ak.Array deduces the exact type, because storing the data as 64-bit integers wastes memory and CPU cycles.
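To put a number on the memory cost, here is a minimal NumPy-only sketch. The array size of one million elements is an arbitrary choice for illustration, not a figure from this report.

```python
import numpy as np

# One million integers that all fit comfortably in 32 bits.
a32 = np.arange(1_000_000, dtype=np.int32)

# The same values widened to 64 bits, as happens in the var-length case.
a64 = a32.astype(np.int64)

print(a32.nbytes)  # 4000000
print(a64.nbytes)  # 8000000
```

The widened buffer takes twice the memory for the same values.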

This is also important when these arrays are written to ROOT files. Using types that are too large wastes disk space.

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 18 (11 by maintainers)

Most upvoted comments

When designing APIs, one needs to think about all the ways in which the API can be used and then handle all of those cases, not only a few.

I’m not sure that I’d call this a workaround. ak.ArrayBuilder needs to be able to handle all kinds of user input; even if we added support for all of the NumPy primitive types, ArrayBuilder still performs multiple copies of intermediate buffers (it allocates in panels) and has to visit all of the array elements. So, I may be corrected here, but I think it’s fairly certain that this will remain the advised solution in the case that you have existing NumPy arrays.
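As a rough illustration of why existing NumPy arrays can bypass the element-by-element path, the sketch below packs several 1-D arrays into one flat content buffer plus list offsets, using NumPy only. The helper name `pack_ragged` is hypothetical, and this is not necessarily the solution the comment refers to. It assumes a non-empty list of 1-D arrays that share a dtype. In Awkward Array, buffers like these could presumably be turned into a variable-length array with something like `ak.unflatten(content, counts)`, but treat that as an assumption to check against the documentation for your version.

```python
import numpy as np

def pack_ragged(arrays):
    """Pack a non-empty list of 1-D NumPy arrays into (content, offsets).

    Hypothetical helper for illustration only; it is not part of Awkward Array.
    """
    # Number of elements in each sub-list.
    counts = np.array([len(x) for x in arrays], dtype=np.int64)
    # offsets[i]:offsets[i + 1] delimits sub-list i inside the content buffer.
    offsets = np.zeros(len(arrays) + 1, dtype=np.int64)
    np.cumsum(counts, out=offsets[1:])
    # A single bulk copy; arrays sharing a dtype keep that dtype.
    content = np.concatenate(arrays)
    return content, offsets

a = np.array([1, 2, 3], dtype=np.int32)
b = np.array([4, 5], dtype=np.int32)
content, offsets = pack_ragged([a, b])

print(content.dtype)  # int32
print(offsets)        # [0 3 5]
```

The content buffer is built with one bulk copy and no per-element Python loop, and the int32 dtype is carried through unchanged.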

ArrayBuilder has only Int64Builder. We could extend the API to support int32.