awkward: `ak.sum` produces incorrect structure for outer dimension
Version of Awkward Array
56312ea2
This is seen in both v1 and v2 in master
Description and code to reproduce
Given this layout:
x = ak._v2.Array([
[
[
[1, 2, 3],
[4, 0, 0]
],
[
[5, 6, 7],
[8, 0, 0]
]
],
[
[
[9, 10, 11],
[12, 0, 0]
],
[
[13, 14, 15],
[16, 0, 0]
]
]
])
Calling ak.sum(x, axis=0)
does not match (in structure) the NumPy result:
>>> ak._v2.to_numpy(x).sum(axis=0).tolist()
[[[10, 12, 14], [16, 0, 0]], [[18, 20, 22], [24, 0, 0]]]
>>> ak._v2.sum(x, axis=0).tolist()
[[[10, 12, 14], []], [[16, 0, 0, 18, 20, 22], []]]
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 20 (5 by maintainers)
I’m looking at this further. Your test case is interesting @ianna , because it seems like the truly jagged case is not fixed by ordering the parents, which is why I missed it with the regular test case.
@ianna Yes, I’m pretty sure that
findgaps
andnonlocal_outstartstops
are both assuming that the parents are ordered, because the<
operator doesn’t yield much meaningful information if this isn’t true. Maybe a better way of putting it is that the currently behaviour is broken for non-monotonic increasing parents, so it reinforces my belief that these kernels assume monotonic increasing parents.Please stop me if I am telling you things that you already know! This is just from my attempts to understand this logic:
The
gaps
is used to identify missing sublists in ordered parents (see my above comment). I haven’t looked into why we don’t just useparents
(compute gaps directly in the kernel(s) that need it), but it is used bynonlocal_outstartstops
to mark the break between sublists: https://github.com/scikit-hep/awkward-1.0/blob/500e0dd06fa4aa542ac0226da24851fb730e5042/src/cpu-kernels/awkward_ListOffsetArray_reduce_nonlocal_outstartsstops_64.cpp#L32-L36 This is always at least one (each entry ingaps
corresponds to a sublist), but can be more where we have gaps between parents (empty sublists)However, I made an assumption about
distincts
that I think may not be true, and would explain why the fix doesn’t work in some cases. Let me look.yep, the first 😦