awkward: `ak.sum` produces incorrect structure for outer dimension

Version of Awkward Array

56312ea2

This is seen in both v1 and v2 in master

Description and code to reproduce

Given this layout:

x = ak._v2.Array([
    [
        [
            [1, 2, 3],
            [4, 0, 0]
        ],
        [
            [5, 6, 7],
            [8, 0, 0]
        ]
    ],
    [
        [
            [9, 10, 11],
            [12, 0, 0]
        ],
        [
            [13, 14, 15],
            [16, 0, 0]
        ]
        
    ]
])

Calling ak.sum(x, axis=0) does not match (in structure) the NumPy result:

>>> ak._v2.to_numpy(x).sum(axis=0).tolist()
[[[10, 12, 14], [16, 0, 0]], [[18, 20, 22], [24, 0, 0]]]
>>> ak._v2.sum(x, axis=0).tolist()
[[[10, 12, 14], []], [[16, 0, 0, 18, 20, 22], []]]

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 20 (5 by maintainers)

Commits related to this issue

Most upvoted comments

I’m looking at this further. Your test case is interesting @ianna , because it seems like the truly jagged case is not fixed by ordering the parents, which is why I missed it with the regular test case.

@ianna Yes, I’m pretty sure that findgaps and nonlocal_outstartstops are both assuming that the parents are ordered, because the < operator doesn’t yield much meaningful information if this isn’t true. Maybe a better way of putting it is that the currently behaviour is broken for non-monotonic increasing parents, so it reinforces my belief that these kernels assume monotonic increasing parents.

Please stop me if I am telling you things that you already know! This is just from my attempts to understand this logic:

The gaps is used to identify missing sublists in ordered parents (see my above comment). I haven’t looked into why we don’t just use parents (compute gaps directly in the kernel(s) that need it), but it is used by nonlocal_outstartstops to mark the break between sublists: https://github.com/scikit-hep/awkward-1.0/blob/500e0dd06fa4aa542ac0226da24851fb730e5042/src/cpu-kernels/awkward_ListOffsetArray_reduce_nonlocal_outstartsstops_64.cpp#L32-L36 This is always at least one (each entry in gaps corresponds to a sublist), but can be more where we have gaps between parents (empty sublists)

However, I made an assumption about distincts that I think may not be true, and would explain why the fix doesn’t work in some cases. Let me look.

@ianna did this segfault or raise an Exception? I see only a ValueError in master & 1.7.0 on my machine.

yep, the first 😦