awkward: MacOS tests randomly segfault
Version of Awkward Array
HEAD
Description and code to reproduce
For a while now (weeks?), the MacOS runs of `tests/**/*.py` have been failing with low probability: there’s about a 1 in 5 chance that one of the test runs (on Python 3.6, 3.7, 3.8, 3.9, and 3.10) raises a segfault. When that happens, the error code may be one of several signals, such as 6, 10, or 11. It often fails while running v2 tests, but that might just be because the v2 tests run after the v1 tests. The fact that it hasn’t been failing consistently in the same test suggests that the signal is raised during a garbage collection, but the actual error happens earlier.
If this really is due to a v1 segfault, it’s rare enough that we could even ignore it—there won’t be many more v1 versions released. If it’s due to something in v2, then we really must take care of it. (And if it’s narrowed down to an easy v1 issue, then we’d also want to fix it in that case.)
If this had been a Linux segfault, I would run the tests locally many times in order to reproduce it (e.g. run it 100 times to virtually guarantee hitting the P ≈ 1/25 case), then I would bisect the tests to find at least one that raises it. It might be necessary to insert explicit `gc.collect()` calls to trigger it in the absence of many other tests. However, the bug only manifests itself on MacOS, and I don’t have access to a Mac to do that intensive testing on. (Doing it through CI would be awful because it would add the compilation time to each run.)
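The "run it many times, then bisect" plan can be sketched as follows. This is only an illustration under stated assumptions: `p_at_least_one`, `bisect_tests`, and the `crashes` callback are hypothetical names, and the bisection assumes a non-empty list with exactly one culprit that still crashes when its subset is run on its own (which, as noted, may need explicit `gc.collect()` calls and many repetitions to be true in practice).

```python
def p_at_least_one(p, n):
    """Probability of seeing at least one failure in n independent runs,
    each failing with probability p."""
    return 1.0 - (1.0 - p) ** n


def bisect_tests(tests, crashes):
    """Narrow a non-empty list of tests down to a single suspect.

    `crashes(subset)` is a caller-supplied (hypothetical) function that runs
    the subset often enough to be trusted and returns True if it ever crashed.
    Assumes exactly one culprit in `tests`.
    """
    candidates = list(tests)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Keep whichever half still reproduces the crash.
        candidates = half if crashes(half) else candidates[len(half):]
    return candidates[0]


# With P = 1/25 per run, 100 runs catch it with probability of about 0.98.
print(round(p_at_least_one(1 / 25, 100), 3))
```

Because the failure is probabilistic, a `crashes` implementation that reports "no crash" too early would send the search down the wrong half, so each subset needs enough repetitions to make a miss unlikely.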
Does anyone (@ianna, @ioanaif, @swishdiff, @agoose77, …) have a Mac and is willing to try that bisection search? Maybe my first question should be if any Mac users have encountered this in their local tests (assuming you’ve been running enough tests recently).
About this issue
- State: closed
- Created 2 years ago
- Comments: 18 (8 by maintainers)
@jpivarski - as suggested, I put one test, `test_IndexedArray`, in a loop. Occasional failures confirm that the array data gets corrupted (an Index of an IndexedOptionArray in this case).
Here is a short test that can reproduce it:
v2 arrays are also immutable. Once constructed, we should never be overwriting any of their buffers, and the Python objects are replaced, rather than modified in place.
The fact that the segfault is in v2 is in another way good news: we can eliminate a variety of ways that v1 can produce errors. v2’s NumPy handling introduces significantly fewer avenues for segfaults than v1’s pointer handling. But there are a few that remain:
- Buffers from `from_iter` (perhaps indirectly through the high-level `Array` constructor) are supposed to be allocated and owned by NumPy: the pybind11 C++ code calls NumPy to allocate and own the buffers, they are exposed to Python, and then assembled into a v2 array. The fact that we copy from unique pointers now makes this a lot safer than it used to be.
- Buffers from `from_numpy` (perhaps indirectly through the high-level `Array` constructor) are definitely allocated and owned by NumPy; there’s no C++ involved.
- Buffers passed to kernels should stay alive for the duration of the call: Python holds a reference to each argument (`sys.getrefcount` always returns at least `2`) and we also follow a convention of only passing arguments in a kernel call with `someindex.to(nplike)`, so the `someindex` name stays in scope beyond the lifetime of the kernel call, anyway. (We never construct objects in the argument list. But like I said, I think Python has friendly semantics for that, anyway.)
In v1, there are other ways to get segfaults, but I’m pretty sure the above is a complete list for v2. If you can narrow down which tests cause it, we have a short list of possible causes.
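The remark about Python's "friendly semantics" for objects in an argument list can be checked in plain Python. The following is a minimal sketch that assumes CPython's reference counting; `Buffer`, `make_buffer`, and `kernel` are hypothetical stand-ins, not Awkward Array code.

```python
import weakref


class Buffer:
    """Hypothetical stand-in for an array buffer."""


def make_buffer(log):
    buf = Buffer()
    # Record the moment the buffer object is actually destroyed.
    weakref.finalize(buf, log.append, "freed")
    return buf


def kernel(buf, log):
    # The temporary created by the caller is still referenced by `buf` here.
    log.append("kernel running")


log = []
kernel(make_buffer(log), log)  # object constructed directly in the argument list
log.append("after call")
print(log)
```

On CPython, "kernel running" should be logged before "freed": the callee's parameter holds a reference, so a temporary built in the argument list is not released until the call has returned. This says nothing about memory that Python does not own, which is where the other items in the list come in.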
@ianna I think you said that “merge many” is where it failed, when it failed. I had been thinking it was in a garbage collection pass and not actually related to the `mergemany` method or this test. But if it’s always failing (when it fails) in `test_0449-merge-many-arrays-in-one-pass.py` even though it enters that function with all garbage already collected, that’s new information: it suggests that something in `test_0449-merge-many-arrays-in-one-pass.py` really is the problem.
If you run just that one test file, `test_0449-merge-many-arrays-in-one-pass.py`, obscenely many times (10× or 100× your current runs, because it’s just one file), does it ever segfault? If so, it’s bracketed and we can narrow in.
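One way to run a single file "obscenely many times" and tally how each run ended is a small driver like the one below. It is a sketch, not an existing project script: `describe` and `run_many` are hypothetical helpers, and the path to the test file in the usage note is an assumption about the checkout layout. It relies on the POSIX behaviour that `subprocess` reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys


def describe(returncode):
    """Turn a subprocess return code into a short label."""
    if returncode < 0:
        # Negative means the child was killed by a signal (POSIX only).
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return "ok" if returncode == 0 else f"exit {returncode}"


def run_many(cmd, n):
    """Run `cmd` n times in fresh processes and count the outcomes."""
    counts = {}
    for _ in range(n):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        label = describe(result.returncode)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

A possible invocation, with the file location assumed, would be `run_many([sys.executable, "-m", "pytest", "-q", "tests/test_0449-merge-many-arrays-in-one-pass.py"], 1000)`. Any key such as `SIGSEGV`, `SIGBUS`, or `SIGABRT` in the result means the file reproduces the crash on its own.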