astropy: Occasional failures in vo_test.py

The Debian CI tests (which are run whenever one of the dependencies of astropy changes) sometimes show a segmentation fault in io/votable/tests/vo_test.py. This has now happened a second time at the same place, so it was probably not just a glitch in our environment: here and here. The log excerpt there is

../../usr/lib/python2.7/dist-packages/astropy/io/votable/tests/vo_test.py ...........................................................................................................................................................................
bash: line 1:  1076 Segmentation fault      (core dumped) 

The relevant changes that led to the CI test runs were new Debian releases of gfortran-5.3.1 (from Debian release 11 to 12 and 12 to 13). This may be similar to #4240 or #4352, but I am not sure. Apart from the two failures, however, I have ~20 succeeding CI test runs. Platform is amd64.

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 29 (27 by maintainers)

Most upvoted comments

Thanks for the triage, @kakirastern, and thanks for checking those many logs, @olebole. I agree that if there was no issue for that many, we can just close this for now. Hopefully, we did really solve it!

Looking into our Debian CI logs (~300 tests), I couldn’t find that segfault in the last year for the Python 3 package. So, I would guess this is somehow fixed and would propose to close it. If it comes back, we can still reopen.

I have tracked the segfault down to: https://github.com/astropy/astropy/blob/master/astropy/io/votable/src/tablewriter.c#L74 In tablewriter.c the first call to stdlib realloc() segfaults.

Anyway, this is a recurrence of #2100. This may be related to the specific win64 anaconda python 3.6, but that is compiled with the same MSVC as I’m using on astropy. I’ll see if I can test this with the official CPython binaries.

IMHO, even if it is specific to anaconda python 3.6, this should be fixed: This calls realloc on a PyObject, which is allocated by PyMem_Malloc. It should not be realloced by the stdlib realloc but through the python memory manager. (Changing the realloc to PyMem_Realloc fixes my failures)
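
To illustrate the pairing rule described above, here is a minimal sketch in Python that reaches the CPython allocator functions through ctypes. It is not the code from tablewriter.c; the helper name `grow_buffer_demo` and the buffer sizes are made up for illustration. Since Python 3.6, PyMem_Malloc is served by pymalloc by default rather than the system malloc, which may explain why a stdlib realloc on such a pointer misbehaves on 3.6 but not on 3.5.

```python
import ctypes

# CPython memory-manager entry points, reached through ctypes.pythonapi.
# pythonapi holds the GIL during calls, which the PyMem_* functions require.
api = ctypes.pythonapi
api.PyMem_Malloc.restype = ctypes.c_void_p
api.PyMem_Malloc.argtypes = [ctypes.c_size_t]
api.PyMem_Realloc.restype = ctypes.c_void_p
api.PyMem_Realloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
api.PyMem_Free.restype = None
api.PyMem_Free.argtypes = [ctypes.c_void_p]


def grow_buffer_demo(initial=16, grown=4096):
    """Hypothetical helper: allocate, grow, and free one buffer via PyMem_*."""
    buf = api.PyMem_Malloc(initial)
    if not buf:
        raise MemoryError("PyMem_Malloc failed")
    ctypes.memmove(buf, b"votable", 7)

    # Grow with the matching reallocator. Passing `buf` to the C library's
    # realloc() instead would hand it a pointer it may not own.
    new_buf = api.PyMem_Realloc(buf, grown)
    if not new_buf:
        api.PyMem_Free(buf)  # on failure the original block is still valid
        raise MemoryError("PyMem_Realloc failed")

    data = ctypes.string_at(new_buf, 7)  # contents survive the resize
    api.PyMem_Free(new_buf)              # release with the same allocator family
    return data
```

The same rule applies in C: a block obtained from PyMem_Malloc should only ever be passed to PyMem_Realloc and PyMem_Free, never to the stdlib realloc or free.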

I see. Then let’s keep this issue open for now… Meanwhile, I will help keep an eye on the bug should it crop up during CI tests again in the near future. I understand the gravity of a segfault. And the bug appears to affect only io/votable/tests/vo_test.py.

I just encountered one in CircleCI:

____________ lib.linux-i686-3.5/astropy/io/votable/tests/vo_test.py ____________
[gw2] linux -- Python 3.5.2 /usr/bin/python3
Slave 'gw2' crashed while running 'lib.linux-i686-3.5/astropy/io/votable/tests/vo_test.py::TestThroughBinary2::()::test_string_test'

Commenting out the suggested lines does indeed cause a crash in a different test (which also calls tablewriter.c).

I’m going to retry on python 3.5 first, to see if it is indeed only related to python 3.6. EDIT: Installed in python 3.5 without recompiling: no test failures/segfaults.