astropy: Version 5.3 fails to read large CompImageHDUs

Description

Large FITS files produced with the JSOC mtrack module (http://hmi.stanford.edu/rings/modules/mtrack.html) cannot be read with astropy 5.3.0–5.3.4.

The issue occurs only when the FITS file contains a large CompImageHDU. In the example below, the data of size [641, 1700, 1700] are stored using the HCOMPRESS_1 compression method, but other compression methods (e.g. RICE) exhibit the same issue.

Expected behavior

There should be no segmentation fault (core dump) when accessing the data in the CompImageHDU.

How to Reproduce

  1. Download the sample file from my Dropbox (2.7 Gbyte): Vcomp.fits.
  2. Open the file, and try to access the CompImageHDU:
from astropy.io import fits

hdu = fits.open('Vcomp.fits')
shape = hdu[1].data.shape
  3. With astropy 5.3.x I get the following error:
zsh: segmentation fault  ipython

/opt/homebrew/Cellar/python@3.11/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
  4. The error can be reproduced on macOS 14 with Python 3.11, and on two different Linux workstations (x86_64 architecture) with Python 3.9–3.12 (see example configurations in the Versions section below).
  5. There is no problem when reading the file with astropy 5.1 or 5.2.
  6. There is no problem with smaller files produced with the same mtrack tool (e.g. 50 frames instead of 641 frames).

Versions

macOS-14.0-arm64-arm-64bit Python 3.11.6 (main, Oct 2 2023, 13:45:54) [Clang 15.0.0 (clang-1500.0.40.1)] astropy 5.3.4

Linux-4.18.0-372.70.1.1.el8_6.x86_64-x86_64-with-glibc2.28 Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0] astropy 5.3

About this issue

  • Original URL
  • State: open
  • Created 9 months ago
  • Reactions: 1
  • Comments: 28 (16 by maintainers)

Most upvoted comments

I received a reply from the CFITSIO people. It was Bill Pence himself who took this up! This is quite something, since he can be considered the father of FITS. I attach his reply to my mail above:


Mail Oct 24 2023, from William Pence to A. Korpi-Lagg

Hello Andreas,

The FITS format issue you raise here (which is whether to interpret the pair of 32-bit integer values in a ‘P’ type variable-length array descriptor as signed or unsigned integers) has a long history. I’ll try to summarize the history in the next couple paragraphs and then offer a solution to your issue at the end.

The original 1995 proposal to support variable-length arrays in FITS files did not explicitly state whether the 32-bit integers are signed or unsigned. So when I added support for variable-length arrays in CFITSIO, I chose to treat them as unsigned integers because that supports arrays that are twice as large as in the other case.

Jumping forward to 2005, when support for variable-length arrays was added to the official FITS Standard: after spirited debate, the wording was clarified to state that the descriptor values should be treated as signed 32-bit integers. At the same time, a loophole was added stating that the interpretation of negative descriptor values (which are bitwise equivalent to unsigned integer values greater than 2^31) is not defined. This was done so as not to cause any existing FITS files that had taken advantage of the previous ambiguity to become invalid. For better or worse, CFITSIO has continued to take advantage of this loophole ever since.

As you point out, there is another statement in a later section of the FITS standard, in the discussion of image compression techniques, that says:

“If the heap is larger than 2.1 GB, then the ’1Q’ format (which uses 64-bit pointers) must be used [instead of the ‘1P’ format].”

However, this statement is clearly not correct because it is possible to create legal FITS files that have a heap larger than 2.1 GB (2^31 bytes) using the ‘P’ descriptor format if the sum of the two 32-bit signed integer descriptor values is greater than 2^31. It would also be legal to create a FITS file with a huge heap that contains space that is never used by the ‘P’ variable-length arrays. So, in my opinion at least, that statement in the FITS standard has no validity and should be disregarded. It would be good if the FITS format governing bodies could resolve this discrepancy in a future revision to the FITS standard.

Getting back to the current problem in astropy, my suggestion is to never use ‘P’ descriptors and just use 64-bit ‘Q’ descriptors all the time when writing new files. In today’s ‘big data’ environment it is commonplace for the data to be larger than 2.1 GB. Also, even if it is possible to create a FITS file using ‘P’ descriptors, future modifications to the file might cause it to increase in size and exceed the allowed range of the ‘P’ format. The overhead of switching from ‘P’ to ‘Q’ should be negligible in most cases. The astropy code could also be modified to use the same loophole that CFITSIO uses and treat the ‘P’ descriptors as unsigned integers when reading FITS files.

Hope this helps,

William Pence NASA/HEASARC Emeritus


Maybe the issue can be settled by following Bill’s recommendation: “The astropy code could also be modified to use the same loophole that CFITSIO uses and treat the ‘P’ descriptors as unsigned integers when reading FITS files.”
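The loophole amounts to reinterpreting the same 32 bits of a ‘P’ descriptor as unsigned instead of signed. A minimal sketch of that reinterpretation (using the standard struct module; FITS binary tables store descriptors big-endian, and the offset value here is made up for illustration):

```python
import struct

# A heap offset just past 2**31 bytes, as CFITSIO may write it into the
# 32-bit 'P' descriptor of a variable-length array.
true_offset = 2**31 + 12345

raw = struct.pack('>I', true_offset)    # the 32 bits as stored in the file

signed = struct.unpack('>i', raw)[0]    # strict signed read: negative,
                                        # points at garbage in the heap
unsigned = struct.unpack('>I', raw)[0]  # CFITSIO's "loophole" read

assert signed < 0
assert unsigned == true_offset          # recovers the intended offset
```

Writing new files with 64-bit ‘Q’ descriptors, as Bill suggests, avoids the reinterpretation question entirely.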

I wrote to the cfitsio helpdesk (see attached mail). Let’s see what happens. I’ll keep you posted… cfitsio.txt

More info is provided by the “Problem Report for Python” window that pops up on my macbook when the segmentation fault occurs. I attach the problem report: segfault.log

The terminal shows the following output:

zsh: segmentation fault  python segfault.py
(venv) lagg@lagan-tm22 .../FD_test/> /opt/homebrew/Cellar/python@3.11/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Hope this helps. Unfortunately, I do not know how to use gdb to produce a traceback, sorry.

“Flexible on input, strict on output” suggests treating the descriptors as unsigned integers on input, but not writing such files ourselves.

EDIT: of course, it would also be nice not to regress relative to how astropy behaved when we used cfitsio under the hood.

I could test the nightly wheel and indeed the segfault is gone. Instead, it now raises a cfitsio exception:

_compression.CfitsioException: decompression warning: unused bytes at end of compressed buffer

which I can catch to prevent my code from exiting unexpectedly.

However, the issue remains that CFITSIO produces FITS files which cannot be read with astropy.io.fits. There may not be many such files in the various databases, but it is likely that some exist, so astropy must also be able to read those files, even if they were produced with a CFITSIO version violating the standard (or interpreting the ‘ambiguity’ in the standard in a certain way).

I think it is important to discuss this issue with the cfitsio people and to agree on a solution. If I can be of any help in this process, I’d be happy to do so.

Forgot to say thanks @alagg for contacting them, it helps in understanding the origin of the issue 😃 I guess we could fix the offset; it’s just that it requires more work, and it’s annoying to lose time dealing with deviations from the Standard (especially when it comes from cfitsio itself).

Ok, so there is a problem with this file: the compressed data is stored in a VLA column with TFORM='1PB(3982029)', but ‘P’ means an int32, which cannot correctly store offsets beyond a certain limit. When reaching plane 533 (in the first dimension), the offset exceeds the maximum of an int32 and becomes negative, leading to undefined behavior: reading random bytes somewhere, causing the warnings and errors we saw above, or segfaults with .section (not sure why the behavior differs). Interestingly, fitsio is happy when reading planes > 533, so maybe cfitsio detects and corrects the issue silently? That would also explain why astropy<5.3 was happy.

Now the question is: what do we do? Fix the column type (P → Q), which means computing the heap size, detecting overflow, etc., or raise an error if the offset is negative. @Cadair @astrofrog
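The arithmetic behind the overflow can be sketched with a back-of-the-envelope calculation. This is hypothetical: it assumes one VLA row per plane and that every plane compresses to the maximum element count declared in TFORM, which is why it overflows a few planes later than the 533 observed with the real, variably-sized planes:

```python
# Cumulative heap offsets for a VLA column with 32-bit 'P' descriptors.
INT32_MAX = 2**31 - 1
MAX_ELEM = 3982029   # max element count from TFORM='1PB(3982029)' (1 byte each)
N_PLANES = 641       # first axis of the [641, 1700, 1700] cube

offset = 0
for plane in range(N_PLANES):
    if offset > INT32_MAX:
        print(f"offset no longer fits in an int32 at plane {plane}")
        break
    offset += MAX_ELEM
# → offset no longer fits in an int32 at plane 540
```

With 64-bit ‘Q’ descriptors the cumulative offset stays far below the 2^63 limit, which is why switching the column type fixes the file.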