astropy: astropy.io.fits Column memory problem

Every time I try to work with a FITS table after adding something to it, I get a crash. This is especially problematic when adding a column and then saving the resulting table to a new file.

Here’s a minimal script that reproduces the error:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Oct  3 10:15:54 2017

Trying to demonstrate an issue with fits columns
"""

from astropy.io import fits
import numpy as np
import os

fits_file_path = "/scr/depot0/csh4/all_wise_quasars.fits"

size = os.path.getsize(fits_file_path)
print("File size is " + str(size/1000000) + " megabytes")

f = fits.open(fits_file_path)
print("Getting length the first time")
length = len(f[1].data)
print("Length is " + str(length))

values_to_add = np.empty(length).fill(False)
column = fits.Column(name='false_vals', format="L", array=values_to_add)
f[1].columns.add_col(column)

print("Going to get length again")
length = len(f[1].data)
print("Length is " + str(length))

This produces the following output:

File size is 133.2288 megabytes
Getting length the first time
Length is 581728
Going to get length again
Traceback (most recent call last):

  File "<ipython-input-9-0a6cbaf424c2>", line 1, in <module>
    runfile('/scr/depot0/csh4/astropy_problem.py', wdir='/scr/depot0/csh4')

  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 688, in runfile
    execfile(filename, namespace)

  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/scr/depot0/csh4/astropy_problem.py", line 31, in <module>
    length = len(f[1].data)

  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/utils/decorators.py", line 736, in __get__
    val = self.fget(obj)

  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/hdu/table.py", line 388, in data
    data = self._get_tbdata()

  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/hdu/table.py", line 171, in _get_tbdata
    self._data_offset)

  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/hdu/base.py", line 480, in _get_raw_data
    return self._file.readarray(offset=offset, dtype=code, shape=shape)

  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/file.py", line 285, in readarray
    buffer=self._mmap)

TypeError: buffer is too small for requested array

The same error is raised if I try to do anything else with the table data. However, I can still access f[1].header and f[1].columns just fine.

The fits catalog file I’m running on can be found at http://faraday.uwyo.edu/~admyers/wisemask2017/all_wise_quasars.fits

I’m running these versions: conda 4.3.25, astropy 2.0.1, python 3.6.1, numpy 1.13.1, spyder 3.2.1

My OS is Springdale Linux Release 6.9 (Pisa), GNOME 2.28.2, Kernel Linux 2.6.32-696.10.1.el6.x86_64, with 16 GB RAM

This code does not use much memory, so I think these are the possible problems:

  • I’m not adding to this file in the right way
  • there is a problem with this FITS file (possible, but TOPCAT has no problem with it)
  • there is a problem with the combination of package versions that I am using
  • astropy doesn’t run well on files of this size (~100 MB)
  • astropy has a memory issue here

It isn’t clear which of these is the problem, so I figured I’d submit the issue here.

Thanks!

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 16 (11 by maintainers)

Most upvoted comments

Yes, no need to keep this one. And you already gave the workaround, though it can be simplified a bit:

fits.BinTableHDU.from_columns(f[1].columns + column)

or, since add_col both modifies the column definitions in place and returns a new ColDefs:

fits.BinTableHDU.from_columns(f[1].columns.add_col(column))

I guess we should remove add_col and del_col.

There is a workaround though. For now, you need to create it “from scratch” like this (I used pf instead of f out of habit):

# Just using pf[1].columns directly gives me an error, so need list comprehension
newcols = [c for c in pf[1].columns] + [column]
tnew = fits.BinTableHDU.from_columns(newcols)
pf[1] = tnew
pf.writeto('table_with_extra_col.fits')

Hope this helps.

Looks like a real bug 😱 It’s related to this: https://github.com/astropy/astropy/blob/9fc44c4aeb5740641c472526365bd8e2eb8b41b4/astropy/io/fits/hdu/table.py#L210 After the column is added, the in-memory data is deleted and loaded back from the file, but with the wrong dtype (one that already contains the new column).
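The mechanism can be illustrated with NumPy alone. This is a simplified sketch, not astropy’s actual code path: the field names, the row count and the bytearray standing in for the memory-mapped file are assumptions. It shows that interpreting an unchanged buffer with a dtype that has grown by one extra column raises the same TypeError seen in the traceback.

```python
import numpy as np

n_rows = 10

# Row layout as stored on disk: a single 8-byte integer per row.
dtype_on_disk = np.dtype([("id", "<i8")])
# Row layout after a one-byte logical column was added to the definitions.
dtype_with_new_col = np.dtype([("id", "<i8"), ("false_vals", "?")])

# Stand-in for the memory-mapped file contents (80 bytes).
file_buffer = bytearray(n_rows * dtype_on_disk.itemsize)

# Reading with the original layout succeeds.
ok = np.ndarray(shape=(n_rows,), dtype=dtype_on_disk, buffer=file_buffer)

# Reading with the wider layout needs 90 bytes, so NumPy refuses.
error = None
try:
    np.ndarray(shape=(n_rows,), dtype=dtype_with_new_col, buffer=file_buffer)
except TypeError as exc:
    error = exc
```

The file on disk never gained the new column, so any dtype that includes it asks for more bytes per row than the buffer holds.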

The error is due to

values_to_add = np.empty(length).fill(False)

ndarray.fill() is an in-place operation that returns None. Therefore, values_to_add is not the array but just None.

If you do this instead, it will work:

values_to_add = [False] * length

If this fixes your problem, please close the issue.
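A short sketch of the fill() pitfall and two ways to build the all-False array directly. The length of 5 is a placeholder for the table length, and np.zeros and np.full are alternatives suggested here, not something proposed in the thread.

```python
import numpy as np

length = 5  # placeholder for len(f[1].data)

# ndarray.fill() modifies the array in place and returns None,
# so chaining it onto np.empty() discards the array.
result = np.empty(length).fill(False)

# Either of these produces a boolean array of False values directly.
values_zeros = np.zeros(length, dtype=bool)
values_full = np.full(length, False, dtype=bool)
```

Either array can be passed as the array argument of fits.Column with format="L".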