astropy: astropy.io.fits Column memory problem
I get a crash every time I try to work with a FITS table after adding something to it. This is especially problematic when adding a column and then saving the resulting FITS table to a new file.
Here’s a really simple script that triggers the error:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Oct 3 10:15:54 2017
Trying to demonstrate an issue with fits columns
"""
from astropy.io import fits
import numpy as np
import os
fits_file_path = "/scr/depot0/csh4/all_wise_quasars.fits"
size = os.path.getsize(fits_file_path)
print("File size is " + str(size/1000000) + " megabytes")
f = fits.open(fits_file_path)
print("Getting length the first time")
length = len(f[1].data)
print("Length is " + str(length))
values_to_add = np.empty(length).fill(False)
column = fits.Column(name='false_vals', format="L", array=values_to_add)
f[1].columns.add_col(column)
print("Going to get length again")
length = len(f[1].data)
print("Length is " + str(length))
This produces the following output:
File size is 133.2288 megabytes
Getting length the first time
Length is 581728
Going to get length again
Traceback (most recent call last):
File "<ipython-input-9-0a6cbaf424c2>", line 1, in <module>
runfile('/scr/depot0/csh4/astropy_problem.py', wdir='/scr/depot0/csh4')
File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 688, in runfile
execfile(filename, namespace)
File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/scr/depot0/csh4/astropy_problem.py", line 31, in <module>
length = len(f[1].data)
File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/utils/decorators.py", line 736, in __get__
val = self.fget(obj)
File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/hdu/table.py", line 388, in data
data = self._get_tbdata()
File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/hdu/table.py", line 171, in _get_tbdata
self._data_offset)
File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/hdu/base.py", line 480, in _get_raw_data
return self._file.readarray(offset=offset, dtype=code, shape=shape)
File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/file.py", line 285, in readarray
buffer=self._mmap)
TypeError: buffer is too small for requested array
This same error is thrown if I try to do anything with the table. However, I can still access f[1].header and f[1].columns just fine.
The fits catalog file I’m running on can be found at http://faraday.uwyo.edu/~admyers/wisemask2017/all_wise_quasars.fits
I’m running these versions: conda 4.3.25, astropy 2.0.1, python 3.6.1, numpy 1.13.1, spyder 3.2.1.
My OS is Springdale Linux Release 6.9 (Pisa), GNOME 2.28.2, kernel Linux 2.6.32-696.10.1.el6.x86_64, with 16 GB RAM.
This script does not use much memory, so I think these are the possible problems:
- I’m not adding to this file in the right way
- there is a problem with this fits file (possible, but topcat has no problem with it)
- there is a problem with the combination of versions of codes that I am using
- astropy doesn’t run well on files of this size (~100 MB)
- astropy has a memory issue here
It isn’t clear which of these is the problem, so I figured I’d submit the issue here.
Thanks!
About this issue
- State: closed
- Created 7 years ago
- Comments: 16 (11 by maintainers)
Yes, no need to keep this one. And you already gave the workaround, though it can be simplified a bit:
or, as add_col both modifies in place and returns a new ColDefs:
I guess we should remove add_col and del_col.
There is a workaround though. For now, you need to create it “from scratch” like this (I used pf instead of f out of habit):
Hope this helps.
Looks like a real bug 😱 It’s related to this: https://github.com/astropy/astropy/blob/9fc44c4aeb5740641c472526365bd8e2eb8b41b4/astropy/io/fits/hdu/table.py#L210
After adding the column, the in-memory data is deleted and loaded back from the file, but with the wrong dtype (one that already contains the new column).
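The dtype mismatch can be illustrated with plain NumPy. This is only a sketch of the general mechanism, not astropy’s actual code path: the field names, dtypes and row count are invented. It shows that a buffer sized for the old row layout is too small once the dtype gains an extra field.

```python
import numpy as np

nrows = 4  # hypothetical number of table rows

# Row layout as stored on disk (one 8-byte float per row).
old_dtype = np.dtype([("ra", ">f8")])
# Row layout after the column definitions gained an extra 1-byte field.
new_dtype = np.dtype([("ra", ">f8"), ("false_vals", "i1")])

# Stand-in for the memory-mapped file contents: sized for the OLD layout.
buf = bytearray(nrows * old_dtype.itemsize)

# Reading with the matching dtype works.
ok = np.ndarray(shape=(nrows,), dtype=old_dtype, buffer=buf)

# Reading the same bytes with the larger dtype asks for more bytes
# than the buffer holds, so NumPy refuses.
raised = False
try:
    np.ndarray(shape=(nrows,), dtype=new_dtype, buffer=buf)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```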
The error is due to this line in your script: values_to_add = np.empty(length).fill(False)
ndarray.fill() is an in-place operation that returns None. Therefore, values_to_add is not the array but just None.
If you do this instead, it will work:
If this fixes your problem, please close the issue.
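The fill behaviour is easy to check in isolation. The sketch below is not the snippet originally posted; the row count is made up, and it only demonstrates the None result along with two ways to get a real boolean array. Whether this change alone avoids the traceback above is not something the example confirms.

```python
import numpy as np

length = 5  # hypothetical row count

# ndarray.fill() modifies the array in place and returns None, so
# chaining it onto np.empty() discards the array.
chained = np.empty(length).fill(False)
print(chained)  # prints None

# Option 1: build the boolean array directly.
values_to_add = np.zeros(length, dtype=bool)

# Option 2: keep a reference to the array, then fill it in place.
filled = np.empty(length, dtype=bool)
filled.fill(False)
```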