astropy: Possible precision loss bug when reading FITS files with astropy.io

I and my colleague @wafels encountered a possible bug when attempting to read modified julian day (MJD) values out of a FITS file using astropy.io.fits. A working example is below. For reference, the file in question can be found here. It contains a catalogue of triggers marked ‘SFLARE’ detected by the Fermi mission.

In [2]: from astropy.io import fits
In [3]: from astropy.time import Time
In [4]: result = fits.open('browse_results.fits’)
In [5]: hdu1 = result[1]
In [6]: hdu1.data['TRIGGER_TIME'][108]
Out[6]: 55719.266
In [7]: tt = Time(hdu1.data['TRIGGER_TIME'][108],format = 'mjd')

In [8]: tt.datetime
Out[8]: datetime.datetime(2011, 6, 7, 6, 22, 30)

For reference, the correct time (verified from manually viewing the file, and also from the original Fermi catalogue online) for this trigger is actually: 2011-06-07 06:23:06:652

As a further test, the correct trigger time is read out by MRDFITS in IDL. As best I can tell, there seems to be some sort of precision loss when reading this file with astropy.io.fits. This can be seen here:

In [9]: print repr(np.float64(hdu1.data['TRIGGER_TIME'][108]))
55719.265625

Whereas from IDL:

IDL> result = mrdfits('browse_results.fits',1)
IDL> print,result[108].trigger_time,format='(d20.13)' 
 55719.2660492089999

Putting these numbers into a time conversion, e.g. xTIME gives you the same difference in times, 55719.265625 --> 06:22:30 UT, and 55719.26604921 --> 06:23:06.652 UT

Although this example is specific to times in MJD format, it could be a more general problem - I have not tested this behaviour on other data types. Another possibility is that something is slightly non-standard regarding how the FITS file is formatted, but I am not sure what would cause this.

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 59 (58 by maintainers)

Most upvoted comments

Ops, @saimn … Didn’t see your comment above. 😅

I think it is because TFORM6 = F15.0. I can’t find any entry for F in http://docs.astropy.org/en/stable/io/fits/usage/table.html#creating-a-fits-table . When I change that to TFORM6 = D15.0, that column is read in as float64 (I used tab = table.read(filename, format='fits')).

>>> tab[108]['TRIGGER_TIME']  # IDL is 55719.2660492089999
55719.266049209
>>> tt = Time(tab[108]['TRIGGER_TIME'], format='mjd')
>>> tt.datetime.strftime('%Y-%m-%d %H:%M:%S.%f')  # Should be 2011-06-07 06:23:06:652
'2011-06-07 06:23:06.651658'

I haven’t read this entire thread but I think this might be the same issue I pointed out a few days ago here: https://groups.google.com/d/msg/astropy-dev/0Cc7IrBqUaA/G46aro2zAgAJ

Exactly, that’s why using a TFORM of D10.2 might make sense for those.

That example was meant as an argument against making D behave like E (At least against the “simple” change mentioned in https://github.com/astropy/astropy/issues/5353#issuecomment-248911462)

Or the person who generates the file should just use “D” to begin with to avoid all this mess.

Slowly backs out towards the exit…

@MSeifert04 - Indeed, in both cases some float values are saved to a FITS file, with a E15.7 format, which is then read as float64 with the change instead of float32 before. This is consistent with what we should expect from the _convert_ascii_format function, so either the data should not be written with E15.7, or the test values must be updated.

In my previous https://github.com/astropy/astropy/issues/5353#issuecomment-248751079 I should correct my last paragraph:

I think a bug is a bug regardless of whether it was unintentional (some silly mistake in the code due to which code does not work as intended) or due to misunderstanding (=incorrect interpretation) of the standard~~.~~, unless developers of astropy.io.fits have never intended for this package to adhere to FITS standards.

Quoting from http://www.aanda.org/articles/aa/pdf/2010/16/aa15362-10.pdf (end of section 7.2.5) (see also https://github.com/astropy/astropy/issues/5353#issuecomment-248731885)

There is no difference between the F, D ,and E formats; the content of the string determines its interpretation. Numbers requiring more precision and / or range than the local computer can support may be represented. It is good form to specify a D format in TFORMn for a column of an ASCII table when that column will contain numbers that cannot be accurately represented in 32-bit IEEE binary format (see Appendix E ).

I don’t know what this means in practice (interpret everything as float64?) but given that the number cannot be accuratly represented in 32bit they violated the “good form”. 😄

Ah, okay… I found it… http://docs.astropy.org/en/stable/io/fits/usage/unfamiliar.html#ascii-tables

As documented in Astropy, it clearly states:

Fw.d       Single precision real

So, according to the doc anyway, it is “working as intended”. Why it is this way, I can’t explain. I didn’t write it.

The format keywords listed in the doc ( @pllim 's link) are for binary tables. For ascii tables, F15.0 seems valid, see Table 15 in http://www.aanda.org/articles/aa/pdf/2010/16/aa15362-10.pdf Also a bit later (page 20):

There is no difference between the F, D, and E formats; the content of the string determines its interpretation. Numbers requiring more precision and/or range than the local computer can support may be rep- resented. It is good form to specify a D format in TFORMn for a column of an ASCII table when that column will contain num- bers that cannot be accurately represented in 32-bit IEEE binary format (see Appendix E).