astropy: No longer able to read BinnedTimeSeries with datetime column saved as ECSV after upgrading from 4.2.1 -> 5.0+
Description
Hi, This commit merged in PR #11569 breaks my ability to read an ECSV file created using Astropy v 4.2.1, BinnedTimeSeries class’s write method, which has a datetime64 column. Downgrading astropy back to 4.2.1 fixes the issue because the strict type checking in line 177 of ecsv.py is not there.
Is there a reason why this strict type checking was added to ECSV? Is there a way to preserve reading and writing of ECSV files created with BinnedTimeSeries across versions? I am happy to make a PR on this if the strict type checking is allowed to be scaled back or we can add datetime64 as an allowed type.
Expected behavior
The file is read into a BinnedTimeSeries
object from ecsv file without error.
Actual behavior
ValueError is produced and the file is not read because ECSV.py does not accept the datetime64 column.
ValueError: datatype 'datetime64' of column 'time_bin_start' is not in allowed values ('bool', 'int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', 'float16', 'float32', 'float64', 'float128', 'string')
Steps to Reproduce
The file is read using:
BinnedTimeSeries.read('<file_path>', format='ascii.ecsv')
which gives a long error.
The file in question is a binned time series created by astropy.timeseries.aggregate_downsample
. which itself is a binned version of an astropy.timeseries.TimeSeries
instance with some TESS data. (loaded via TimeSeries.from_pandas(Tess.set_index(‘datetime’)). I.e., it has a datetime64 index. The file was written using the classes own .write method in Astropy V4.2.1 from an instance of said class:
myBinnedTimeSeries.write('<file_path>',format='ascii.ecsv',overwrite=True)
I’ll attach a concatenated version of the file (as it contains private data). However, the relevant part from the header is on line 4:
# %ECSV 0.9
# ---
# datatype:
# - {name: time_bin_start, datatype: datetime64}
as you can see, the datatype is datetime64. This works fine with ECSV V0.9 but not V1.0 as some sort of strict type checking was added.
Full error log:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [3], in <module>
---> 49 tsrbin = BinnedTimeSeries.read('../Photometry/tsr_bin.dat', format='ascii.ecsv')
File ~/Apps/miniconda3/envs/py310_latest/lib/python3.10/site-packages/astropy/timeseries/binned.py:285, in BinnedTimeSeries.read(self, filename, time_bin_start_column, time_bin_end_column, time_bin_size_column, time_bin_size_unit, time_format, time_scale, format, *args, **kwargs)
230 """
231 Read and parse a file and returns a `astropy.timeseries.BinnedTimeSeries`.
232
(...)
279
280 """
282 try:
283
284 # First we try the readers defined for the BinnedTimeSeries class
--> 285 return super().read(filename, format=format, *args, **kwargs)
287 except TypeError:
288
289 # Otherwise we fall back to the default Table readers
291 if time_bin_start_column is None:
File ~/Apps/miniconda3/envs/py310_latest/lib/python3.10/site-packages/astropy/table/connect.py:62, in TableRead.__call__(self, *args, **kwargs)
59 units = kwargs.pop('units', None)
60 descriptions = kwargs.pop('descriptions', None)
---> 62 out = self.registry.read(cls, *args, **kwargs)
64 # For some readers (e.g., ascii.ecsv), the returned `out` class is not
65 # guaranteed to be the same as the desired output `cls`. If so,
66 # try coercing to desired class without copying (io.registry.read
67 # would normally do a copy). The normal case here is swapping
68 # Table <=> QTable.
69 if cls is not out.__class__:
File ~/Apps/miniconda3/envs/py310_latest/lib/python3.10/site-packages/astropy/io/registry/core.py:199, in UnifiedInputRegistry.read(self, cls, format, cache, *args, **kwargs)
195 format = self._get_valid_format(
196 'read', cls, path, fileobj, args, kwargs)
198 reader = self.get_reader(format, cls)
--> 199 data = reader(*args, **kwargs)
201 if not isinstance(data, cls):
202 # User has read with a subclass where only the parent class is
203 # registered. This returns the parent class, so try coercing
204 # to desired subclass.
205 try:
File ~/Apps/miniconda3/envs/py310_latest/lib/python3.10/site-packages/astropy/io/ascii/connect.py:18, in io_read(format, filename, **kwargs)
16 format = re.sub(r'^ascii\.', '', format)
17 kwargs['format'] = format
---> 18 return read(filename, **kwargs)
File ~/Apps/miniconda3/envs/py310_latest/lib/python3.10/site-packages/astropy/io/ascii/ui.py:376, in read(table, guess, **kwargs)
374 else:
375 reader = get_reader(**new_kwargs)
--> 376 dat = reader.read(table)
377 _read_trace.append({'kwargs': copy.deepcopy(new_kwargs),
378 'Reader': reader.__class__,
379 'status': 'Success with specified Reader class '
380 '(no guessing)'})
382 # Static analysis (pyright) indicates `dat` might be left undefined, so just
383 # to be sure define it at the beginning and check here.
File ~/Apps/miniconda3/envs/py310_latest/lib/python3.10/site-packages/astropy/io/ascii/core.py:1343, in BaseReader.read(self, table)
1340 self.header.update_meta(self.lines, self.meta)
1342 # Get the table column definitions
-> 1343 self.header.get_cols(self.lines)
1345 # Make sure columns are valid
1346 self.header.check_column_names(self.names, self.strict_names, self.guessing)
File ~/Apps/miniconda3/envs/py310_latest/lib/python3.10/site-packages/astropy/io/ascii/ecsv.py:177, in EcsvHeader.get_cols(self, lines)
175 col.dtype = header_cols[col.name]['datatype']
176 if col.dtype not in ECSV_DATATYPES:
--> 177 raise ValueError(f'datatype {col.dtype!r} of column {col.name!r} '
178 f'is not in allowed values {ECSV_DATATYPES}')
180 # Subtype is written like "int64[2,null]" and we want to split this
181 # out to "int64" and [2, None].
182 subtype = col.subtype
ValueError: datatype 'datetime64' of column 'time_bin_start' is not in allowed values ('bool', 'int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', 'float16', 'float32', 'float64', 'float128', 'string')
System Details
(For the version that does not work) Python 3.10.2 | packaged by conda-forge | (main, Feb 1 2022, 19:28:35) [GCC 9.4.0] Numpy 1.22.2 pyerfa 2.0.0.1 astropy 5.0.1 Scipy 1.8.0 Matplotlib 3.5.1
(For the version that does work) Python 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0] Numpy 1.20.3 pyerfa 2.0.0.1 astropy 4.2.1 Scipy 1.7.0 Matplotlib 3.4.2
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 20 (20 by maintainers)
I would be highly supportive of a backwards compatibility bugfix for V0.9, and then an API change for V5.1 that changes the spec. I would be willing to work on a PR for it.
@weaverba137 @emirkmo - sorry that the updates in ECSV reading are breaking back-compatibility, I am definitely sensitive to that. Perhaps we can do a bug-fix release which checks for ECSV 0.9 (as opposed to 1.0) and silently reads them without warnings. This will work for files written with older astropy.
@weaverba137 -
can you provide an example file with an[EDIT - I saw the example and read the discussion in the linked issue]. Going forward (astropy >= 5.0),object
column?object
columns are written (and read) as described at https://github.com/astropy/astropy-APEs/blob/main/APE6.rst#object-columns. This is limited to object types that can be serialized to standard JSON (without any custom representations).Our group has recently encountered errors very closely related to this. In our case the ECSV 0.9 type is
object
. I think the ECSV 1.0 equivalent isstring subtype: json
, but I haven’t been able to to confirm that yet.In general, what is the policy on backward-compatibility when reading ECSV files?