pandas: Pandas pytables interface doesn't create empty table datasets
Pandas used to allow writing empty HDF5 datasets through its PyTables interface. However, after upgrading to 0.17 (from 0.11), we’ve discovered that this behaviour is now intentionally short-circuited. The library behaves as though the dataset is being written, but it silently ignores the request, and the resulting HDF5 file doesn’t contain the requested table.
The offending code is in pandas/io/pytables.py:_write_to_group():
    # we don't want to store a table node at all if are object is 0-len
    # as there are not dtypes
    if getattr(value, 'empty', None) and (format == 'table' or append):
        return
We’ve worked around it by patching our installed copy of pandas, but we’d like to know the rationale behind this code before submitting a pull request. The comment implies that a lack of dtypes in the dataset is the cause; however, each pandas column carries type information even when it is empty.
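To illustrate the point about dtypes, here is a minimal sketch that builds a zero-row DataFrame and inspects its column types. The column names are hypothetical and chosen only for the example.

```python
import pandas as pd

# A zero-row frame built with explicit dtypes (column names are hypothetical).
empty = pd.DataFrame(
    {
        "timestamp": pd.Series(dtype="datetime64[ns]"),
        "value": pd.Series(dtype="float64"),
        "n_samples": pd.Series(dtype="int64"),
    }
)

print(empty.empty)   # True: the frame has no rows
print(empty.dtypes)  # every column still reports its dtype
```

The frame is empty, yet `empty.dtypes` still reports a dtype for every column, so the type information a table schema would need appears to be available.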
Any clarification would be appreciated.
About this issue
- State: open
- Created 8 years ago
- Reactions: 5
- Comments: 27 (12 by maintainers)
We’re writing a data structure that can be empty. Then we’re reading the data structure in another program. The current method silently elides the existence of the table, so the reading program would have to catch an exception and fake the data structure.
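The reader-side workaround described here (catch the exception and fake the structure) might look roughly like the sketch below. The helper name `read_or_empty` and its `columns` argument are hypothetical; it assumes PyTables is installed and that a missing key surfaces as a `KeyError` from `pd.read_hdf`.

```python
import pandas as pd


def read_or_empty(path, key, columns):
    """Read `key` from an HDF5 file, or build an empty frame if the key is absent.

    `columns` maps column names to dtypes, because the reader has to
    re-declare the schema that the writer never stored.
    """
    try:
        return pd.read_hdf(path, key)
    except KeyError:
        # The table was never written, so fake an empty one with the expected dtypes.
        return pd.DataFrame(
            {name: pd.Series(dtype=dtype) for name, dtype in columns.items()}
        )
```

The drawback is that the reader has to know the schema in advance, which is exactly the information an empty table in the file would have carried.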
I think I’m facing the same issue.
Here’s how to reproduce:
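A minimal reproduction sketch, assuming PyTables is installed. The function name, key names, and column are hypothetical, and this is not necessarily the snippet originally posted.

```python
import pandas as pd


def written_keys(path):
    """Put one non-empty and one empty frame as tables, then list the stored keys."""
    full = pd.DataFrame({"value": [1.0]})
    empty = pd.DataFrame({"value": pd.Series(dtype="float64")})
    with pd.HDFStore(path, mode="w") as store:
        store.put("full", full, format="table")
        store.put("empty", empty, format="table")  # returns without raising
        return store.keys()
```

With the short circuit quoted above in place, calling `written_keys("demo.h5")` is expected to list `/full` but not `/empty`, even though neither `put` call reports a problem.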
My use case
I’m writing an API to store timeseries, and I would like to separate creating/deleting a timeseries ID from writing/deleting data in that timeseries.
In other words, I want to be able to do
but I don’t know how to create an empty timeseries, because it won’t be written to the file. I could let save auto-create the timeseries, but that wouldn’t solve the problem that the ID is not stored until there is actually data in it, and therefore does not appear in the list.
The only workaround I see is to maintain an ID list somewhere else, which I’d rather avoid.
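For reference, the "ID list somewhere else" workaround could be sketched as a JSON sidecar file kept next to the HDF5 store. The class name `SeriesRegistry`, its methods, and the file name are all hypothetical; this uses only the standard library and does not touch pandas.

```python
import json
import os
import tempfile


class SeriesRegistry:
    """Hypothetical sidecar list of timeseries IDs, stored as a JSON array."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        # A missing file simply means no IDs have been registered yet.
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as fh:
            return json.load(fh)

    def _save(self, ids):
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(sorted(ids), fh)

    def create(self, series_id):
        ids = set(self._load())
        ids.add(series_id)
        self._save(ids)

    def delete(self, series_id):
        ids = set(self._load())
        ids.discard(series_id)
        self._save(ids)

    def list_ids(self):
        return self._load()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        registry = SeriesRegistry(os.path.join(tmp, "series_ids.json"))
        registry.create("sensor_a")  # registered before any data exists
        registry.create("sensor_b")
        registry.delete("sensor_b")
        print(registry.list_ids())   # ['sensor_a']
```

The cost is the one hinted at above: the ID list and the HDF5 file are two sources of truth that can drift apart, which an empty table inside the store itself would avoid.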