pandas: HDF file compression not working
xref https://github.com/pandas-dev/pandas/pull/28890#pullrequestreview-304361199
While updating the performance comparison section of the IO docs, we found that the compressed .hdf files were exactly the same size as their uncompressed counterparts:
24009288 Oct 10 06:43 test_fixed.hdf
24009288 Oct 10 06:43 test_fixed_compress.hdf
24458940 Oct 10 06:44 test_table.hdf
24458940 Oct 10 06:44 test_table_compress.hdf
This seems to be caused by the following calls, which save identical files with and without complib:
df.to_hdf('test.hdf', 'test', mode='w')
df.to_hdf('test.hdf', 'test', mode='w', complib='blosc')
We need to find out why the complib parameter is being ignored, and fix it so that the HDF5 file is saved compressed when the parameter is used.
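A minimal way to reproduce and narrow this down is sketched below. It rests on an assumption not confirmed in this issue: that `complevel` defaults to "no compression", so passing `complib` on its own has no effect. The helper name `hdf_sizes`, the all-zeros sample frame, and the choice of `format="table"` are illustrative only.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def hdf_sizes(df, directory):
    """Write df with three option sets and return each file's size in bytes."""
    variants = {
        "plain": {},
        # Same call as in the report: a library, but no compression level.
        "complib_only": {"complib": "blosc"},
        # Assumption: an explicit, non-zero complevel is what enables compression.
        "complib_and_level": {"complib": "blosc", "complevel": 9},
    }
    sizes = {}
    for name, kwargs in variants.items():
        path = os.path.join(directory, f"{name}.hdf")
        df.to_hdf(path, key="test", mode="w", format="table", **kwargs)
        sizes[name] = os.path.getsize(path)
    return sizes


# Highly compressible sample data (all zeros), purely for illustration.
df = pd.DataFrame(np.zeros((100_000, 5)), columns=list("ABCDE"))

with tempfile.TemporaryDirectory() as tmp:
    sizes = hdf_sizes(df, tmp)

for name, size in sizes.items():
    print(f"{name:>18}: {size} bytes")
```

If the assumption holds, `complib_only` should come out about the same size as `plain`, while `complib_and_level` should be clearly smaller. That would point at the missing `complevel` rather than at `complib` itself.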
Thank you for the suggestion. I will definitely do that. I will come back if @sathyz doesn't want to work on this.