pandas: HDF file compression not working

xref https://github.com/pandas-dev/pandas/pull/28890#pullrequestreview-304361199

While updating the performance comparison section of the IO docs, it was found that the compressed .hdf files had exactly the same size as the uncompressed ones:

    24009288 Oct 10 06:43 test_fixed.hdf
    24009288 Oct 10 06:43 test_fixed_compress.hdf
    24458940 Oct 10 06:44 test_table.hdf
    24458940 Oct 10 06:44 test_table_compress.hdf

The cause appears to be that the following two calls write identical files:

    df.to_hdf('test.hdf', 'test', mode='w')
    df.to_hdf('test.hdf', 'test', mode='w', complib='blosc')
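The size comparison can be reproduced with a short script. This is a sketch, assuming pandas and PyTables are installed; hdf_sizes is a hypothetical helper, the data is synthetic and deliberately repetitive, and zlib (the PyTables default library) is swapped in for blosc. Since the to_hdf documentation says a complevel of 0 or None disables compression, the script also writes a file with an explicit complevel=9 for comparison.

```python
import os
import tempfile

import numpy as np
import pandas as pd  # to_hdf also requires the PyTables package ("tables")


def hdf_sizes(df, directory, fmt="fixed"):
    """Hypothetical helper: write df once per setting, return {label: bytes}."""
    settings = {
        # Plain write, no compression arguments.
        "uncompressed": {},
        # Same pattern as the issue (library only), with zlib instead of blosc.
        "complib_only": {"complib": "zlib"},
        # Library plus an explicit, non-zero compression level.
        "complib_and_level": {"complib": "zlib", "complevel": 9},
    }
    sizes = {}
    for label, kwargs in settings.items():
        path = os.path.join(directory, f"test_{fmt}_{label}.hdf")
        df.to_hdf(path, key="test", mode="w", format=fmt, **kwargs)
        sizes[label] = os.path.getsize(path)
    return sizes


# Highly repetitive data, so real compression clearly shrinks the file.
n = 100_000
df = pd.DataFrame({"a": np.arange(n) % 10, "b": np.zeros(n)})

with tempfile.TemporaryDirectory() as tmp:
    sizes = {fmt: hdf_sizes(df, tmp, fmt) for fmt in ("fixed", "table")}

print(sizes)
```

If complib_only comes out the same size as uncompressed while complib_and_level is smaller, that would point at the missing complevel rather than at the compression library itself.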

We need to find out why the complib parameter is being ignored and fix it, so that the HDF5 file is saved compressed when complib is passed.
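One possible direction for the fix is to make complib on its own imply compression. According to the to_hdf documentation, a complevel of 0 or None disables compression, which would explain complib being ignored when no level is given. The sketch below is hypothetical: resolve_compression is not a pandas function, and the default level of 9 is an assumption chosen for illustration; the fallback to "zlib" mirrors PyTables' default library.

```python
from typing import Optional, Tuple

# Hypothetical default, used only when complib is given without complevel.
DEFAULT_COMPLEVEL = 9
# PyTables' default compression library (tables.filters.default_complib).
DEFAULT_COMPLIB = "zlib"


def resolve_compression(
    complib: Optional[str] = None, complevel: Optional[int] = None
) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: pick the (complib, complevel) pair to use.

    Returns None when the file should be written uncompressed.
    """
    if complevel is not None and not 0 <= complevel <= 9:
        raise ValueError("complevel must be between 0 and 9")
    if complevel == 0:
        # An explicit level of 0 always means "do not compress".
        return None
    if complib is None and complevel is None:
        # Nothing was requested: keep writing uncompressed files.
        return None
    if complevel is None:
        # Proposed change: complib alone now implies compression
        # instead of being silently ignored.
        complevel = DEFAULT_COMPLEVEL
    if complib is None:
        # A level without a library falls back to the default library.
        complib = DEFAULT_COMPLIB
    return complib, complevel


print(resolve_compression(complib="blosc"))
print(resolve_compression(complevel=5))
```

An alternative design would keep the current behaviour and emit a warning (or raise) when complib is passed without complevel, which avoids silently choosing a level on the user's behalf.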

CC @datapythonista

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 4
  • Comments: 20 (19 by maintainers)

Most upvoted comments

@quangngd if it’s your first contribution, I’d recommend starting with one of the files in #32550 instead, and coming back to this once that’s done. Starting with something trivial will probably make your life easier, but it’s totally up to you.

Thank you for the suggestion, I will definitely do that. I will come back to this if @sathyz doesn’t want to work on it.