h5py: Memory Leak when Slicing Dataset

To whom it may concern,

I recently ran into an issue: when I tried to slice an HDF5 dataset (the same way I would slice a NumPy array), my RAM kept filling up until I had to kill the program.

Here are the commands I ran:

import h5py

# dataset_directory and recording_name are defined elsewhere in my script
dataset = h5py.File(dataset_directory + recording_name)
print(dataset['3BData/Raw'][0:1000:2])

I was trying to take every other element of the dataset (here, from the first 1000 entries), but the call never completed and it filled all of my RAM. Here are the dataset details:

<HDF5 dataset "Raw": shape (3224391680,), type "<u2">
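For a sense of scale, the following is a rough sketch of the arithmetic implied by the shape and dtype above, compared with the slice that was actually requested. It uses only the numbers shown in this report; the helper name `slice_length` is made up for illustration.

```python
# Shape and dtype are copied from the dataset repr above.
# "<u2" is a little-endian unsigned integer that occupies 2 bytes.
N_ELEMENTS = 3224391680
ITEMSIZE = 2


def slice_length(start, stop, step):
    """Number of elements that the Python slice start:stop:step selects."""
    return len(range(start, stop, step))


full_bytes = N_ELEMENTS * ITEMSIZE          # size of the whole dataset in memory
requested = slice_length(0, 1000, 2)        # the slice [0:1000:2] from the report
requested_bytes = requested * ITEMSIZE

print(f"whole dataset:   {full_bytes} bytes ({full_bytes / 2**30:.2f} GiB)")
print(f"requested slice: {requested} elements, {requested_bytes} bytes")
```

The requested result is tiny compared with the full dataset, so the memory growth cannot be explained by the size of the output alone.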

Here are the specifications I am using:

python -c 'import h5py; print(h5py.version.info)'
h5py 2.8.0
HDF5 1.10.2
Python 3.7.1 (default, Oct 23 2018, 19:19:42) [GCC 7.3.0]
sys.platform linux
sys.maxsize 9223372036854775807
numpy 1.15.3
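One possible workaround, offered only as a sketch and not as the resolution of this issue, is to avoid passing a stride to the dataset at all: read contiguous blocks and apply the stride in memory. The function name `read_strided` and the `block_len` parameter are hypothetical, and the sketch assumes non-negative indices and a positive step. It works on anything that supports `source[a:b]` slicing, so the demo uses a plain list in place of an h5py dataset. Whether this actually helps depends on how the file is stored; if the dataset is a single compressed chunk, each block read may still require decompressing that whole chunk.

```python
def read_strided(source, start, stop, step, block_len=1_000_000):
    """Collect source[start:stop:step] using only contiguous reads.

    `source` is anything that supports source[a:b] slicing, for example
    an h5py dataset or, as in the demo below, a plain list.
    Assumes 0 <= start <= stop and step >= 1.
    """
    if step < 1 or block_len < 1:
        raise ValueError("step and block_len must be positive")
    out = []
    pos = start  # index of the next element to keep
    while pos < stop:
        block_end = min(pos + block_len, stop)
        block = source[pos:block_end]   # contiguous read, no stride involved
        out.extend(block[::step])       # apply the stride in memory
        # Advance to the first kept index that lies beyond this block.
        kept = len(range(pos, block_end, step))
        pos += kept * step
    return out


# Demo with a plain list standing in for the dataset.
data = list(range(20))
result = read_strided(data, 0, 20, 2, block_len=7)
print(result == data[0:20:2])
```

With an h5py dataset, the call would look like `read_strided(dataset['3BData/Raw'], 0, 1000, 2)`, at the cost of reading the skipped elements as well.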

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 19 (12 by maintainers)

Most upvoted comments

@epourmal Good morning! 😃 Here you go:

#include <stdio.h>
#include <stdlib.h>
#include "hdf5.h"

int main() {
   const hsize_t size = 100000000L;
   hid_t fapl, file, dataset, dcpl, memspace, dataspace;
   herr_t status;
   hsize_t start = 0;
   hsize_t stride = 2;
   hsize_t halfsize = size/2;
   float *data;

   printf("Creating data file...");
   fflush(stdout);  /* no newline yet, so flush to make the message visible */
   file = H5Fcreate("dummy.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

   dataspace = H5Screate_simple(1, &size, NULL);
   /* Store the dataset as one chunk spanning all elements, gzip (deflate) level 3 */
   dcpl = H5Pcreate(H5P_DATASET_CREATE);
   status = H5Pset_chunk(dcpl, 1, &size);
   status = H5Pset_deflate(dcpl, 3);

   dataset = H5Dcreate2(file, "data", H5T_INTEL_F32, dataspace,
                        H5P_DEFAULT, dcpl, H5P_DEFAULT);

   status = H5Sclose(dataspace);
   status = H5Pclose(dcpl);

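   /* Fill a buffer with random floats in [0, 1] */
   data = malloc(sizeof(float)*size);
   if (data == NULL) {
     fprintf(stderr, "malloc failed\n");
     return 1;
   }
   for(hsize_t i=0; i<size; i++)
     data[i] = (float)rand()/(float)(RAND_MAX/1.);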
   memspace = H5Screate_simple(1, &size, NULL);
   dataspace = H5Dget_space(dataset);
   status = H5Dwrite(dataset, H5T_NATIVE_FLOAT, memspace, dataspace, H5P_DEFAULT, data);
   status = H5Sclose(dataspace);
   status = H5Sclose(memspace);
   free(data);

   status = H5Dclose(dataset);
   status = H5Fclose(file);
   printf(" done.\n");

   printf("Loading data...");
   fflush(stdout);  /* flush so the message appears before the slow read */
   file = H5Fopen("dummy.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
   dataset = H5Dopen2(file, "data", H5P_DEFAULT);

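   data = malloc(sizeof(float)*halfsize);
   if (data == NULL) {
     fprintf(stderr, "malloc failed\n");
     return 1;
   }
   memspace = H5Screate_simple(1, &halfsize, NULL);
   dataspace = H5Dget_space(dataset);
   /* Select every other element: start 0, stride 2, halfsize elements
      (block is NULL, so each block is a single element) */
   status = H5Sselect_hyperslab(dataspace, H5S_SELECT_SET, &start, &stride, &halfsize, NULL);
   status = H5Dread(dataset, H5T_NATIVE_FLOAT, memspace, dataspace, H5P_DEFAULT, data);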
   status = H5Sclose(dataspace);
   status = H5Sclose(memspace);
   free(data);

   status = H5Dclose(dataset);
   status = H5Fclose(file);
   printf(" done.\n");
}
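To make the selection in the read step easier to follow, here is a small Python sketch of which element indices a 1-D hyperslab with start 0, stride 2 and count `halfsize` covers when the block argument is NULL (one element per block). The helper `hyperslab_indices` is a hypothetical name used only for illustration; it models the index arithmetic and does not call HDF5.

```python
def hyperslab_indices(start, stride, count):
    """Indices selected by a 1-D hyperslab whose block size is 1.

    Element i of the selection is start + i * stride, for i in 0..count-1.
    """
    return range(start, start + stride * count, stride)


size = 100_000_000          # same value as `size` in the C program
halfsize = size // 2        # same value as `halfsize`

selection = hyperslab_indices(0, 2, halfsize)

print(len(selection))       # number of elements read into `data`
print(selection[0], selection[1], selection[-1])
print(selection == range(size)[::2])   # same elements as a [::2] slice
```

In other words, the C program requests the same elements as a `[::2]` slice over the whole dataset, which is the h5py operation described at the top of this issue applied to the full range.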