h5py: Accessing Dataset Across Chunks Slower than Without Chunking

System Info:

  • Operating System: Windows 7
  • Python version: 2.7.12
  • Anaconda: 4.1.1 64 bit
  • h5py version: 2.6.0
  • HDF5 version 1.8.15

I have a problem dealing with very large datasets of shape frames × rows × cols. I am chunking the dataset into chunks of (n_frames, 64, 64). This gives a significant read-time improvement when reading a single chunk, but reading a region spanning multiple chunks (n_frames, m·64, m·64) is dramatically slower than without chunking. This seems like a bug to me.

For example, in the following, reading the chunked data is slower than reading the non-chunked data:

import h5py
import numpy as np
import time

data = np.random.rand(500, 512, 512)

with h5py.File('datacube_chunked.h5', 'w') as fid:
    fid.create_dataset('cube', data=data, chunks=(500, 64, 64))

with h5py.File('datacube.h5', 'w') as fid:
    fid.create_dataset('cube', data=data)

start_time = time.time()
with h5py.File('datacube.h5', 'r') as fid:
    a = fid["cube"][:,0:256,0:256]
print(time.time() - start_time)

start_time = time.time()
with h5py.File('datacube_chunked.h5', 'r') as fid:
    b = fid["cube"][:,0:256,0:256]
print(time.time() - start_time)

If I manually read the (500, 64, 64) sections into b one chunk at a time, reading the chunked data is faster, but I would have thought that h5py already does this under the hood.
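For concreteness, the manual per-chunk read mentioned above can be sketched like this. The helper `read_by_chunks` is hypothetical (not from the issue), and the demo file is a scaled-down stand-in for the original 500 × 512 × 512 cube:

```python
import h5py
import numpy as np

# Hypothetical helper: read a (frames, rows, cols) region one 64x64 tile
# at a time and assemble the result in memory, i.e. the manual per-chunk
# read described above.
def read_by_chunks(dset, rows, cols, chunk=64):
    out = np.empty((dset.shape[0], rows, cols), dtype=dset.dtype)
    for r in range(0, rows, chunk):
        for c in range(0, cols, chunk):
            out[:, r:r + chunk, c:c + chunk] = dset[:, r:r + chunk, c:c + chunk]
    return out

# Small demo file so the sketch is self-contained (sizes reduced from the
# original 500 x 512 x 512 cube).
data = np.arange(8 * 128 * 128, dtype=np.uint32).reshape(8, 128, 128)
with h5py.File('demo_chunked.h5', 'w') as fid:
    fid.create_dataset('cube', data=data, chunks=(8, 64, 64))

with h5py.File('demo_chunked.h5', 'r') as fid:
    b = read_by_chunks(fid['cube'], 128, 128)
```

Each inner assignment touches exactly one chunk, which is why this loop avoids the slow multi-chunk selection path.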

About this issue

  • Original URL
  • State: open
  • Created 6 years ago
  • Comments: 18 (10 by maintainers)

Most upvoted comments

Thanks, now that you’ve reproduced it in C, I think you should definitely ask the HDF Group about this. If doing the right thing in C code is slower than doing it with a Python for loop, something is definitely wrong.

@takluyver Thanks for your suggestions. I adapted your benchmark script (https://gist.github.com/takluyver/0480a74881d84678f48b92c021129cd6) but it still shows the same time difference.

Also, the selection type doesn’t matter; both SimpleSelection and FancySelection show similar performance. For example, the following code is slow too:

t = perf_counter()
with h5py.File("test_chunked.h5", "r") as h5f:
    dset = h5f["data"][:3:2]  # switch from FancySelection to SimpleSelection
print("chunked, unordered 2\t", perf_counter() - t)
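One thing worth trying on the reader side (not a fix for the underlying HDF5 behaviour) is enlarging the raw-data chunk cache when opening the file, so whole chunks stay resident between strided accesses. The `rdcc_*` keywords below exist in h5py 2.9+; treat this as a sketch, not a benchmarked remedy:

```python
import h5py
import numpy as np

# Build a small chunked file analogous to test_chunked.h5 (sizes reduced).
data = np.zeros((8, 128, 128), dtype=np.uint16)
with h5py.File('cache_demo.h5', 'w') as fid:
    fid.create_dataset('data', data=data, chunks=(4, 64, 64))

# Sketch: a larger chunk cache (rdcc_* kwargs, h5py >= 2.9) can keep
# chunks cached across strided accesses instead of re-reading from disk.
with h5py.File('cache_demo.h5', 'r',
               rdcc_nbytes=64 * 1024 ** 2,  # 64 MiB chunk cache
               rdcc_nslots=10007) as fid:   # prime number of hash slots
    sliced = fid['data'][:8:2]              # strided read, as in the benchmark
```

Whether this helps depends on whether the slowdown comes from repeated chunk reads or from the selection machinery itself, which is what the C reproduction below probes.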

To test the C API, I modified an example from https://support.hdfgroup.org/ftp/HDF5/examples/src-html/c.html (h5_chunk_read.c) for the files created from the Python script above. Here is the code; I hope I didn’t miss anything.

#include <stdio.h>
#include <stdint.h>
#include "hdf5.h"

#define H5FILE_NAME "test_chunked.h5"
#define DATASETNAME "data"

#define STRIDE       2
#define NSLICES      2

int main (void) {
    hid_t       file;                        // handles
    hid_t       dataset;
    hid_t       filespace;
    hid_t       memspace;
    hid_t       cparms;

    hsize_t     dims[3];                     // dataset and chunk dimensions
    hsize_t     chunk_dims[3];
    hsize_t     count[3];
    hsize_t     stride[3];
    hsize_t     offset[3];

    herr_t      status, status_n;

    uint16_t    chunk_out[NSLICES][1000][1000];   // buffer for chunk to be read
    int         rank, rank_chunk;

    // Open the file and the dataset.
    file = H5Fopen(H5FILE_NAME, H5F_ACC_RDONLY, H5P_DEFAULT);
    dataset = H5Dopen2(file, DATASETNAME, H5P_DEFAULT);

    // Get dataset rank and dimension.
    filespace = H5Dget_space(dataset);    // Get filespace handle first.
    rank      = H5Sget_simple_extent_ndims(filespace);
    status_n  = H5Sget_simple_extent_dims(filespace, dims, NULL);

    // Get creation properties list.
    cparms = H5Dget_create_plist(dataset); // Get properties handle first.

    if (H5D_CHUNKED == H5Pget_layout(cparms))  {
	    // Get chunking information: rank and dimensions
	    rank_chunk = H5Pget_chunk(cparms, 3, chunk_dims);

        count[0] = NSLICES;
        count[1] = chunk_dims[1];
        count[2] = chunk_dims[2];

        // Define the memory space to read a chunk.
        memspace = H5Screate_simple(rank_chunk, count, NULL);

        // Define chunk in the file (hyperslab) to read.
        offset[0] = 0;
        offset[1] = 0;
        offset[2] = 0;

        stride[0] = STRIDE;
        stride[1] = 1;
        stride[2] = 1;

        status = H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, stride, count, NULL);

        // Read chunk back and display.
        status = H5Dread(dataset, H5T_NATIVE_UINT16, memspace, filespace, H5P_DEFAULT, chunk_out);
        for (int i = 0; i < NSLICES; i++) {
            printf("%u ", chunk_out[i][0][0]);
        }
        printf("\n");

        // Close/release resources.
        H5Sclose(memspace);
    }

    // Close/release resources.
    H5Pclose(cparms);
    H5Dclose(dataset);
    H5Sclose(filespace);
    H5Fclose(file);

    return 0;
}

Modifying STRIDE and/or NSLICES, compiling with gcc/8.3.0 and hdf5/1.10.5, and timing single runs, I got, to my surprise, the following typical numbers:

STRIDE = 1, NSLICES = 2
real    0m0.015s
user    0m0.006s
sys     0m0.009s

STRIDE = 2, NSLICES = 2
real    0m0.378s
user    0m0.371s
sys     0m0.006s

The results correlate with those from h5py, so it seems to be an HDF5 issue rather than an h5py one.