h5py: Memory leak when reading from an hdf5 dataset

OS

NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"

Summary of the h5py configuration

h5py                   3.4.0
HDF5                   1.12.1
Python                 3.8.9 (default, Oct 11 2021, 09:08:45) [GCC 9.3.0]
sys.platform           linux
sys.maxsize            9223372036854775807
numpy                  1.21.2
cython (built with)    0.29.24
numpy (built against)  1.17.5
HDF5 (built against)   1.12.1

Code to reproduce the issue (memory leaks steadily; watch the process's RAM usage while it runs):

import h5py
import numpy as np

# write a 1000 x 1000 dataset of random floats
with h5py.File("leak.h5py", "w") as f:
    f["dataset"] = np.random.rand(1000, 1000)

# repeatedly read the first row; the process's RAM usage keeps growing
with h5py.File("leak.h5py", "r") as f:
    while True:
        f["dataset"][0]

Am I doing something wrong? Perhaps in the way I read the data?

I would appreciate any help!

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (10 by maintainers)

Most upvoted comments

I do see the memory usage rising with @FeryET’s example on Colab (which does appear to be using the 3.5 wheel), but it seems that once you close the file, all the references get cleaned up and the memory usage returns to its starting value (which is a lot better than losing references that are never cleaned up). I’m going to try things locally to rule out some oddness with Colab. If I can reproduce this, I’ll open a new ticket so we can track it (and decide, depending on where the issue is, whether we need to bump the packaged HDF5).
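
A rough sketch of the check described above, under the same assumption that psutil is available: grow the memory with repeated reads, then close the file and see whether the resident size drops back toward its starting value.

import gc
import os

import h5py
import numpy as np
import psutil

def rss_mb():
    # resident set size of the current process, in MB
    return psutil.Process(os.getpid()).memory_info().rss / 1e6

with h5py.File("leak.h5py", "w") as f:
    f["dataset"] = np.random.rand(1000, 1000)

print("baseline:", round(rss_mb(), 1), "MB")

f = h5py.File("leak.h5py", "r")
for _ in range(200_000):
    f["dataset"][0]
print("after reads, file still open:", round(rss_mb(), 1), "MB")

f.close()
gc.collect()
print("after close + gc:", round(rss_mb(), 1), "MB")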

The fix is released now as 3.5.

I was about to report that I could not reproduce this, then got distracted by a phone call while writing the reply… and during the call the test script exhausted my memory and OOM’d my machine.

This seems to be a regression from 3.3 to 3.4.
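
Since the regression was introduced between 3.3 and 3.4 and the fix shipped in 3.5, a quick way to confirm which h5py and bundled HDF5 a given environment is actually using (both attributes below come from h5py's version module):

import h5py

print(h5py.__version__)           # 3.5.0 or newer includes the fix mentioned above
print(h5py.version.hdf5_version)  # HDF5 version the installed wheel was built against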