h5py: Dataset slice reference - AttributeError: module 'h5py' has no attribute 'ref_dtype' - documentation outdated?

To assist reproducing bugs, please include the following:

  • Operating System: Ubuntu 18.04
  • Python version: 3.7.3
  • Where Python was acquired: Anaconda, conda install h5py (conda-forge)
  • h5py version: 2.9.0
  • HDF5 version: 1.10.4
  • The full traceback/stack trace shown (if it appears)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-3e6d0b14c796> in <module>
      9 
     10     # ref slice of both dataset
---> 11     ds_ref = h5_store.create_dataset("ref", (100,), dtype=h5py.ref_dtype)
     12     ds_ref[:50] = ds1[:50]

AttributeError: module 'h5py' has no attribute 'ref_dtype'

I have just started exploring how to link two dataset slices through references, so I'm following this page: http://docs.h5py.org/en/stable/refs.html

To get the above error, I ran the following code (in a JupyterHub Notebook):

import h5py
import numpy as np

with h5py.File("sliceref.h5", 'w') as h5_store:
    # create 2 datasets
    ds1 = h5_store.create_dataset('wav1', (100,))
    ds1[...] = np.arange(100)
    print(ds1[:])
    ds2 = h5_store.create_dataset('wav2', (100,))
    ds2[...] = np.arange(100, 200, 1)
    print(ds2[:])
    
    # ref slice of both dataset
    ds_ref = h5_store.create_dataset("ref", (100,), dtype=h5py.ref_dtype)
    ds_ref[:50] = ds1.regionref[:50]

The documentation uses Python 2 style print xxx statements, which makes me assume it was written for Python 2.7 and has not been updated since?


What would be the correct way to combine the first halves of two arrays through references, without duplicating data? Something like:

ds_ref[:50] = ds1.regionref[:50]
ds_ref[50:] = ds2.regionref[:50]

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

h5py.VirtualLayout … seems it’s not like something you can store in the .h5 itself?

Yes, you can store it in the file. You assemble the VirtualLayout, then pass it to f.create_virtual_dataset() to add it to the file. Have a look at the documentation about virtual datasets: http://docs.h5py.org/en/stable/vds.html
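For the example in this issue, a virtual dataset might look roughly like the sketch below. It assumes h5py 2.9 or newer built against HDF5 1.10 or newer. The file name vds_example.h5 and the dataset name "combined" are made up for illustration, and the temporary directory is only there to keep the example self-contained.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location, chosen only for this example.
path = os.path.join(tempfile.mkdtemp(), "vds_example.h5")

with h5py.File(path, "w") as f:
    # Two source datasets, as in the original question.
    wav1 = f.create_dataset("wav1", data=np.arange(100, dtype="f4"))
    wav2 = f.create_dataset("wav2", data=np.arange(100, 200, dtype="f4"))

    # Describe how the virtual dataset maps onto slices of the sources.
    layout = h5py.VirtualLayout(shape=(100,), dtype="f4")
    layout[:50] = h5py.VirtualSource(wav1)[:50]
    layout[50:] = h5py.VirtualSource(wav2)[:50]

    # Store the mapping in the file; no data is copied.
    f.create_virtual_dataset("combined", layout)

with h5py.File(path, "r") as f:
    # Reading the virtual dataset pulls the values from wav1 and wav2.
    combined = f["combined"][:]

print(combined)
```

Reading "combined" then behaves like reading an ordinary dataset, which seems closest to the "Expectation" output further down in this thread.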

Am I correct in thinking that a reference is just like e.g. [5:2] (for a 1-D array) index values? So if you want to store a complete reference, you need to store both a Reference to the correct dataset and a RegionReference to select the right region in that dataset?

I had to look this up (I haven’t used references before). A region reference is both a reference to the dataset and to a selection from that dataset, so you only need to store one thing.

To use it with h5py, you need to use it in two lookups: once to get the dataset, once to get the data from it:

f[regionref][regionref]
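A minimal sketch of that double lookup, assuming h5py 2.10 or newer (needed for h5py.regionref_dtype). The file name regionref_example.h5 is made up for illustration. Note that it stores only two references, one per slice, rather than one reference per element.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location, chosen only for this example.
path = os.path.join(tempfile.mkdtemp(), "regionref_example.h5")

with h5py.File(path, "w") as f:
    ds1 = f.create_dataset("wav1", data=np.arange(100, dtype="f4"))
    ds2 = f.create_dataset("wav2", data=np.arange(100, 200, dtype="f4"))

    # One region reference per slice we want to point at.
    refs = f.create_dataset("ref", (2,), dtype=h5py.regionref_dtype)
    refs[0] = ds1.regionref[:50]
    refs[1] = ds2.regionref[:50]

with h5py.File(path, "r") as f:
    parts = []
    for i in range(f["ref"].shape[0]):
        regionref = f["ref"][i]
        # First lookup: the dataset the reference points to.
        # Second lookup: the selected region within that dataset.
        parts.append(f[regionref][regionref])
    combined = np.concatenate(parts)

print(combined)
```

The concatenation happens in memory at read time; the file itself only holds the two source datasets plus the small dataset of references.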

TypeError: int() argument must be a string, a bytes-like object or a number, not 'h5py.h5r.Reference'

I think you’re mixing up Reference and RegionReference. When you create the dataset to store references, use h5py.regionref_dtype instead of h5py.ref_dtype.
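To make the distinction concrete, here is a small sketch that stores one reference of each kind. It assumes h5py 2.10 or newer, and the file and dataset names (dtype_example.h5, obj_refs, reg_refs) are invented for this example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location, chosen only for this example.
path = os.path.join(tempfile.mkdtemp(), "dtype_example.h5")

with h5py.File(path, "w") as f:
    ds1 = f.create_dataset("wav1", data=np.arange(100, dtype="f4"))

    # Object references point at a whole object (here, the dataset).
    obj_refs = f.create_dataset("obj_refs", (1,), dtype=h5py.ref_dtype)
    obj_refs[0] = ds1.ref

    # Region references additionally carry a selection within the dataset.
    reg_refs = f.create_dataset("reg_refs", (1,), dtype=h5py.regionref_dtype)
    reg_refs[0] = ds1.regionref[:50]

    obj_ref = obj_refs[0]
    reg_ref = reg_refs[0]

    # An object reference resolves to the dataset; slice it yourself.
    whole = f[obj_ref][:]
    # A region reference resolves to the dataset and also selects the region.
    half = f[reg_ref][reg_ref]

print(type(obj_ref).__name__, whole.shape)
print(type(reg_ref).__name__, half.shape)
```

With object references you would have to keep the slice bounds somewhere else, which is why the region reference dtype fits the use case in this issue better.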

I’ve updated my h5py to 2.10.0 with conda install -c conda-forge h5py (Anaconda channel is still on 2.9.0).

Before updating the documentation, it would be nice to also include how to use the references.

Reference usage

I first ran the code above, and then this:

with h5py.File("sliceref.h5", 'r') as h5_store:
    print(h5_store["ref"][:])

Expectation:

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.  18. 19. 20. 21. 22. 23. 24. 25. 26.
27. 28. 29. 30. 31. 32. 33. 34. 35.  36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 100. 101.
102. 103. 104. 105. 106. 107. 108. 109. 110. 111. 112. 113.  114. 115. 116. 117. 118. 119. 120.
121. 122. 123. 124. 125. 126. 127.  128. 129. 130. 131. 132. 133. 134. 135. 136. 137. 138. 139.
140. 141.  142. 143. 144. 145. 146. 147. 148. 149.]

Reality:

[<HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
 <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
 ...
 <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>]

Question

How do I use references to get the output of my expectation? Or should I use a different approach for that?