h5netcdf: Files written by h5netcdf cannot be edited by netcdf4-python

What happened:

Files written by h5netcdf cannot be opened for editing (append mode) by netcdf4-python. It would be good if they could be. I believe the patch below is needed. Would you like me to open a pull request?

What you expected to happen:

MCVE Code Sample

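import xarray as xr
import numpy as np

# Build a small array and write it to disk with the h5netcdf engine.
dataset = xr.DataArray(
    data=np.zeros((3, 3)),
    name="my_data",
)

engine = "h5netcdf"
# engine = "netcdf4"
dataset.to_netcdf("test.nc", engine=engine, format="NETCDF4")

# Reopening the same file in append mode with netCDF4 raises the OSError below.
import netCDF4
nc = netCDF4.Dataset("test.nc", mode="a")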
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
/tmp/ipykernel_367089/2679834096.py in <module>
      1 import netCDF4
----> 2 nc = netCDF4.Dataset("test.nc", mode="a")

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -103] NetCDF: Can't write file: b'test.nc'

Expected Output

That it works.

Version

Output of print(h5py.version.info, f"\nh5netcdf {h5netcdf.__version__}")

Summary of the h5py configuration

h5py 3.6.0
HDF5 1.12.1
Python 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03) [GCC 9.4.0]
sys.platform linux
sys.maxsize 9223372036854775807
numpy 1.19.5
cython (built with) 0.29.24
numpy (built against) 1.19.5
HDF5 (built against) 1.12.1

h5netcdf 0.12.0

Suggested patch
diff --git a/h5netcdf/core.py b/h5netcdf/core.py
index b68e3f5..5994928 100644
--- a/h5netcdf/core.py
+++ b/h5netcdf/core.py
@@ -462,7 +462,7 @@ class Group(Mapping):
     def _create_child_group(self, name):
         if name in self:
             raise ValueError("unable to create group %r (name already exists)" % name)
-        self._h5group.create_group(name)
+        self._h5group.create_group(name, track_order=True)
         self._groups[name] = self._group_cls(self, name)
         return self._groups[name]
 
@@ -474,7 +474,7 @@ class Group(Mapping):
 
     def create_group(self, name):
         if name.startswith("/"):
-            return self._root.create_group(name[1:])
+            return self._root.create_group(name[1:], track_order=True)
         keys = name.split("/")
         group = self
         for k in keys[:-1]:
@@ -789,15 +789,15 @@ class File(Group):
                             "opening urls: {}".format(path)
                         )
                     try:
-                        with h5pyd.File(path, "r") as f:  # noqa
+                        with h5pyd.File(path, "r", track_order=True) as f:  # noqa
                             pass
                         self._preexisting_file = True
                     except IOError:
                         self._preexisting_file = False
-                    self._h5file = h5pyd.File(path, mode, **kwargs)
+                    self._h5file = h5pyd.File(path, mode, track_order=True, **kwargs)
                 else:
                     self._preexisting_file = os.path.exists(path) and mode != "w"
-                    self._h5file = h5py.File(path, mode, **kwargs)
+                    self._h5file = h5py.File(path, mode, track_order=True, **kwargs)
             else:  # file-like object
                 if version.parse(h5py.__version__) < version.parse("2.9.0"):
                     raise TypeError(
@@ -806,7 +806,7 @@ class File(Group):
                     )
                 else:
                     self._preexisting_file = mode in {"r", "r+", "a"}
-                    self._h5file = h5py.File(path, mode, **kwargs)
+                    self._h5file = h5py.File(path, mode, track_order=True, **kwargs)
         except Exception:
             self._closed = True
             raise

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 24 (13 by maintainers)

Most upvoted comments

To summarize the issue you pointed to:

netCDF4 wants the track_order parameter of the h5py constructors to be True. This requirement might be relaxed in a more recent version of the standard, but that seems a little far away for now.

Let me know if you think anything is missing from https://github.com/h5netcdf/h5netcdf/pull/129

Happy to hold off on any discussion there while we wait for shoyer to chime in here.

@hmaarrfk A first test, writing with track_order=False and reading with track_order=True, gives no problems. It looks like h5netcdf is somehow track_order-agnostic.

It would be good to parameterize it, defaulting to track_order=True. I’m not sure we need a deprecation cycle if this has no effect. The only effect I can think of is that some objects are repr’d in a different order. If the new ordering is in line with Python dict, I would just give it a go, but I’d like to hear @shoyer’s suggestions on how to proceed here.
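
A rough sketch of what such a parameterization could look like, assuming a thin wrapper around the h5py.File constructor. The function name open_file and the opener hook are hypothetical and exist only so the forwarding logic can be exercised without touching disk; this is not the h5netcdf implementation.

```python
def open_file(path, mode="r", track_order=True, opener=None, **kwargs):
    """Open an HDF5 file, forwarding track_order (default True) to the opener.

    `opener` is a hypothetical hook for illustration; when omitted,
    h5py.File is used.
    """
    if opener is None:
        import h5py  # imported lazily so the sketch loads without h5py

        opener = h5py.File
    # Callers can still opt out explicitly with track_order=False.
    return opener(path, mode, track_order=track_order, **kwargs)
```

With this shape, existing callers get creation-order tracking by default, and anyone who depends on the old behaviour can pass track_order=False.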

Sure, adding tests that check different read/write combinations would be good.
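
One possible shape for those tests, sketched with plain h5py rather than h5netcdf. The names TRACK_ORDER_CASES and roundtrip are hypothetical. The idea is to enumerate every write/read pairing of track_order and confirm the data survives each one.

```python
import itertools
import os
import tempfile

# Every (write_track_order, read_track_order) pairing to exercise.
TRACK_ORDER_CASES = list(itertools.product([False, True], repeat=2))


def roundtrip(write_track_order, read_track_order):
    """Write a small dataset with one setting and read it back with another."""
    import h5py  # imported lazily so the case list is usable without h5py

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "roundtrip.h5")
        with h5py.File(path, "w", track_order=write_track_order) as f:
            f.create_dataset("values", data=[1, 2, 3])
        with h5py.File(path, "r", track_order=read_track_order) as f:
            return f["values"][()].tolist()


if __name__ == "__main__":
    for write_flag, read_flag in TRACK_ORDER_CASES:
        result = roundtrip(write_flag, read_flag)
        print(f"write={write_flag} read={read_flag} -> {result}")
```

In a real test suite the case list would more likely feed a parametrized test, and the round trip would go through h5netcdf and netCDF4 instead of raw h5py.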