xarray: Xarray does not support the full range of netcdf4-python compression options

What is your issue?

Summary

The netcdf4-python API docs say the following:

If the optional keyword argument compression is set, the data will be compressed in the netCDF file using the specified compression algorithm. Currently zlib,szip,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, blosc_zlib and blosc_zstd are supported. Default is None (no compression). All of the compressors except zlib and szip use the HDF5 plugin architecture.

If the optional keyword zlib is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is deprecated in favor of compression='zlib'.

Although compression is considered a valid encoding option by Xarray

https://github.com/pydata/xarray/blob/bbe63ab657e9cb16a7cbbf6338a8606676ddd7b0/xarray/backends/netCDF4_.py#L232-L242

…it appears that we silently ignore the compression option when creating new netCDF4 variables:

https://github.com/pydata/xarray/blob/bbe63ab657e9cb16a7cbbf6338a8606676ddd7b0/xarray/backends/netCDF4_.py#L488-L501

Code example

import numpy as np
import xarray as xr

shape = (10, 20)
chunksizes = (1, 10)

encoding = {
    'compression': 'zlib',
    'shuffle': True,
    'complevel': 8,
    'fletcher32': False,
    'contiguous': False,
    'chunksizes': chunksizes
}

da = xr.DataArray(
    data=np.random.rand(*shape),
    dims=['y', 'x'],
    name="foo",
    attrs={"bar": "baz"}
)
da.encoding = encoding
ds = da.to_dataset()

fname = "test.nc"
ds.to_netcdf(fname, engine="netcdf4", mode="w")

with xr.open_dataset(fname, engine="netcdf4") as ds1:
    display(ds1.foo.encoding)

{'zlib': False,
 'szip': False,
 'zstd': False,
 'bzip2': False,
 'blosc': False,
 'shuffle': False,
 'complevel': 0,
 'fletcher32': False,
 'contiguous': False,
 'chunksizes': (1, 10),
 'source': 'test.nc',
 'original_shape': (10, 20),
 'dtype': dtype('float64'),
 '_FillValue': nan}

In addition to showing that compression is ignored, this also reveals several other encoding options that are not available when writing data from xarray (szip, zstd, bzip2, blosc).
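
Until `compression` is honored on write, one possible workaround is to translate the new-style key back into the legacy `zlib` flag before assigning the encoding. The sketch below assumes only zlib is needed; `to_legacy_encoding` is a hypothetical helper name used for illustration and is not part of xarray or netcdf4-python.

```python
def to_legacy_encoding(encoding):
    """Return a copy of `encoding` with compression='zlib' rewritten as zlib=True.

    Hypothetical helper for illustration; not an xarray API.
    """
    legacy = dict(encoding)  # shallow copy so the caller's dict is untouched
    compression = legacy.pop("compression", None)
    if compression is None:
        return legacy
    if compression != "zlib":
        # The legacy flag only covers zlib; other codecs have no equivalent here.
        raise ValueError(f"no legacy equivalent for compression={compression!r}")
    legacy["zlib"] = True
    return legacy


encoding = {"compression": "zlib", "shuffle": True, "complevel": 8}
legacy_encoding = to_legacy_encoding(encoding)
print(legacy_encoding)
```

Assigning the result to `da.encoding` in the example above should then write a zlib-compressed variable, since the netCDF4 backend already accepts `zlib=True`.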

Proposal

We should align with the recommendation from the netcdf4-python docs and support compression=-style encoding when writing netCDF files. We should also deprecate the zlib=True syntax.
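
One way this could look on the writing side is a small normalization step that accepts both spellings, warns on the deprecated one, and passes a single `compression` value on to netcdf4-python. This is only a sketch under that assumption: `normalize_compression` is a hypothetical name, and the choice of `DeprecationWarning` and the conflict handling are illustrative, not what xarray actually implements.

```python
import warnings


def normalize_compression(encoding):
    """Fold a deprecated zlib flag into compression='zlib'.

    Hypothetical helper sketching the proposal; not xarray's implementation.
    """
    normalized = dict(encoding)  # do not mutate the caller's dict
    zlib = normalized.pop("zlib", None)
    compression = normalized.get("compression")

    if zlib is not None:
        warnings.warn(
            "the 'zlib' encoding key is deprecated; use compression='zlib'",
            DeprecationWarning,
            stacklevel=2,
        )
        if zlib and compression not in (None, "zlib"):
            # Both spellings were given and they disagree.
            raise ValueError(
                f"zlib=True conflicts with compression={compression!r}"
            )
        if zlib:
            normalized["compression"] = "zlib"
    return normalized
```

With a step like this, existing `zlib=True` encodings would keep working during a deprecation period, while new codecs such as zstd would only be reachable through `compression`.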

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 22 (14 by maintainers)

Most upvoted comments

Thanks. It looks like the errors are related to this bug: https://github.com/Unidata/netcdf-c/issues/2674. The fix has been merged, so I hope they include it in the next netcdf-c release. For the moment I prefer not to merge this, as netcdf 4.9.2 and dask do not seem to play well together.

We are eagerly waiting for this issue to be solved 😃 Is there anything we can do to help?