xarray: Xarray does not support full range of netcdf-python compression options
What is your issue?
Summary
The netcdf4-python API docs say the following:
If the optional keyword argument compression is set, the data will be compressed in the netCDF file using the specified compression algorithm. Currently zlib, szip, zstd, bzip2, blosc_lz, blosc_lz4, blosc_lz4hc, blosc_zlib and blosc_zstd are supported. Default is None (no compression). All of the compressors except zlib and szip use the HDF5 plugin architecture.
If the optional keyword zlib is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is deprecated in favor of compression='zlib'.
Although compression is considered a valid encoding option by Xarray … it appears that we silently ignore the compression option when creating new netCDF4 variables:
Code example
import numpy as np
import xarray as xr

shape = (10, 20)
chunksizes = (1, 10)

# Requested encoding; 'compression' is the option being tested.
encoding = {
    'compression': 'zlib',
    'shuffle': True,
    'complevel': 8,
    'fletcher32': False,
    'contiguous': False,
    'chunksizes': chunksizes,
}

da = xr.DataArray(
    data=np.random.rand(*shape),
    dims=['y', 'x'],
    name="foo",
    attrs={"bar": "baz"},
)
da.encoding = encoding
ds = da.to_dataset()

fname = "test.nc"
ds.to_netcdf(fname, engine="netcdf4", mode="w")

# Re-open the file and inspect the encoding that was actually written.
with xr.open_dataset(fname, engine="netcdf4") as ds1:
    print(ds1.foo.encoding)
{'zlib': False,
'szip': False,
'zstd': False,
'bzip2': False,
'blosc': False,
'shuffle': False,
'complevel': 0,
'fletcher32': False,
'contiguous': False,
'chunksizes': (1, 10),
'source': 'test.nc',
'original_shape': (10, 20),
'dtype': dtype('float64'),
'_FillValue': nan}
In addition to showing that compression is ignored, this also reveals several other encoding options that are not available when writing data from xarray (szip, zstd, bzip2, blosc).
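Until the compression option is honored, one possible workaround is to translate the encoding into the legacy zlib=True form before writing. The following is only a minimal sketch: to_legacy_encoding is a hypothetical helper, not part of xarray or netcdf4-python, and it can only express zlib, since the other compressors have no legacy spelling.

```python
def to_legacy_encoding(encoding):
    """Return a copy of `encoding` that uses the legacy zlib=True spelling.

    Hypothetical helper for illustration; not part of xarray.
    """
    legacy = dict(encoding)  # copy so the caller's dict is not mutated
    compression = legacy.pop("compression", None)
    if compression is None:
        # Nothing to translate.
        return legacy
    if compression != "zlib":
        # szip, zstd, bzip2 and the blosc variants have no zlib=True equivalent.
        raise ValueError(
            f"compression={compression!r} has no legacy zlib=True equivalent"
        )
    legacy["zlib"] = True
    return legacy


encoding = {
    "compression": "zlib",
    "shuffle": True,
    "complevel": 8,
    "chunksizes": (1, 10),
}
legacy = to_legacy_encoding(encoding)
print(legacy)
```

The resulting dict could then be assigned to da.encoding in place of the original one. This does nothing for the compressors other than zlib, which is the gap this issue is about.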
Proposal
We should align with the recommendation from the netcdf4 docs and support the compression= style of encoding when writing NetCDF. We should deprecate the zlib=True syntax.
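To make the proposal concrete, here is a rough sketch of how an encoding dict might be normalized so that zlib=True keeps working but emits a deprecation warning. The function name normalize_compression and the SUPPORTED_COMPRESSORS tuple are assumptions made for illustration (the names in the tuple are taken from the netcdf4-python docs quoted in the summary). This is not how xarray actually implements it.

```python
import warnings

# Compressor names as listed in the netcdf4-python docs quoted in the summary.
SUPPORTED_COMPRESSORS = (
    "zlib", "szip", "zstd", "bzip2",
    "blosc_lz", "blosc_lz4", "blosc_lz4hc", "blosc_zlib", "blosc_zstd",
)


def normalize_compression(encoding):
    """Hypothetical helper: map legacy zlib=True onto compression='zlib'."""
    enc = dict(encoding)  # copy so the caller's dict is not mutated
    if "zlib" in enc:
        use_zlib = enc.pop("zlib")
        warnings.warn(
            "the zlib encoding option is deprecated; use compression='zlib'",
            DeprecationWarning,
            stacklevel=2,
        )
        if use_zlib:
            if enc.get("compression") not in (None, "zlib"):
                raise ValueError("zlib=True conflicts with the compression option")
            enc["compression"] = "zlib"
    compression = enc.get("compression")
    if compression is not None and compression not in SUPPORTED_COMPRESSORS:
        raise ValueError(f"unsupported compression: {compression!r}")
    return enc


# Record the warning instead of printing it, so the example output stays clean.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    normalized = normalize_compression({"zlib": True, "complevel": 8})

print(normalized)
print([str(w.message) for w in caught])
```

Raising on a conflicting pair (zlib=True together with a different compression value) is one possible design choice. Letting compression silently win would be another.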
About this issue
- State: closed
- Created 2 years ago
- Comments: 22 (14 by maintainers)
Commits related to this issue
- Fix CF tests due to new xarray release This is due to https://github.com/pydata/xarray/issues/7388 not being solved yet. — committed to mraspaud/satpy by mraspaud a year ago
Thanks. It looks like the errors are related to this bug: https://github.com/Unidata/netcdf-c/issues/2674. The fix has been merged, so I hope they include it in the next netcdf-c release. For the moment I prefer not to merge this, as netcdf 4.9.2 and dask do not seem to play well together.
We are eagerly waiting for this issue to be solved 😃 Is there anything we can do to help?