xarray: `to_zarr` with append or region mode and `_FillValue` doesn't work

What happened?

import numpy as np
import xarray as xr
ds = xr.Dataset({"a": ("x", [3.], {"_FillValue": np.nan})})
m = {}
ds.to_zarr(m)
ds.to_zarr(m, append_dim="x")

raises

ValueError: failed to prevent overwriting existing key _FillValue in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.

What did you expect to happen?

I’d expect this to just work (effectively concatenating the dataset to itself).

Anything else we need to know?

appears also for region writes

The same issue appears for region writes as in:

import numpy as np
import dask.array as da
import xarray as xr
ds = xr.Dataset({"a": ("x", da.array([3.,4.]), {"_FillValue": np.nan})})
m = {}
ds.to_zarr(m, compute=False, encoding={"a": {"chunks": (1,)}})
ds.isel(x=slice(0,1)).to_zarr(m, region={"x": slice(0,1)})

raises

ValueError: failed to prevent overwriting existing key _FillValue in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.

there’s a workaround

The workaround (deleting the _FillValue in subsequent writes):

m = {}
ds.to_zarr(m)
del ds.a.attrs["_FillValue"]
ds.to_zarr(m, append_dim="x")

seems to do the trick.
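For datasets with several variables, the same workaround can be applied to every variable at once. The following is only a minimal sketch, not part of xarray: `drop_fill_value_attrs` is a hypothetical helper name, and all it does is remove the `_FillValue` attribute on a shallow copy so that the caller's dataset stays untouched. It does not settle whether the appended result is correct in every case.

```python
import numpy as np
import xarray as xr


def drop_fill_value_attrs(ds: xr.Dataset) -> xr.Dataset:
    """Return a shallow copy of ds without `_FillValue` in any variable's attrs.

    Hypothetical helper for illustration (not an xarray API).
    """
    out = ds.copy()  # shallow copy: array data is shared, attrs dicts are copied
    for var in out.variables.values():
        # pop() with a default does nothing for variables that lack the key
        var.attrs.pop("_FillValue", None)
    return out


ds = xr.Dataset({"a": ("x", [3.0], {"_FillValue": np.nan, "units": "K"})})
clean = drop_fill_value_attrs(ds)
print(clean["a"].attrs)
```

The cleaned copy could then be used for the second write, e.g. `drop_fill_value_attrs(ds).to_zarr(m, append_dim="x")`, while `ds` itself keeps its attributes.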

There are indications that the result might still be broken, but it’s not yet clear how to reproduce them (see comments below).

This issue has been split off from #6069

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.10 (main, Jan 15 2022, 11:48:00) [Clang 13.0.0 (clang-1300.0.29.3)]
python-bits: 64
OS: Darwin
OS-release: 20.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.20.1
pandas: 1.2.0
numpy: 1.21.2
scipy: 1.6.2
netCDF4: 1.5.8
pydap: installed
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.11.0
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2021.11.1
distributed: 2021.11.1
matplotlib: 3.4.1
cartopy: 0.20.1
seaborn: 0.11.1
numbagg: None
fsspec: 2021.11.1
cupy: None
pint: 0.17
sparse: 0.13.0
setuptools: 60.5.0
pip: 21.3.1
conda: None
pytest: 6.2.2
IPython: 8.0.0.dev
sphinx: 3.5.0

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

Thanks for pointing out region again. I’ve updated the header and the initial comment.

Yes, this is kind of the behaviour I’d expect, and it’s great that it helped clarify things. Still, building up the metadata nicely upfront (which is required for region writes) is quite convoluted… That’s what I meant with

some better tooling for writing and updating zarr dataset metadata (I don’t know if that would fit in the realm of xarray though, as it looks like handling Datasets without content. For “appending” metadata, I really don’t know how I’d picture this properly in xarray world.)

in the previous comment. I think establishing and documenting good practices for this would help, but we probably also want better tools. In any case, this would probably be yet another issue.

Note that if you care about this particular example (e.g. appending in a single thread in increasing order of timesteps), then it should also be possible to do this much more simply using append:

import os

import numpy as np
import xarray as xr

filename = 'processed_dataset.zarr'
ds = xr.tutorial.open_dataset('air_temperature')
ds.air.encoding['dtype'] = np.dtype('float32')
X, Y = 250, 250  # size of each final timestep

for i in range(len(ds.time)):
    # some kind of heavy processing (some_processing stands in for your own function)
    arr_r = some_processing(ds.isel(time=slice(i, i + 1)), X, Y)
    # workaround: drop _FillValue from attrs (if present) so that appending does not raise
    arr_r.air.attrs.pop("_FillValue", None)
    if os.path.exists(filename):
        arr_r.to_zarr(filename, append_dim='time')
    else:
        arr_r.to_zarr(filename)
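The create-or-append branch in the loop can also be folded into a small helper that only decides which keyword arguments to pass to `to_zarr`. This is a sketch under assumptions: `zarr_write_kwargs` is a hypothetical name, and it assumes the store is a local directory, so that `os.path.exists` is a meaningful check (it would not be for a cloud object store).

```python
import os


def zarr_write_kwargs(store_path: str, append_dim: str) -> dict:
    """Hypothetical helper: pick to_zarr keyword arguments for a local store.

    Returns {"append_dim": append_dim} if the store already exists on disk,
    and an empty dict (a plain first write) otherwise.
    """
    if os.path.exists(store_path):
        return {"append_dim": append_dim}
    return {}
```

Inside the loop, the two branches would then collapse to `arr_r.to_zarr(filename, **zarr_write_kwargs(filename, 'time'))`.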

If you find out more about the cloud case, please post a note; otherwise, can we assume that the original bug report is fine?