xpublish: Encoding chunks do not match inferred chunks

Hi,

I am having problems with the /zarr/get_zarr_metadata endpoint. I can start the server and I see my dataset, but when I try to read data from the client side I get ValueError: Encoding chunks do not match inferred chunks.

I tried to explicitly change the chunks/encoding, but it did not seem to work.

My code looks something like this:

import zarr
import xarray as xr
from azure.storage.blob import ContainerClient

container_client = ContainerClient("some_url", container_name="some_name", credential="some_credentials")
store = zarr.ABSStore(client=container_client, prefix="file.zarr")  # zarr store backed by Azure Blob Storage

ds = xr.open_zarr(store, consolidated=True, overwrite_encoded_chunks=True)  # do I need overwrite_encoded_chunks?
ds = ds.chunk({"time": 2**12, "feature_id": 2**16})  # do I need this?
ds.encoding = {"time": 2**12, "feature_id": 2**16}  # do I need this?

Any ideas?

Thank you very much

About this issue

  • State: open
  • Created a year ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

Thanks @xaviernogueira, that let me dig into it some. https://gist.github.com/abkfenris/23fe268eb3f3479919a267efe392e4a5

I didn’t end up trying with requester pays (or the OSN one that works, for that matter), but I was able to reproduce the ValueError with the OSN dataset.

It looks like a chunk encoding may be set on time even though it’s a numpy array rather than a dask array under the hood, which appears to be causing the mismatch.
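
Here’s roughly what that looks like from the xarray side (a sketch; store stands in for however you open the OSN dataset):

import xarray as xr

ds = xr.open_zarr(store, consolidated=True)

# dimension coordinates like time are eagerly loaded as numpy-backed
# index variables, so no dask chunks are inferred for them
print(ds["time"].chunks)                  # None

# ...but the on-disk chunking from .zmetadata survives in the encoding
print(ds["time"].encoding.get("chunks"))  # (46008,)

If I’m reading xpublish right, a numpy-backed variable gets its chunks inferred as the full shape (368064 here), which doesn’t match the encoded (46008,), hence the ValueError.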

If I yoink the .zmetadata directly from OSN, it’s got chunks on time/.zarray:

{
    "metadata": {
        ...
        "time/.zarray": {
            "chunks": [
                46008
            ],
            "compressor": {
                "id": "zstd",
                "level": 9
            },
            "dtype": "<i8",
            "fill_value": null,
            "filters": null,
            "order": "C",
            "shape": [
                368064
            ],
            "zarr_format": 2
        },
        "time/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "time"
            ],
            "calendar": "proleptic_gregorian",
            "standard_name": "time",
            "units": "hours since 1979-10-01 00:00:00"
        },
        ...
    }
}

This is probably over my head on the Zarr specifics, so I’m not sure whether we should go with the encoded or the inferred chunks in this case, but maybe @jhamman has some thoughts.
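
In the meantime, a workaround sketch for anyone hitting this (untested against this exact dataset): drop the stale chunks entry from each variable’s encoding before handing the dataset to xpublish, so the chunks get inferred from the data instead.

# clear the on-disk chunk encoding so it can't disagree with the data
for var in ds.variables.values():
    var.encoding.pop("chunks", None)

# then (re)chunk however you want to serve it
ds = ds.chunk({"time": 2**12, "feature_id": 2**16})

As I understand it, opening with xr.open_zarr(store, consolidated=True, chunks={...}, overwrite_encoded_chunks=True) should have a similar effect, since that flag drops the encoded chunks when the dataset is loaded with explicit chunk sizes.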