zarr-python: How to prevent Zarr from returning NaN for missing chunks?

Is there a way of preventing Zarr from returning NaNs if a chunk is missing?

Background of my question: We’re seeing problems either with copying data to GCS or with GCS failing to reliably serve all chunks of a Zarr store.

In arr below, there are two types of NaN-filled chunks returned by Zarr.

from dask import array as darr
import numpy as np

# reading a gs:// URL requires gcsfs to be installed
arr = darr.from_zarr("gs://pangeo-data/eNATL60-BLBT02X-ssh/sossheig/")

First, there’s a chunk that is completely flagged as missing in the data (the chunk is over land in an ocean dataset) but is present on GCS (https://console.cloud.google.com/storage/browser/_details/pangeo-data/eNATL60-BLBT02X-ssh/sossheig/0.0.0), and Zarr correctly finds all items marked as invalid:

np.isnan(arr.blocks[0, 0, 0]).mean().compute()
# -> 1.0

Then, there’s a chunk (https://console.cloud.google.com/storage/browser/_details/pangeo-data/eNATL60-BLBT02X-ssh/sossheig/0.7.3) that is not present (at the time of writing, I get a “load failed” message and a tracking ID from GCS), and Zarr returns all items marked as invalid as well:

np.isnan(arr.blocks[0, 7, 3]).mean().compute()
# -> 1.0

How do I make Zarr raise an exception for the latter?

cc: @auraoupa related: pangeo-data/pangeo#691
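Until Zarr can be made to raise, one workaround is to check which chunk objects actually exist, since Zarr fills any chunk whose key is absent with the fill value. The sketch below is stdlib-only and assumes the Zarr v2 key layout visible in the URLs above ("0.7.3", i.e. dimension_separator=".", keys relative to the array prefix) and ndim >= 1; expected_chunk_keys and missing_chunk_keys are hypothetical helpers, not part of zarr-python.

```python
from collections.abc import Mapping
import itertools


def expected_chunk_keys(shape, chunks, sep="."):
    """Yield every chunk key a Zarr v2 array with this shape/chunking can have.

    Assumes ndim >= 1 and v2-style keys such as "0.7.3" (sep=".").
    """
    # Chunks per axis via ceiling division, so partial edge chunks count.
    grid = [(s + c - 1) // c for s, c in zip(shape, chunks)]
    for idx in itertools.product(*(range(n) for n in grid)):
        yield sep.join(str(i) for i in idx)


def missing_chunk_keys(store: Mapping, shape, chunks, sep="."):
    """Return the expected chunk keys that `store` does not contain."""
    return [k for k in expected_chunk_keys(shape, chunks, sep) if k not in store]


if __name__ == "__main__":
    # Toy in-memory store: a (3, 3) array in (2, 2) chunks has 4 chunks,
    # but only 3 of them were "written".
    store = {".zarray": b"{}", "0.0": b"x", "0.1": b"x", "1.0": b"x"}
    print(missing_chunk_keys(store, shape=(3, 3), chunks=(2, 2)))  # ['1.1']
```

Against GCS, store could presumably be fsspec.get_mapper("gs://pangeo-data/eNATL60-BLBT02X-ssh/sossheig") with shape and chunks taken from the opened array. Each membership test is a separate request, so this is slow for large arrays, and a flaky backend can make the check itself unreliable.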

About this issue

  • State: open
  • Created 5 years ago

Most upvoted comments

We were talking about the mapper interface and FSStore, both of which are within zarr-python. fsspec’s exception handling in FSMap is stable; the question above is how Zarr should handle it when given an FSMap rather than creating its own via FSStore (the latter is now the normal path, but the former still works).

I suppose for complete control, you can always do

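from fsspec.mapping import FSMap
from zarr.storage import FSStore
import zarr

# ask fsspec not to translate any exception into KeyError
fsm = FSMap(..., missing_exceptions=())
store = FSStore(fsm, missing_exceptions=(...))  # check call, I'm not sure how to pass this
group = zarr.open(store)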
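The idea behind missing_exceptions, as I understand it, is that only the listed exception types (by default, file-not-found style errors) are converted to KeyError, which Zarr reads as "chunk absent, use fill_value"; anything else propagates. The stdlib-only model below is not fsspec’s code; TranslatingMap, TransientReadError and the toy read backend are hypothetical names used to show why missing_exceptions=() should make every read failure surface as an error.

```python
from collections.abc import Mapping


class TransientReadError(Exception):
    """Hypothetical stand-in for an intermittent backend failure."""


class TranslatingMap(Mapping):
    """Read-only mapping modelling the missing_exceptions idea.

    Only exception types listed in `missing_exceptions` are turned into
    KeyError (which a Zarr reader treats as "chunk absent, use fill_value");
    any other exception propagates to the caller.
    """

    def __init__(self, read, keys, missing_exceptions=(FileNotFoundError,)):
        self._read = read                       # callable: key -> bytes, may raise
        self._keys = list(keys)                 # keys reported by a listing
        self._missing = tuple(missing_exceptions)

    def __getitem__(self, key):
        try:
            return self._read(key)
        except self._missing as exc:            # `except ()` matches nothing
            raise KeyError(key) from exc

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)


def read(key):
    """Toy backend: one key fails transiently, one is genuinely absent."""
    if key == "0.7.3":
        raise TransientReadError(key)           # e.g. GCS "load failed"
    if key == "9.9.9":
        raise FileNotFoundError(key)            # object really does not exist
    return b"chunk-bytes"


lenient = TranslatingMap(read, ["0.0.0"], missing_exceptions=(FileNotFoundError, TransientReadError))
strict = TranslatingMap(read, ["0.0.0"], missing_exceptions=())

print("0.7.3" in lenient)  # False: the failure is hidden as "missing"
try:
    strict["0.7.3"]
except TransientReadError as exc:
    print("strict map surfaced:", repr(exc))
```

The catch, as discussed in this thread, is that Zarr’s own code path (for example, a batched fetch that drops failed keys) can still hide the error even when the mapper itself raises.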

Also note that some storage backends differentiate between types of error. In particular, referenceFS raises ReferenceNotReachable (a RuntimeError subclass) for a key that should be there, because it is in the reference list, but that failed to load for some, possibly intermittent, reason.
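Because such backends separate "absent" (KeyError) from "should exist but failed to load" (for example ReferenceNotReachable, a RuntimeError subclass), one could retry the latter instead of letting it turn into fill values. The following is a hypothetical wrapper (RetryingMap is not an fsspec or Zarr class), assuming that whatever reads the chunks ends up calling __getitem__ or `in` on it; whether that holds depends on the zarr-python version, so treat it as something to verify.

```python
import time
from collections.abc import Mapping


class RetryingMap(Mapping):
    """Hypothetical read-only wrapper that retries 'should exist but failed
    to load' errors instead of letting them be treated as missing chunks.

    KeyError from the inner mapping (a genuinely absent key) passes through
    untouched; transient errors are retried and finally re-raised as-is.
    """

    def __init__(self, inner, transient=(RuntimeError,), attempts=3, delay=0.0):
        if attempts < 1:
            raise ValueError("attempts must be >= 1")
        self._inner = inner
        self._transient = tuple(transient)
        self._attempts = attempts
        self._delay = delay

    def __getitem__(self, key):
        for attempt in range(1, self._attempts + 1):
            try:
                return self._inner[key]
            except self._transient:
                if attempt == self._attempts:
                    raise                          # give up loudly, never as KeyError
                time.sleep(self._delay * attempt)  # simple linear backoff

    def __iter__(self):
        return iter(self._inner)

    def __len__(self):
        return len(self._inner)
```

The default transient=(RuntimeError,) is chosen so that ReferenceNotReachable would be covered without importing it; a narrower tuple is safer if other RuntimeErrors should not be retried.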


After reading this issue, I’m a bit confused about whether improvements are still needed in both fsspec and zarr-python or just in zarr-python (e.g., #489). Is anyone able to clarify whether #489 would be expected to work, or whether #486 (comment) is blocking that PR? For reference, I believe this could help us solve issues with accessing our downscaled CMIP6 data on Planetary Computer (e.g., carbonplan/cmip6-downscaling#323).

I think there’s still a problem in Zarr that needs fixing, whether or not #489 is added.

In the following code, FSStore always passes on_error="omit" to fsspec, regardless of the exceptions or missing_exceptions settings:

https://github.com/zarr-developers/zarr-python/blob/3db41760e18fb0a69b5066e8c7aba9752a8c474e/zarr/storage.py#L1414-L1421

Not sure what the fix is though…
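To illustrate why always passing on_error="omit" loses information: as I understand fsspec’s getitems, "raise" propagates a failure, "omit" silently drops failed keys, and "return" keeps them with the exception instance as the value. The stdlib-only model below mimics those three modes; getitems here is a simplified stand-in, and split_results is a hypothetical post-processing step showing one possible direction (only "missing" errors become absent chunks), not what Zarr currently does.

```python
def getitems(read, keys, on_error="raise"):
    """Simplified model of a batched fetch with fsspec-style on_error modes.

    "raise": propagate the first failure; "omit": drop failed keys;
    "return": keep failed keys, with the exception instance as the value.
    """
    if on_error not in ("raise", "omit", "return"):
        raise ValueError(f"unknown on_error: {on_error!r}")
    out = {}
    for key in keys:
        try:
            out[key] = read(key)
        except Exception as exc:
            if on_error == "raise":
                raise
            if on_error == "return":
                out[key] = exc
            # "omit": leave the key out entirely
    return out


def split_results(results, missing_exceptions=(FileNotFoundError,)):
    """Hypothetical: keep data, treat only 'missing' errors as absent
    chunks (caller may fill them), and re-raise anything else."""
    data = {}
    for key, value in results.items():
        if isinstance(value, missing_exceptions):
            continue                 # genuinely absent chunk
        if isinstance(value, BaseException):
            raise value              # e.g. a transient GCS failure
        data[key] = value
    return data
```

With "omit", a chunk that failed transiently and one that truly does not exist both simply vanish from the result, so the caller can no longer tell them apart; "return" keeps that distinction available.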

FYI, I tried to use fsspec’s missing_exceptions, but it no longer works as of commit 4e633ad9aa434304296900790c4c65e0fa0dfa12.

Here’s the code I used to reproduce this:

import fsspec
import zarr

fs = fsspec.filesystem("file")

# create an array with no chunks on disk
mapper = fs.get_mapper("tmp.zarr")
za = zarr.open(mapper, mode="w", shape=(3, 3), chunks=(2, 2))

# ensure no exceptions are converted to KeyError
mapper = fs.get_mapper("tmp.zarr", missing_exceptions=())

# the following should raise, since no chunks exist on disk
print(zarr.open(mapper, mode="r")[:])

Did you see https://github.com/zarr-developers/zarr-python/pull/489#issuecomment-823656711, @delgadom? Perhaps give that some testing to help drive it forward?