zarr-python: How to prevent Zarr from returning NaN for missing chunks?
Is there a way of preventing Zarr from returning NaNs if a chunk is missing?
Background of my question: We’re seeing problems either with copying data to GCS or with GCS failing to reliably serve all chunks of a Zarr store.
In arr below, there are two types of NaN-filled chunks returned by Zarr.
from dask import array as darr
import numpy as np
arr = darr.from_zarr("gs://pangeo-data/eNATL60-BLBT02X-ssh/sossheig/")
First, there’s a chunk that is completely flagged missing in the data (the chunk is over land in an ocean dataset) but present on GCS (https://console.cloud.google.com/storage/browser/_details/pangeo-data/eNATL60-BLBT02X-ssh/sossheig/0.0.0), and Zarr correctly finds all items marked as invalid:
np.isnan(arr.blocks[0, 0, 0]).mean().compute()
# -> 1.0
Then, there’s a chunk (https://console.cloud.google.com/storage/browser/_details/pangeo-data/eNATL60-BLBT02X-ssh/sossheig/0.7.3) that is not present (at the time of writing this, I get a “load failed” and a tracking id from GCS) and Zarr returns all items marked invalid as well:
np.isnan(arr.blocks[0, 7, 3]).mean().compute()
# -> 1.0
How do I make Zarr raise an Exception on the latter?
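One workaround that does not depend on Zarr's error handling is to check, before reading, that every chunk key the array should have is actually present in the store. The sketch below assumes Zarr v2 dot-separated chunk keys (such as 0.7.3 above) and uses a plain dict as a stand-in for the real store; the helper names expected_chunk_keys and find_missing_chunks are made up for illustration. It cannot tell a chunk that was never written from one that failed to upload, so it only helps when all chunks are expected to exist.

```python
from itertools import product


def expected_chunk_keys(shape, chunks, prefix=""):
    """Yield the key of every chunk of an array with dot-separated chunk keys."""
    # Number of chunks along each axis, rounded up (integer ceiling division).
    grid = [range(-(-size // chunk)) for size, chunk in zip(shape, chunks)]
    for index in product(*grid):
        yield prefix + ".".join(str(i) for i in index)


def find_missing_chunks(store, shape, chunks, prefix=""):
    """Return the expected chunk keys that the store does not contain."""
    return [
        key
        for key in expected_chunk_keys(shape, chunks, prefix)
        if key not in store
    ]


# Toy store: a 2 x 4 x 4 array in 1 x 2 x 2 chunks, with one chunk removed.
store = {key: b"data" for key in expected_chunk_keys((2, 4, 4), (1, 2, 2))}
del store["0.1.1"]

missing = find_missing_chunks(store, (2, 4, 4), (1, 2, 2))
if missing:
    print(f"{len(missing)} chunk(s) missing: {missing}")
```

On a remote store, each membership test is likely to cost one request, so for large arrays it may be cheaper to list the keys once and compare against that set.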
About this issue
- State: open
- Created 5 years ago
- Reactions: 1
- Comments: 34 (22 by maintainers)
We were talking about the mapper interface and FSStore, both of which are within zarr-python. fsspec’s exception handling in FSMap is stable, and the question above is how zarr should handle it when given one of these rather than creating its own via FSStore (the latter is now the normal path, but the former still works).
I suppose for complete control, you can always do
Also note that some storage backends differentiate between types of error. In particular, referenceFS raises ReferenceNotReachable(RuntimeError) for a key that should be there, because it’s in the reference list, but for some maybe intermittent reason, failed to load.
Edited - HTML text boxes aren’t great for this 😃
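To illustrate the idea of distinguishing error types at the store level, here is a minimal sketch of a wrapper mapping that turns a missing chunk key into a RuntimeError subclass instead of a KeyError. StrictChunkStore and MissingChunkError are hypothetical names, not part of zarr-python or fsspec. Whether Zarr propagates the error depends on the Zarr version and on whether it reads chunks through __getitem__ or through a batched getitems call, so treat this as an assumption to test rather than a guaranteed fix. It also makes arrays with intentionally unwritten chunks unreadable.

```python
from collections.abc import MutableMapping

# Zarr v2 metadata documents. Zarr probes for these and expects a KeyError
# when one is absent, so they must not be turned into hard errors.
METADATA_NAMES = {".zarray", ".zgroup", ".zattrs", ".zmetadata"}


class MissingChunkError(RuntimeError):
    """Hypothetical error raised when a chunk key cannot be read."""


class StrictChunkStore(MutableMapping):
    """Hypothetical wrapper that turns a missing chunk key into a hard error."""

    def __init__(self, inner):
        self.inner = inner

    def __getitem__(self, key):
        try:
            return self.inner[key]
        except KeyError as err:
            if key.rsplit("/", 1)[-1] in METADATA_NAMES:
                raise  # missing metadata stays an ordinary KeyError
            raise MissingChunkError(f"chunk {key!r} could not be read") from err

    def __contains__(self, key):
        # Delegate so that membership tests never raise MissingChunkError.
        return key in self.inner

    def __setitem__(self, key, value):
        self.inner[key] = value

    def __delitem__(self, key):
        del self.inner[key]

    def __iter__(self):
        return iter(self.inner)

    def __len__(self):
        return len(self.inner)


# Toy usage with a dict standing in for the real store.
store = StrictChunkStore({".zarray": b"{}", "0.0.0": b"\x00"})
try:
    store["0.7.3"]
except MissingChunkError as err:
    print(err)
```

The wrapped object could be any mapping, for example one returned by fsspec, and the wrapper would then be passed to Zarr in place of the original store.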
I think there’s still a problem in Zarr that needs fixing, whether or not #489 is added.
In the following code, FSStore will always pass on_error="omit" to fsspec, regardless of the setting of exceptions or missing_exceptions: https://github.com/zarr-developers/zarr-python/blob/3db41760e18fb0a69b5066e8c7aba9752a8c474e/zarr/storage.py#L1414-L1421
Not sure what the fix is though…
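To make the consequence concrete, here is a toy model of a batched read with three on_error modes ("raise", "omit", "return"), which as far as I know are the options fsspec's mapper accepts in getitems. This is not fsspec's implementation; the getitems function and the FlakyStore class are made up for illustration. It shows that with "omit" a key that failed to load and a key that never existed both simply disappear from the result, so the caller cannot tell them apart.

```python
def getitems(mapping, keys, on_error="raise"):
    """Toy batched read with three error-handling modes."""
    if on_error not in ("raise", "omit", "return"):
        raise ValueError(f"unknown on_error mode: {on_error!r}")
    out = {}
    for key in keys:
        try:
            out[key] = mapping[key]
        except Exception as err:
            if on_error == "raise":
                raise
            if on_error == "return":
                out[key] = err  # hand the exception object back to the caller
            # on_error == "omit": drop the key silently
    return out


class FlakyStore(dict):
    """Toy store in which one existing key fails to load."""

    def __getitem__(self, key):
        if key == "0.7.3":
            raise RuntimeError("load failed")
        return super().__getitem__(key)


store = FlakyStore({"0.0.0": b"ok", "0.7.3": b"unreachable"})
keys = ["0.0.0", "0.7.3", "0.9.9"]  # present, failing, never written

omitted = getitems(store, keys, on_error="omit")
returned = getitems(store, keys, on_error="return")

print(sorted(omitted))            # only the key that loaded
print(type(returned["0.7.3"]))    # the load failure is still visible here
print(type(returned["0.9.9"]))    # and so is the ordinary missing key
```

With "return", the caller keeps enough information to treat a KeyError as a legitimately missing chunk and anything else as a real failure.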
FYI I tried to use fsspec’s missing_exceptions, but it no longer works from commit 4e633ad9aa434304296900790c4c65e0fa0dfa12 onwards. Here’s the code I used to reproduce this:
Did you see https://github.com/zarr-developers/zarr-python/pull/489#issuecomment-823656711, @delgadom? Perhaps give that some testing to help drive it forward?
(was fixed in https://github.com/intake/filesystem_spec/pull/259 )