zarr-python: Incorrect default fill value causes byte arrays to become numeric when `write_empty_chunks=False`

Minimal example:

```python
import zarr
a = zarr.create((1,), dtype=bytes)
a[0] = b''
assert a[0] == b''
```

```
Traceback (most recent call last):
  File "example.py", line 5, in <module>
    assert a[0] == b''
AssertionError
```

The value of `a[0]` is actually `0`, when it should be `b''`.

Found by one of our users at https://github.com/tskit-dev/tsinfer/issues/628. This bug was introduced in the latest release (v2.11.0) by commit https://github.com/zarr-developers/zarr-python/commit/f461eb78fbb88187582cd9123d6ec7622d9abd26, which changed the default of `write_empty_chunks` to `False`. The default `fill_value` for arrays created via `zarr.creation.create` is `0`, so the chunk holding `b''` is treated as empty and never written; when that unwritten chunk is re-created on read, the value `b''` becomes `0`. I assume this `fill_value` should be `None`.
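To illustrate the mechanism described above, here is a toy, pure-Python model. It is not zarr's code: `ToyChunkedArray` and its one-value-per-chunk layout are hypothetical, and the emptiness test is an assumption, modelled loosely on a check that treats any falsy value as equal to a falsy fill value. Under that assumption `b''` (which is falsy) counts as "empty" when `fill_value=0`, so the chunk is dropped and reads back as the fill value.

```python
from typing import Any, Dict


class ToyChunkedArray:
    """Hypothetical toy model, NOT zarr's implementation: each index is its own chunk."""

    def __init__(self, fill_value: Any, write_empty_chunks: bool) -> None:
        self.fill_value = fill_value
        self.write_empty_chunks = write_empty_chunks
        self.store: Dict[int, Any] = {}  # index -> stored value

    def _is_empty(self, value: Any) -> bool:
        # Assumed emptiness test: a None fill value never matches, and a
        # falsy fill value (e.g. 0) matches ANY falsy value, including b''.
        if self.fill_value is None:
            return False
        if not self.fill_value:
            return not value
        return value == self.fill_value

    def __setitem__(self, index: int, value: Any) -> None:
        if not self.write_empty_chunks and self._is_empty(value):
            self.store.pop(index, None)  # "empty" chunk is dropped, not written
        else:
            self.store[index] = value

    def __getitem__(self, index: int) -> Any:
        # A missing chunk is re-created from the fill value on read.
        return self.store.get(index, self.fill_value)


a = ToyChunkedArray(fill_value=0, write_empty_chunks=False)
a[0] = b''
print(a[0])  # 0, not b''
```

In this model, passing either `write_empty_chunks=True` or `fill_value=None` makes `a[0]` read back as `b''`.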

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (15 by maintainers)

Most upvoted comments

Ah, interesting. If we don't hear any objections or alternative proposals, I'd be happy to push that out quickly.

@jni @joshmoore @jakirkham I created a patch in zarr-developers/zarr-python#1001.

OK, so we could have a dict: if the dtype is in the dict, use it; if not, `write_empty_chunks` goes back to `True`?

I could see this working if we widen the type of `write_empty_chunks` to `Union[bool, Literal['auto']]`, where `'auto'` gives the behavior you describe.
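The two proposals above (a per-dtype dict of default fill values, and widening `write_empty_chunks` to `Union[bool, Literal['auto']]`) could be combined roughly as follows. This is a hypothetical sketch, not zarr's API: `DEFAULT_FILL_VALUES`, `resolve_defaults`, and keying the dict by NumPy dtype kind are all assumptions made for illustration.

```python
from typing import Any, Dict, Literal, Tuple, Union

import numpy as np

# Hypothetical dict of default fill values, keyed by NumPy dtype kind.
DEFAULT_FILL_VALUES: Dict[str, Any] = {
    "b": False,  # booleans
    "i": 0,      # signed integers
    "u": 0,      # unsigned integers
    "f": 0.0,    # floats
    "c": 0j,     # complex
}


def resolve_defaults(
    dtype: Any, write_empty_chunks: Union[bool, Literal["auto"]] = "auto"
) -> Tuple[Any, bool]:
    """Return (fill_value, write_empty_chunks) for a new array (hypothetical helper)."""
    kind = np.dtype(dtype).kind
    fill_value = DEFAULT_FILL_VALUES.get(kind)  # None if the dtype is not in the dict
    if isinstance(write_empty_chunks, bool):
        return fill_value, write_empty_chunks  # an explicit choice wins
    if write_empty_chunks != "auto":
        raise ValueError(f"invalid write_empty_chunks: {write_empty_chunks!r}")
    # 'auto': skip empty chunks only if the dtype has a known default fill value;
    # otherwise fall back to writing every chunk.
    return fill_value, fill_value is None


print(resolve_defaults("i4"))   # (0, False)
print(resolve_defaults(bytes))  # (None, True)
```

Keying by dtype kind keeps the table short; a real implementation might need finer distinctions (for example datetime or structured dtypes), which this sketch ignores.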