zarr-python: Incorrect default fill value causes byte arrays to become numeric when `write_empty_chunks=False`
Minimal example:
import zarr
a = zarr.create((1,), dtype=bytes)
a[0] = b''
assert a[0] == b''
Traceback (most recent call last):
  File "example.py", line 5, in <module>
    assert a[0] == b''
AssertionError
The value of `a[0]` is actually `0`, when it should be `b''`.
Found by one of our users at https://github.com/tskit-dev/tsinfer/issues/628. This bug was introduced in the latest release (v2.11.0) in this commit: https://github.com/zarr-developers/zarr-python/commit/f461eb78fbb88187582cd9123d6ec7622d9abd26, when the default for `write_empty_chunks` was changed to `False`. The default `fill_value` for arrays created via `zarr.creation.create` is `0`, so when an empty, unwritten chunk is re-created the previous value `b''` becomes `0`. I assume this `fill_value` should be `None`.
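To make the suspected mechanism concrete, here is a toy pure-Python model of it. This is only a sketch and not zarr's code: the `ToyArray` class, its single-chunk layout, and the emptiness check in `_looks_empty` are all hypothetical, written under the assumption that a chunk is dropped when it looks equal to the fill value and is later re-created from the fill value on read.

```python
from typing import Any, List, Optional


class ToyArray:
    """Hypothetical one-chunk array; a model of the bug, not zarr's implementation."""

    def __init__(self, length: int, fill_value: Any = 0,
                 write_empty_chunks: bool = False) -> None:
        self.length = length
        self.fill_value = fill_value
        self.write_empty_chunks = write_empty_chunks
        self._chunk: Optional[List[Any]] = None  # None means "chunk not stored"

    def _looks_empty(self, chunk: List[Any]) -> bool:
        # Assumption: a fill value of None means no chunk is ever "empty".
        if self.fill_value is None:
            return False
        # Assumption: for a falsy fill value (such as 0), any all-falsy chunk
        # is treated as empty, so a chunk holding b'' looks empty too.
        if not self.fill_value:
            return not any(chunk)
        return all(v == self.fill_value for v in chunk)

    def _load(self) -> List[Any]:
        # A chunk that was never stored is re-created from the fill value.
        if self._chunk is None:
            return [self.fill_value] * self.length
        return list(self._chunk)

    def __setitem__(self, index: int, value: Any) -> None:
        chunk = self._load()
        chunk[index] = value
        if not self.write_empty_chunks and self._looks_empty(chunk):
            self._chunk = None  # skip storing the "empty" chunk
        else:
            self._chunk = chunk

    def __getitem__(self, index: int) -> Any:
        return self._load()[index]


# Defaults modelled on the report: fill_value=0, write_empty_chunks=False.
a = ToyArray(1)
a[0] = b''
assert a[0] == 0  # the written b'' is lost and the fill value comes back

# With fill_value=None the chunk is always stored, so the value survives.
b = ToyArray(1, fill_value=None)
b[0] = b''
assert b[0] == b''
```

In this model, passing `write_empty_chunks=True` or a fill value of `b''` also round-trips the value, which matches the idea that the numeric default fill value is the real problem.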
About this issue
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 16 (15 by maintainers)
Ah, interesting. If we don’t hear any objections or alternative proposals, I’d be happy to push that out quickly.
@jni @joshmoore @jakirkham I created a patch in zarr-developers/zarr-python#1001.
I could see this working if we widen the type of `write_empty_chunks` to `Union[bool, Literal['auto']]`, where `auto` does the behavior you describe.