anndata: AnnData cannot open file that was opened with JHDF5 before
Please make sure these conditions are met
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of anndata.
- (optional) I have confirmed this bug exists on the master branch of anndata.
Report
I noticed that AnnData throws an error while trying to read a .h5ad file that was opened with JHDF5 (with write permissions) before. This means that AnnData is currently not able to open HDF5 files created from a Java program, even if the files conform to the AnnData on-disk format.
The problem most likely occurs because JHDF5 adds a /__DATA_TYPES__ group (presumably for internal reasons), which can be seen by running h5dump before and after the access from Java and comparing the two outputs with diff. AnnData tries to read that group, but fails because the objects stored in this group (named datatypes, according to the traceback below) have no valid AnnData encoding type. I guess that this problem can be circumvented by making AnnData only read groups that are part of its on-disk schema, i.e., X, layers, uns, obs[m|p], var[m|p].
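The proposed fix amounts to filtering the top-level keys of the file against a fixed set of schema names before reading anything. A minimal sketch of that idea follows; it is not anndata's implementation. The names ANNDATA_TOP_LEVEL_KEYS and select_schema_keys are hypothetical, the key list is a plain Python list standing in for the keys of an open HDF5 file, and "raw" is added to the set as an assumption because it is not in the list above.

```python
# Top-level elements named in the report, plus "raw" (an assumption added here).
ANNDATA_TOP_LEVEL_KEYS = frozenset(
    {"X", "layers", "uns", "obs", "obsm", "obsp", "var", "varm", "varp", "raw"}
)


def select_schema_keys(keys):
    """Return the keys that belong to the on-disk schema, preserving order."""
    return [key for key in keys if key in ANNDATA_TOP_LEVEL_KEYS]


# Stand-in for the top-level keys of a file after JHDF5 has touched it.
keys_after_jhdf5 = [
    "X", "__DATA_TYPES__", "layers", "obs", "obsm",
    "obsp", "uns", "var", "varm", "varp",
]

# Only schema keys survive; the JHDF5 group is skipped instead of being read.
kept = select_schema_keys(keys_after_jhdf5)
print(kept)
```

A whitelist like this ignores any foreign group, not just /__DATA_TYPES__, at the cost of silently skipping keys a future schema version might add.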
Steps to reproduce
This is a minimal Python program that fails on the last line if the file was opened from the Java side between the write and the read on the Python side.
import numpy as np
import anndata as ad
adata = ad.AnnData(np.zeros((2,2)))
adata.write('test.h5ad')
# There is a conditional error on this line:
# * without external interference: works
# * if opened with JHDF5 before: throws error
bdata = ad.read('test.h5ad')
Also, here is a Java program that causes the last line to fail. Note that it doesn’t change the file explicitly, but just opens it with write permissions.
import ch.systemsx.cisd.hdf5.HDF5Factory;

public class App {
    public static void main(String[] args) {
        // Open the file with write permissions; nothing is written explicitly.
        HDF5Factory.open("test.h5ad");
    }
}
To get the necessary dependencies and execute the Java file, I suggest using Maven with this pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.scijava</groupId>
    <artifactId>pom-scijava</artifactId>
    <version>32.0.0-beta-5</version>
    <relativePath />
  </parent>
  <groupId>mwe</groupId>
  <artifactId>mwe</artifactId>
  <version>1.0-SNAPSHOT</version>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
  </properties>
  <dependencies>
    <dependency>
      <groupId>cisd</groupId>
      <artifactId>jhdf5</artifactId>
    </dependency>
  </dependencies>
  <repositories>
    <repository>
      <id>scijava.public</id>
      <url>https://maven.scijava.org/content/groups/public</url>
    </repository>
  </repositories>
  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.1</version>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
Traceback
Traceback (most recent call last):
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/utils.py", line 202, in func_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 230, in read_elem
read_func = self.registry.get_reader(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 143, in get_reader
raise IORegistryError._from_read_parts(
anndata._io.specs.registry.IORegistryError: No read method registered for IOSpec(encoding_type='', encoding_version='') from <class 'h5py._hl.datatype.Datatype'>. You may need to update your installation of anndata.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/h5ad.py", line 243, in read_h5ad
adata = read_dispatched(f, callback=callback)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/experimental/__init__.py", line 58, in read_dispatched
return reader.read_elem(elem)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/utils.py", line 204, in func_wrapper
re_raise_error(e, elem)
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/utils.py", line 185, in re_raise_error
raise e
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/utils.py", line 202, in func_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 235, in read_elem
return self.callback(read_func, elem.name, elem, iospec=get_spec(elem))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/h5ad.py", line 224, in callback
**{
^
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/h5ad.py", line 227, in <dictcomp>
k: read_dispatched(elem[k], callback)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/experimental/__init__.py", line 58, in read_dispatched
return reader.read_elem(elem)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/utils.py", line 204, in func_wrapper
re_raise_error(e, elem)
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/utils.py", line 185, in re_raise_error
raise e
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/utils.py", line 202, in func_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 235, in read_elem
return self.callback(read_func, elem.name, elem, iospec=get_spec(elem))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/h5ad.py", line 241, in callback
return func(elem)
^^^^^^^^^^
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/specs/methods.py", line 94, in read_basic
return {k: _reader.read_elem(v) for k, v in elem.items()}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/specs/methods.py", line 94, in <dictcomp>
return {k: _reader.read_elem(v) for k, v in elem.items()}
^^^^^^^^^^^^^^^^^^^^
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/utils.py", line 204, in func_wrapper
re_raise_error(e, elem)
File "/home/innerbergerm@hhmi.org/Software/mambaforge/envs/anndata/lib/python3.11/site-packages/anndata/_io/utils.py", line 188, in re_raise_error
raise AnnDataReadError(
anndata._io.utils.AnnDataReadError: Above error raised while reading key '/__DATA_TYPES__/Enum_Boolean' of type <class 'h5py._hl.datatype.Datatype'> from /.
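Reading the traceback, the first error says that no read method is registered for the combination of an empty IOSpec and the h5py Datatype class, which suggests that readers are looked up by source type and encoding metadata. The following is a toy model of such a lookup, written only to illustrate why an unannotated object fails. It is not anndata's code: ToyRegistry, ToyRegistryError, FakeDataset and FakeDatatype are hypothetical names, and the registered ("array", "0.2.0") entry is just an example.

```python
from typing import Callable, NamedTuple


class IOSpec(NamedTuple):
    encoding_type: str
    encoding_version: str


class ToyRegistryError(KeyError):
    """Hypothetical stand-in for the IORegistryError seen in the traceback."""


class ToyRegistry:
    def __init__(self) -> None:
        # Readers are keyed by (source type, encoding spec).
        self._readers: dict[tuple[type, IOSpec], Callable] = {}

    def register(self, src_type: type, spec: IOSpec, reader: Callable) -> None:
        self._readers[(src_type, spec)] = reader

    def get_reader(self, src_type: type, spec: IOSpec) -> Callable:
        try:
            return self._readers[(src_type, spec)]
        except KeyError:
            raise ToyRegistryError(
                f"No read method registered for {spec} from {src_type}"
            ) from None


class FakeDataset:
    """Placeholder for an element that carries encoding metadata."""


class FakeDatatype:
    """Placeholder for an object added by another tool, without metadata."""


registry = ToyRegistry()
registry.register(FakeDataset, IOSpec("array", "0.2.0"), lambda elem: "array contents")

# A registered combination resolves to its reader.
reader = registry.get_reader(FakeDataset, IOSpec("array", "0.2.0"))

# An object with empty encoding metadata has no entry, so the lookup fails.
try:
    registry.get_reader(FakeDatatype, IOSpec("", ""))
    lookup_failed = False
except ToyRegistryError:
    lookup_failed = True
```

In this model the failure is independent of the file's actual AnnData content: a single unregistered object anywhere in the traversal is enough to abort the read.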
Versions
anndata 0.9.1
session_info 1.0.0
-----
Python 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0]
Linux-5.19.0-46-generic-x86_64-with-glibc2.35
-----
Session information updated at 2023-07-11 05:25
About this issue
- State: closed
- Created a year ago
- Comments: 18 (9 by maintainers)
It looks like some other users of this library do that:
But searching for “hdf5 __DATA_TYPES__” mostly turns up people trying to work around this being added.
Thanks for the prompt feedback!
This is the file generated by the Python MWE above after accessing it with JHDF5. The difference in the output of h5dump is shown in the following.
This more or less matches how you reproduced the issue. Your solution is a great drop-in replacement for reading these kinds of files, thanks a lot! It would be great, though, if this could be the default behavior in future versions to facilitate collaboration across language boundaries. Does this being in the ad.experimental namespace mean that this is already planned for a future release?
I can reproduce with:
You can get around this right now with: