cellxgene: Error with launching cellxgene after manually adding metadata to AnnData file (.h5ad)

Hi,

I have an AnnData file, adata.h5ad, with certain columns in adata.obs. I have some more metadata that I would like to add to adata.obs. I followed some simple steps and recommendations found online to add these columns to adata.obs:

# df_annot - pandas dataframe with annotations/columns I would like to add
# df_annot has identical index as adata.obs
adata.obs = pd.concat([adata.obs,df_annot],axis=1)
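A minimal sketch of the same step with two sanity checks added, assuming plain pandas DataFrames stand in for adata.obs and df_annot. The helper name add_obs_columns and the toy data are hypothetical and not part of anndata or cellxgene; the checks only confirm that the indexes match and that no column is added twice.

```python
import pandas as pd


def add_obs_columns(obs: pd.DataFrame, df_annot: pd.DataFrame) -> pd.DataFrame:
    """Return obs with the columns of df_annot appended (hypothetical helper)."""
    # The concat below aligns on the index, so require identical labels in identical order.
    if not obs.index.equals(df_annot.index):
        raise ValueError("df_annot index does not match obs index")
    # Refuse to create duplicate column names.
    overlap = obs.columns.intersection(df_annot.columns)
    if len(overlap) > 0:
        raise ValueError(f"columns already present in obs: {list(overlap)}")
    return pd.concat([obs, df_annot], axis=1)


# Toy stand-ins for adata.obs and df_annot (illustrative data only).
obs = pd.DataFrame({"cluster": ["0", "1", "0"]}, index=["cell_a", "cell_b", "cell_c"])
df_annot = pd.DataFrame({"batch": ["b1", "b1", "b2"]}, index=obs.index)

merged = add_obs_columns(obs, df_annot)
print(merged.columns.tolist())
```

With a real AnnData object, the result would be assigned back with adata.obs = add_obs_columns(adata.obs, df_annot).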

Note: I am using cellxgene==0.14.1. Launching the original adata.h5ad with cellxgene works using the following command:

cellxgene launch adata.h5ad --port 8000 --disable-diffexp

However, when I try to launch adata.h5ad with the same command after adding the additional columns (metadata) to adata.obs, I get the following error:

Error: 'dict' object has no attribute 'dtype' - file not found or is inaccessible.  File must be an .h5ad object.  Please check your input and try again.

Initially, I thought there might be an issue with how I am adding metadata/columns to AnnData file - adata.obs. I explored further by re-reading the modified h5ad file using scanpy (sc.read_h5ad) and processing it further by performing steps like ranking gene groups (sc.tl.rank_genes_groups). These steps were successful, so it seems unlikely that this is an AnnData (.h5ad) file issue.

I would love to hear your thoughts on where the issue might be coming from, and on the best way to debug it and get it working.

cellxgene installed in a conda environment with the following dependency versions -

python==3.7.3
numpy==1.16.3
pandas==0.24.2
numba==0.43.1
cellxgene==0.14.1
anndata==0.6.22.post1
scanpy==1.4.5.1

Update (Mar 10, 2020):

To clarify, I am using different environments for running cellxgene and for processing the AnnData file.

CELLXGENE CONDA ENVIRONMENT

python==3.7.3
numpy==1.16.3
pandas==0.24.2
numba==0.43.1
cellxgene==0.14.1
anndata==0.6.22.post1

ANNDATA PROCESSING CONDA ENVIRONMENT

python==3.8.1
numpy==1.18.1
pandas==1.0.1
numba==0.48.0
anndata==0.7.1
scanpy==1.4.5.1
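To compare the two environments directly, a short script like the one below can be run once in each. This is a sketch, not part of cellxgene or anndata: the helper name installed_versions is hypothetical, and on Python 3.7 it assumes the importlib_metadata backport is installed (pip list gives the same information otherwise).

```python
try:
    # Standard library on Python 3.8 and later.
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Assumption: the importlib_metadata backport is installed on Python 3.7.
    from importlib_metadata import PackageNotFoundError, version

# Packages listed in the environments above.
PACKAGES = ["numpy", "pandas", "numba", "anndata", "scanpy", "cellxgene"]


def installed_versions(names):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Comparing the anndata line from the two runs shows whether the file is being written by a newer anndata than the one cellxgene reads it with.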

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

I believe I understand the issue for @GMaciag, and it may very well be the issue for @shahnirav1005 too.

Background: scanpy and anndata provide backward compatibility, but not forward compatibility. H5AD created by old anndata versions can be read by newer versions, but not vice versa.

In other words:

  • H5AD files created with anndata version “N” are readable by anndata version “N+1” (and later)
  • H5AD files created with anndata version “N+1” are not normally readable by anndata version “N”
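The rule of thumb above can be written as a small check. This is only a sketch of the stated rule, not an anndata API: the function names are hypothetical, the version parsing is deliberately simplified (suffixes such as “post1” are ignored), and a True result means "expected to be readable", not a guarantee.

```python
import re


def release_tuple(version):
    """Turn '0.6.22.post1' into (0, 6, 22); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def expected_readable(writer_version, reader_version):
    """Backward compatibility only: the reader must be at least as new as the writer."""
    return release_tuple(reader_version) >= release_tuple(writer_version)


# Versions taken from the two environments described in this issue.
print(expected_readable("0.6.22.post1", "0.7.1"))  # old file, new reader
print(expected_readable("0.7.1", "0.6.22.post1"))  # new file, old reader
```

The second call corresponds to the situation in this issue: a file written with anndata 0.7.1 and opened by cellxgene's pinned anndata 0.6.22.post1.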

In @GMaciag's case, you are using a more recent anndata to create the data, and reading it with an older one.

cellxgene 0.14.1 requires a slightly older anndata because the latest anndata release was incompatible with cellxgene, so we pinned the dependency version.

You can fix this in two ways:

  1. Create the H5AD file using anndata version 0.6.22.post1 (and a correspondingly older scanpy).
  2. Wait a few days; we will release a new version of cellxgene that fixes this and works with the latest anndata/scanpy.

Apologies for the inconvenience!

We will announce the release on the CZI Science slack site (#cellxgene-users channel)

@shahnirav1005 - I believe you have correctly diagnosed the issue. See my above comment about a full fix coming in a couple of days.

Hi,

  • Note: This comment is written on Mar 10, 2020
  • Note: I am using cellxgene==0.14.1

After looking at @GMaciag’s pip list and reading some comments on #cellxgene-users channel on cziscience Slack, I thought this might be a scanpy-anndata version compatibility issue, and not really about me adding metadata to adata.obs.

To reiterate, I use different environments for launching cellxgene and processing the AnnData file. I have updated this remark in my original question.

I am using the latest scanpy, scanpy==1.4.5.1, which requires anndata>=0.7, so anndata==0.7.1 is installed automatically when scanpy 1.4.5.1 is installed with pip.

I noticed that when I had initially processed the files, my pip list included scanpy==1.4.5.1 and anndata==0.6.22.post1 (even though these versions of scanpy and anndata are supposed to be incompatible with each other). When I processed my AnnData with these versions, I was able to successfully launch the AnnData files using cellxgene.

I removed anndata==0.7.1 and installed anndata==0.6.22.post1. I then added metadata using the following steps

# df_annot - pandas dataframe with annotations/columns I would like to add
# df_annot has identical index as adata.obs
adata.obs = pd.concat([adata.obs,df_annot],axis=1)

and was able to successfully launch these files using cellxgene. This might be a temporary fix (as scanpy==1.4.5.1 and anndata==0.6.22.post1 are supposed to be incompatible) for launching AnnData files in cellxgene: process them with anndata==0.6.22.post1, which also happens to be cellxgene’s current dependency.

@GMaciag Can you confirm this by replacing anndata==0.7.1 with anndata==0.6.22.post1, processing your dataset and verifying if you are able to launch it using cellxgene?