scanpy: setting gene_symbol to select symbol from adata.var fails in sc.pl.umap()

I would like to color the UMAP representation by gene expression values. For ease of use, I’d like to display the gene name instead of the gene_id, which is what adata.var_names contains in my case. Setting gene_symbols = 'Symbol' doesn’t seem to work for me, or I am using it the wrong way.

When running sc.pl.umap(adata, gene_symbols = 'Symbol', color = ['Tnnt2'])

I get the following error message:

 ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-116-e09d49f2528c> in <module>
----> 1 sc.pl.umap(adata, gene_symbols = 'Symbol', color = ['Tnnt2'])

/anaconda3/envs/scanpy/lib/python3.6/site-packages/scanpy/plotting/tools/scatterplots.py in umap(adata, **kwargs)
     27     If `show==False` a `matplotlib.Axis` or a list of it.
     28     """
---> 29     return plot_scatter(adata, basis='umap', **kwargs)
     30 
     31 

/anaconda3/envs/scanpy/lib/python3.6/site-packages/scanpy/plotting/tools/scatterplots.py in plot_scatter(adata, color, use_raw, sort_order, edges, edges_width, edges_color, arrows, arrows_kwds, basis, groups, components, projection, color_map, palette, size, frameon, legend_fontsize, legend_fontweight, legend_loc, ncols, hspace, wspace, title, show, save, ax, return_fig, **kwargs)
    275         color_vector, categorical = _get_color_values(adata, value_to_plot,
    276                                                       groups=groups, palette=palette,
--> 277                                                       use_raw=use_raw)
    278 
    279         # check if higher value points should be plot on top

/anaconda3/envs/scanpy/lib/python3.6/site-packages/scanpy/plotting/tools/scatterplots.py in _get_color_values(adata, value_to_plot, groups, palette, use_raw)
    665         raise ValueError("The passed `color` {} is not a valid observation annotation "
    666                          "or variable name. Valid observation annotation keys are: {}"
--> 667                          .format(value_to_plot, adata.obs.columns))
    668 
    669     return color_vector, categorical

ValueError: The passed `color` Tnnt2 is not a valid observation annotation or variable name. Valid observation annotation keys are: Index(['Sample', 'n_counts', 'n_genes', 'percent_mito', 'log_counts',
       'louvain'],
      dtype='object')

adata.var contains the column “Symbol” and “Tnnt2” is present:

adata.var[adata.var['Symbol'] == 'Tnnt2']

Symbol  type            highly_variable  means     dispersions  dispersions_norm
Tnnt2   protein_coding  True             0.923869  4.090601     11.370244

run with: scanpy==1.3.7 anndata==0.6.17 numpy==1.14.6 scipy==1.1.0 pandas==0.23.4 scikit-learn==0.19.1 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 17 (10 by maintainers)

Most upvoted comments

Yes (definitely) and yes (I think)

Passing an argument for gene_symbols means that instead of searching .var_names, the column of .var whose name was passed will be searched.

For example, if you had an AnnData object adata with Ensembl IDs as adata.var_names and HGNC symbols in the column adata.var["gene_name"], the following calls should plot similar things (with different titles):

sc.pl.umap(adata, color=["ENSG00000261371"])
sc.pl.umap(adata, color=["PECAM1"], gene_symbols="gene_name")
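
To make the lookup described above concrete, here is a minimal sketch that uses a plain pandas DataFrame as a stand-in for adata.var. It is not scanpy's actual implementation; the helper name resolve_symbol and the placeholder IDs are hypothetical, and only the PECAM1 / ENSG00000261371 pair comes from the example above.

```python
import pandas as pd

# Stand-in for adata.var: the index plays the role of adata.var_names (gene IDs)
# and "gene_name" is a column of symbols. Only the first row comes from the
# thread; the other IDs are placeholders made up for illustration.
var = pd.DataFrame(
    {"gene_name": ["PECAM1", "GENE_B", "GENE_C"]},
    index=["ENSG00000261371", "PLACEHOLDER_ID_2", "PLACEHOLDER_ID_3"],
)


def resolve_symbol(var, symbol, gene_symbols):
    """Return the index entry whose `gene_symbols` column equals `symbol`.

    Hypothetical helper: it mimics searching a column of .var instead of
    .var_names, which is what passing gene_symbols is described as doing.
    """
    matches = var.index[var[gene_symbols] == symbol]
    if len(matches) == 0:
        raise KeyError(f"{symbol!r} not found in var[{gene_symbols!r}]")
    return matches[0]


gene_id = resolve_symbol(var, "PECAM1", "gene_name")
print(gene_id)
```

If gene_symbols does not behave as expected in a given scanpy version, the resolved ID can be passed directly, as in sc.pl.umap(adata, color=[gene_id]); the plot title will then show the ID rather than the symbol.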

adata.var["gene_name"]

Traceback (most recent call last):
  File "/sc/arion/work/gujarh01/software/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'gene_name'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/sc/arion/work/gujarh01/software/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/sc/arion/work/gujarh01/software/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'gene_name'

Did something change in scanpy?
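
The KeyError in the traceback above is raised by pandas when adata.var has no column called "gene_name"; that name was only an example in the earlier comment, so the column holding symbols may be named differently in a given dataset (the original poster's was "Symbol"). A minimal sketch of checking for the column first, again using a DataFrame as a stand-in for adata.var; the helper name check_gene_symbols_column and the placeholder gene ID are hypothetical:

```python
import pandas as pd


def check_gene_symbols_column(var, gene_symbols):
    """Return var[gene_symbols], or raise a KeyError that lists the available columns."""
    if gene_symbols not in var.columns:
        raise KeyError(
            f"{gene_symbols!r} is not a column of var; "
            f"available columns: {list(var.columns)}"
        )
    return var[gene_symbols]


# Stand-in for adata.var, modeled on the original poster's table.
# The index value is a placeholder, not a real gene ID.
var = pd.DataFrame(
    {"Symbol": ["Tnnt2"], "type": ["protein_coding"]},
    index=["PLACEHOLDER_GENE_ID"],
)

symbols = check_gene_symbols_column(var, "Symbol")  # column exists

try:
    check_gene_symbols_column(var, "gene_name")  # column does not exist
except KeyError as err:
    message = str(err)
    print(message)
```

On a real object, printing adata.var.columns shows which name, if any, can be passed as gene_symbols.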