CellBender: 10x H5 Format Error

On the most recent commit “c051c44” on v2, the output .h5 files can’t be read by scanpy. It seems to hit a key error on ‘genome’:

`KeyError                                  Traceback (most recent call last)
~/utils/miniconda3/envs/scanpy/lib/python3.7/site-packages/scanpy/readwrite.py in _read_v3_10x_h5(filename, start)
    253                     feature_types=dsets['feature_type'].astype(str),
--> 254                     genome=dsets['genome'].astype(str),
    255                 ),

KeyError: 'genome'

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
<timed exec> in <module>

~/utils/miniconda3/envs/scanpy/lib/python3.7/site-packages/scanpy/readwrite.py in read_10x_h5(filename, genome, gex_only)
    159         v3 = '/matrix' in f
    160     if v3:
--> 161         adata = _read_v3_10x_h5(filename, start=start)
    162     if genome:
    163         if genome not in adata.var['genome'].values:

~/utils/miniconda3/envs/scanpy/lib/python3.7/site-packages/scanpy/readwrite.py in _read_v3_10x_h5(filename, start)
    258             return adata
    259         except KeyError:
--> 260             raise Exception('File is missing one or more required datasets.')
    261
    262

Exception: File is missing one or more required datasets.`
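To see which dataset the reader is tripping over, it can help to list what is actually stored under `/matrix/features`. The sketch below assumes the 10x v3 layout (`id`, `name`, `feature_type`, `genome` under `/matrix/features`) and uses `h5py`; the helper names are made up for illustration and are not part of scanpy or CellBender.

```python
# Dataset names the 10x v3 layout keeps under /matrix/features.
# This list is an assumption based on the 10x format and the traceback,
# not something taken from CellBender's code.
EXPECTED_FEATURE_DATASETS = ("id", "name", "feature_type", "genome")


def missing_feature_datasets(present):
    """Return the expected dataset names that are absent from `present`."""
    present = set(present)
    return [name for name in EXPECTED_FEATURE_DATASETS if name not in present]


def feature_datasets_in_file(path):
    """List the names stored under /matrix/features in an HDF5 file."""
    import h5py  # imported here so the pure check above works without h5py

    with h5py.File(path, "r") as f:
        return list(f["matrix/features"].keys())


# Hypothetical usage on a CellBender output file:
#   missing_feature_datasets(feature_datasets_in_file("cellbender_output.h5"))
```

For a file with the problem described here, the result would be expected to contain `genome`.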

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 23 (10 by maintainers)

Most upvoted comments

Ah - I think the error occurs when one is running CellBender using the CellRanger matrix format (a dir with barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz) rather than h5 format as input. I am testing out using the CellRanger h5 file now and will report back.

I also ran into this problem. Maybe a good solution would be to always add a genome, using empty strings if we don’t know what it was? If you support this decision, I’d be happy to sketch a pull request.

For now I ended up fixing cellbender h5 files with this:

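import numpy as np
import tables

# orig_h5: path to the CellBender output; fixed_h5: path for the patched copy
tables.copy_file(orig_h5, fixed_h5)
with tables.open_file(fixed_h5, "r+") as f:
    n = f.get_node("/matrix/features")
    n_genes = f.get_node("/matrix/shape")[0]
    if "genome" not in n:
        # bytes rather than str, since PyTables arrays may reject NumPy unicode dtypes
        f.create_array(n, "genome", np.repeat(b"GRCh38", n_genes))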
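The "empty strings if we don’t know" idea from this comment could look roughly like the following. It is only a sketch using `h5py` rather than PyTables: `genome_column` and `add_genome_if_missing` are hypothetical helpers, not CellBender functions, and the code edits the file in place, so it is safer to run it on a copy.

```python
def genome_column(n_features, genome=""):
    """Build one byte string per feature; empty when the genome is unknown."""
    return [genome.encode("ascii")] * n_features


def add_genome_if_missing(path, genome=""):
    """Add /matrix/features/genome in place; return True if it was written."""
    import h5py  # lazy import: genome_column() needs no third-party package

    with h5py.File(path, "r+") as f:
        features = f["matrix/features"]
        if "genome" in features:
            return False  # already present, leave the file untouched
        n_features = int(f["matrix/shape"][0])
        column = genome_column(n_features, genome)
        # Fixed-width byte strings; width is at least 1 so that empty
        # strings still get a valid dtype.
        width = max(len(genome), 1)
        features.create_dataset("genome", data=column, dtype=f"S{width}")
        return True
```

Calling `add_genome_if_missing(path)` with no genome writes empty strings, while `add_genome_if_missing(path, "GRCh38")` mirrors the hard-coded value in the snippet above.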

Yes - this is it. I can confirm loading the CellBender output in scanpy works just fine when I run CellBender using the h5 input format.

Thank you, Stephen. I just want to clarify that the problem for me was not with CellRanger v2 but with CellRanger v3, where I started from the mtx folder. That folder contains no record of the genome.
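The missing genome is easy to confirm from the folder itself: a CellRanger v3 `features.tsv.gz` has three tab-separated columns (feature ID, name, feature type) and nothing else. The sketch below writes a tiny stand-in file with example rows and reads it back; `read_features` is a hypothetical helper, and only the standard library is used.

```python
import csv
import gzip
import os
import tempfile


def read_features(path):
    """Read a features.tsv.gz file into a list of rows (lists of strings)."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


# Tiny stand-in for a v3 features.tsv.gz (example rows for illustration only).
demo_rows = [
    ["ENSG00000243485", "MIR1302-2HG", "Gene Expression"],
    ["ENSG00000237613", "FAM138A", "Gene Expression"],
]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "features.tsv.gz")
    with gzip.open(demo_path, "wt", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(demo_rows)
    rows = read_features(demo_path)

# Every row has three fields (ID, name, feature type); there is no genome column.
widths = {len(row) for row in rows}
print(widths)  # {3}
```

Since the genome never appears in the mtx input, any tool that converts it to the h5 layout has to either ask for the genome or fill in a placeholder.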

I also used an mtx file generated from an h5ad as input, since I do not have a CellRanger file.

@sjfleming thank you for posting this, it is appreciated. Will keep you posted on the progress.