MetaPhlAn: [BUG] database installation error

After installing MetaPhlAn 3.0 and activating its conda environment, I ran:

metaphlan --install

This produces the following error:

File /tigress/MOLBIO/local/pythonenv/metaphlan3/lib/python3.6/site-packages/metaphlan/metaphlan_databases/file_list.txt already present!
Traceback (most recent call last):
  File "/tigress/MOLBIO/local/pythonenv/metaphlan3/bin/metaphlan", line 10, in <module>
    sys.exit(main())
  File "/tigress/MOLBIO/local/pythonenv/metaphlan3/lib/python3.6/site-packages/metaphlan/metaphlan.py", line 1187, in main
    pars['index'] = check_and_install_database(pars['index'], pars['bowtie2db'], pars['bowtie2_build'], pars['nproc'], pars['force_download'])
  File "/tigress/MOLBIO/local/pythonenv/metaphlan3/lib/python3.6/site-packages/metaphlan/metaphlan.py", line 610, in check_and_install_database
    download_unpack_tar(FILE_LIST, index, bowtie2_db, bowtie2_build, nproc)
  File "/tigress/MOLBIO/local/pythonenv/metaphlan3/lib/python3.6/site-packages/metaphlan/metaphlan.py", line 463, in download_unpack_tar
    url_tar_file = ls_f["mpa_" + download_file_name + ".tar"]
KeyError: 'mpa_mpa_v30_CHOCOPhlAn_201901.tar'

MetaPhlAn was installed like this:

conda create -p /path/to/our/conda/envs/metaphlan3 -c bioconda metaphlan

The problem appears to be that in metaphlan.py, “mpa_” is getting prepended to the database names when they are used as keys in the ls_f dictionary. Removing these extra “mpa_” strings seems to solve the problem, like so:

diff metaphlan.py-orig metaphlan.py
462,463c462,463
<     tar_file = os.path.join(folder, "mpa_" + download_file_name + ".tar")
<     url_tar_file = ls_f["mpa_" + download_file_name + ".tar"]
---
>     tar_file = os.path.join(folder, download_file_name + ".tar")
>     url_tar_file = ls_f[download_file_name + ".tar"]
467,468c467,468
<     md5_file = os.path.join(folder, "mpa_" + download_file_name + ".md5")
<     url_md5_file = ls_f["mpa_" + download_file_name + ".md5"]
---
>     md5_file = os.path.join(folder, download_file_name + ".md5")
>     url_md5_file = ls_f[download_file_name + ".md5"]

The “file_list.txt already present” message appears not to be the real problem.
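
The key mismatch can be reproduced outside MetaPhlAn. The sketch below is only an illustration: the `ls_f` contents and the example.org URLs are placeholders (the real file_list.txt entries are not shown in this report), and `lookup_url` is a hypothetical helper, not a function in metaphlan.py. It assumes the dictionary keys already start with "mpa_", which is what the traceback and the diff suggest.

```python
# Placeholder for the dictionary parsed from file_list.txt.
# Keys and URLs here are assumptions made for illustration only.
ls_f = {
    "mpa_v30_CHOCOPhlAn_201901.tar": "https://example.org/mpa_v30_CHOCOPhlAn_201901.tar",
    "mpa_v30_CHOCOPhlAn_201901.md5": "https://example.org/mpa_v30_CHOCOPhlAn_201901.md5",
}


def lookup_url(file_map, download_file_name, extension):
    """Hypothetical helper: add the 'mpa_' prefix only when it is missing."""
    name = download_file_name
    if not name.startswith("mpa_"):
        name = "mpa_" + name
    return file_map[name + extension]


index = "mpa_v30_CHOCOPhlAn_201901"

# Prepending "mpa_" unconditionally doubles the prefix and raises KeyError,
# matching the last line of the traceback.
try:
    ls_f["mpa_" + index + ".tar"]
except KeyError as err:
    print("KeyError:", err)

# The guarded lookup works whether or not the index name carries the prefix.
print(lookup_url(ls_f, index, ".tar"))
print(lookup_url(ls_f, "v30_CHOCOPhlAn_201901", ".md5"))
```

A guard like this would tolerate both naming styles, whereas the diff above simply drops the prefix; which one is appropriate depends on how the index name is passed in by the installed build.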

Best, Matthew Cahn

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 34 (13 by maintainers)

Most upvoted comments

Yep, on second pass @fbeghini, I have to say that your instructions were indeed complete! It was my unfamiliarity with bioconda that was the problem.

The build recipe that worked was:

Bootstrap:docker
From: continuumio/miniconda3

%environment
    PATH=/opt/conda/bin:/bin:/usr/bin

%post
    export PATH="/opt/conda/bin:$PATH"
    conda update conda
    conda update --all
    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge
    conda install -c bioconda metaphlan=3.0=pyh5ca1d4c_4
    metaphlan --install

Updated with a cleaner build recipe

Yes, but you have to put the six pipes before the taxid, e.g. ||||||39491, since the tree object splits the full taxonomy string on the pipe character.
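
To see why the six pipes matter, here is a minimal sketch of the padding. It assumes the taxonomy string has seven pipe-separated fields, inferred from the six pipes in the example; `pad_taxid_string` is a hypothetical helper, not part of MetaPhlAn.

```python
def pad_taxid_string(taxid, levels=7):
    """Prefix a terminal taxid with one empty field per higher level.

    levels=7 is an assumption based on the six leading pipes
    in the example above.
    """
    return "|" * (levels - 1) + str(taxid)


padded = pad_taxid_string(39491)
print(padded)             # ||||||39491
print(padded.split("|"))  # ['', '', '', '', '', '', '39491']
```

Splitting on the pipe then yields the same number of fields as a full taxonomy string, with the taxid in the last position.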

I’m working on CI for MetaPhlAn that will also test whether the database is OK; it should be ready in a couple of weeks.

Thanks for the reply. I had not added the channels as instructed, because I thought I already had those channels. I added them (in the order listed), deleted the previous environment, made a new one, and ran the same installation again. This time it installed Python 3.7 and metaphlan build pyh5ca1d4c_4, and the database download works.

Best, Matthew