MetaPhlAn: [BUG] database installation error
After installing metaphlan 3.0, and activating its conda environment, I ran:
metaphlan --install
This produces the following error:
File /tigress/MOLBIO/local/pythonenv/metaphlan3/lib/python3.6/site-packages/metaphlan/metaphlan_databases/file_list.txt already present!
Traceback (most recent call last):
  File "/tigress/MOLBIO/local/pythonenv/metaphlan3/bin/metaphlan", line 10, in <module>
    sys.exit(main())
  File "/tigress/MOLBIO/local/pythonenv/metaphlan3/lib/python3.6/site-packages/metaphlan/metaphlan.py", line 1187, in main
    pars['index'] = check_and_install_database(pars['index'], pars['bowtie2db'], pars['bowtie2_build'], pars['nproc'], pars['force_download'])
  File "/tigress/MOLBIO/local/pythonenv/metaphlan3/lib/python3.6/site-packages/metaphlan/metaphlan.py", line 610, in check_and_install_database
    download_unpack_tar(FILE_LIST, index, bowtie2_db, bowtie2_build, nproc)
  File "/tigress/MOLBIO/local/pythonenv/metaphlan3/lib/python3.6/site-packages/metaphlan/metaphlan.py", line 463, in download_unpack_tar
    url_tar_file = ls_f["mpa_" + download_file_name + ".tar"]
KeyError: 'mpa_mpa_v30_CHOCOPhlAn_201901.tar'
Metaphlan was installed like this:
conda create -p /path/to/our/conda/envs/metaphlan3 -c bioconda metaphlan
The problem appears to be that in metaphlan.py, “mpa_” is getting prepended to the database names when they are used as keys in the ls_f dictionary. Removing these extra “mpa_” strings seems to solve the problem, like so:
diff metaphlan.py-orig metaphlan.py
462,463c462,463
< tar_file = os.path.join(folder, "mpa_" + download_file_name + ".tar")
< url_tar_file = ls_f["mpa_" + download_file_name + ".tar"]
---
> tar_file = os.path.join(folder, download_file_name + ".tar")
> url_tar_file = ls_f[download_file_name + ".tar"]
467,468c467,468
< md5_file = os.path.join(folder, "mpa_" + download_file_name + ".md5")
< url_md5_file = ls_f["mpa_" + download_file_name + ".md5"]
---
> md5_file = os.path.join(folder, download_file_name + ".md5")
> url_md5_file = ls_f[download_file_name + ".md5"]
The “file_list.txt already present” message appears not to be the real problem.
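The mismatch behind the KeyError can be reproduced in isolation. The following is only a minimal sketch, not MetaPhlAn's actual code: it assumes `ls_f` maps file names from file_list.txt to download URLs, uses a placeholder URL, and introduces a hypothetical helper `tar_key` that adds the "mpa_" prefix only when it is missing.

```python
def tar_key(download_file_name, prefix="mpa_"):
    """Build the file_list key for a tarball without doubling the prefix.

    Hypothetical helper for illustration; it is not part of MetaPhlAn.
    """
    if download_file_name.startswith(prefix):
        return download_file_name + ".tar"
    return prefix + download_file_name + ".tar"


# Stand-in for the parsed file_list.txt; the URL is a placeholder.
ls_f = {
    "mpa_v30_CHOCOPhlAn_201901.tar": "https://example.org/mpa_v30_CHOCOPhlAn_201901.tar",
}

# The index name already carries the "mpa_" prefix.
index = "mpa_v30_CHOCOPhlAn_201901"

# Unconditional prepending yields the key seen in the traceback.
bad_key = "mpa_" + index + ".tar"
print(bad_key)                 # mpa_mpa_v30_CHOCOPhlAn_201901.tar
print(bad_key in ls_f)         # False, so ls_f[bad_key] raises KeyError
print(tar_key(index) in ls_f)  # True
```

Checking for the prefix before adding it is one possible alternative to deleting the prefix outright, in case some callers pass the name without "mpa_".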
Best, Matthew Cahn
About this issue
- State: closed
- Created 4 years ago
- Comments: 34 (13 by maintainers)
Yep, on second pass, @fbeghini, I have to say that your instructions were indeed complete! It was my unfamiliarity with bioconda that was the problem.
The build recipe that worked was:
Updated with a cleaner build recipe
Yes, but you have to put the six pipes in front, e.g.
||||||39491
since the tree object splits the full taxonomy string on the pipe character.
I’m also working on CI for MetaPhlAn that will test whether the database is OK; it should be ready in a couple of weeks.
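To see why the leading pipes matter, here is a small sketch of the split. It assumes only what is stated above, namely that the taxonomy string is split on the pipe character; the function name `split_taxonomy` is hypothetical and does not come from MetaPhlAn.

```python
def split_taxonomy(tax_string, sep="|"):
    """Split a taxonomy string into its fields (hypothetical helper)."""
    return tax_string.split(sep)


with_pipes = split_taxonomy("||||||39491")
without_pipes = split_taxonomy("39491")

# Six pipes produce seven fields, with the ID in the last position.
print(len(with_pipes))     # 7
print(with_pipes[-1])      # 39491

# Without the pipes there is a single field, so the ID lands in the first position.
print(len(without_pipes))  # 1
```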
Thanks for the reply. I had not added the channels as instructed, because I thought I already had those channels. I added them (in the order listed), deleted the previous environment, made a new one, and ran the same installation again. This time it installed Python 3.7 and metaphlan build pyh5ca1d4c_4, and the database download works.
Best, Matthew