pharokka: Error calling tRNAscan-SE
phrokka version: v1.4.1 Python version: Python 3.10.12 Operating System: Computing cluster
Description
I have run prodigal-gv on my viral genomes to get the ORFs (it also outputs a genbank formatted CSV file). Using the predicted ORFs as input I want to run Pharroka to annotate them (without running prodigal or Phannotate again).
What I Did
I run the following command:
/home2/p304845/Conda_envs/Pharokka_conda/bin/pharokka.py \
-i Genbank_formatted_CSV \
--genbank \
-o $output_dir \
-d /home2/p304845/Conda_envs/Pharokka_conda/databases \
-t ${SLURM_CPUS_PER_TASK}
And I got the following error:
2023-09-04 17:06:08.757 | INFO | __main__:main:84 - Starting Pharokka v1.4.1
2023-09-04 17:06:08.759 | INFO | __main__:main:85 - Command executed: Namespace(infile='/scratch/p304845/Annotation/Pharokka_annotation/viruses_present_output.csv', outdir='/scratch/p304845/Annotation/Pharokka_annotation/Results', database='/home2/p304845/Conda_envs/Pharokka_conda/databases', threads='24', force=False, prefix='Default', locustag='Default', gene_predictor='phanotate', meta=False, split=False, coding_table='11', evalue='1E-05', fast=False, mmseqs2_only=False, meta_hmm=False, dnaapler=False, custom_hmm='', genbank=True, terminase=False, terminase_strand='nothing', terminase_start='nothing', citation=False)
2023-09-04 17:06:08.759 | INFO | __main__:main:86 - Repository homepage is https://github.com/gbouras13/pharokka
2023-09-04 17:06:08.759 | INFO | __main__:main:87 - Written by George Bouras: george.bouras@adelaide.edu.au
2023-09-04 17:06:08.759 | INFO | __main__:main:89 - Checking database installation in /home2/p304845/Conda_envs/Pharokka_conda/databases.
2023-09-04 17:06:08.767 | INFO | __main__:main:92 - All databases have been successfully checked.
2023-09-04 17:06:08.767 | INFO | __main__:main:108 - Checking dependencies.
2023-09-04 17:06:09.540 | INFO | input_commands:check_dependencies:322 - Phanotate version found is v1.5.1
2023-09-04 17:06:09.541 | INFO | input_commands:check_dependencies:331 - Phanotate version is ok.
2023-09-04 17:06:09.695 | INFO | input_commands:check_dependencies:353 - MMseqs2 version found is v13.45111
2023-09-04 17:06:09.695 | INFO | input_commands:check_dependencies:362 - MMseqs2 version is ok.
2023-09-04 17:06:10.410 | INFO | input_commands:check_dependencies:386 - tRNAscan-SE version found is v2.0.12
2023-09-04 17:06:10.411 | INFO | input_commands:check_dependencies:397 - tRNAscan-SE version is ok.
2023-09-04 17:06:11.627 | INFO | input_commands:check_dependencies:421 - MinCED version found is v0.4.2
2023-09-04 17:06:11.628 | INFO | input_commands:check_dependencies:432 - MinCED version is ok.
2023-09-04 17:06:11.675 | INFO | input_commands:check_dependencies:458 - ARAGORN version found is v1.2.41
2023-09-04 17:06:11.676 | INFO | input_commands:check_dependencies:469 - ARAGORN version is ok.
2023-09-04 17:06:12.063 | INFO | input_commands:check_dependencies:485 - mash version found is v2.3
2023-09-04 17:06:12.064 | INFO | input_commands:check_dependencies:492 - mash version is ok.
2023-09-04 17:06:12.843 | INFO | input_commands:check_dependencies:511 - Dnaapler version found is v0.3.1
2023-09-04 17:06:12.843 | INFO | input_commands:check_dependencies:518 - Dnaapler version is ok.
2023-09-04 17:06:12.843 | INFO | input_commands:check_dependencies:529 - Pyrodigal version is v2.3.0
2023-09-04 17:06:12.843 | INFO | input_commands:check_dependencies:530 - Pyrodigal version is ok.
2023-09-04 17:06:12.844 | INFO | __main__:main:113 - You have specified --genbank.
2023-09-04 17:06:12.844 | INFO | __main__:main:114 - Therefore, /scratch/p304845/Annotation/Pharokka_annotation/viruses_present_output.csv is a genbank file instead of a FASTA file.
2023-09-04 17:06:12.844 | INFO | __main__:main:117 - Your custom CDS calls in this genbank file will be preserved.
2023-09-04 17:06:14.652 | INFO | __main__:main:280 - Extracting CDS information from your genbank file.
2023-09-04 17:06:17.836 | INFO | __main__:main:291 - Starting tRNA-scanSE.
2023-09-04 17:06:17.840 | INFO | external_tools:run:50 - Started running tRNAscan-SE --thread 24 -G -Q -j /scratch/p304845/Annotation/Pharokka_annotation/Results/trnascan_out.gff /scratch/p304845/Annotation/Pharokka_annotation/Results/genbank.fasta ...
2023-09-04 17:06:18.037 | ERROR | external_tools:run_tool:94 - Error calling tRNAscan-SE --thread 24 -G -Q -j /scratch/p304845/Annotation/Pharokka_annotation/Results/trnascan_out.gff /scratch/p304845/Annotation/Pharokka_annotation/Results/genbank.fasta (return code 2)
2023-09-04 17:06:18.037 | ERROR | processes:run_trna_scan:532 - Error: tRNAscan-SE not found
Thanks in advance!
About this issue
- Original URL
- State: closed
- Created 10 months ago
- Comments: 19 (15 by maintainers)
Thank you guys @althonos @gbouras13 @apcamargo! That’s called efficiency! Just tested it on a set of ~2000 metagenomic viral contigs and no problems observed!
Easiest commit I have made, sorry about that @althonos but fixed now 😃
@althonos @apcamargo I’m releasing v1.5.0 today. I’ve added a citation to the citations section for
pyrodigal-gv
as this:Larradle M. and Camargo A., (2023) Pyrodigal-gv: A Pyrodigal extension to predict genes in giant viruses and viruses with alternative genetic code. https://github.com/althonos/pyrodigal-gv.
If you want to change it just let me know.
George
I’ve tested it out today on a number of single phages + a 673 contig gut phage test set (from Yutin et al), around 50MB fasta file.
The results look great, I get exactly what is expected and the output is easily parsable for me.
With
pyrodigal v3.0.0-alpha2
&pyrodigal v0.1.0a1
implemented in Pharokka v1.5.0 (dev branch):Running on Intel Core i7-10700K CPU @ 3.80GHz on a machine running Ubuntu 20.04.6 LTS, it took 305 seconds to run the
pyrodigal-gv
step (I didn’t do proper measurements withtime
, this is just coming from the Pharokka output log).This compares to 184 seconds for regular
pyrodigal
on the same dataset.For single phage contigs runtime is negligible for both (sub 1 second).
Overall, I am happy with the output for my purposes, when you release
pyrodigal v3
andpyrodigal_gv
as a stable release, I will soon after release Pharokka v1.5.0 with support forpyrodigal-gv
.Thanks @althonos and @apcamargo amazing stuff!
George
@gbouras13 :
pip install -U --pre pyrodigal-gv
if you wanna test 😉On that note, I think within-genome stop codon reassignment is probably out of scope for Prodigal, because the dynamic programming needs to have the codons from the entire genome to start finding the highest scoring gene path. However, alternative coding prediction could be feasible, similarly to the
meta
mode, trying to nodes with several different genetic codes before extracting genes for the highest scoring one.I don’t think I will integrate it to “stock” Pyrodigal, but i can probably figure out a way to get that into a different dedicated package that has the
prodigal-gv
models, especially if you’d find it useful right away 👍I’m still testing it out. But no problems so far!
Beautiful, outputs look great to me.
More than happy to do any testing that you would find useful Martin.
George
@gbouras13 : I have made a new Pyrodigal pre-release that supports that (
pip install -U --pre pyrodigal
to get3.0.0-alpha2
), this way you get a newinclude_translation_table
keyword argument among other things that outputs the translation table. (I’m usingtranslation_table
to remain consistent with Prodigal).@althonos I’ve given it a test run (with the pyrodigal v3 alpha installed from source), it looks phenomenal mate.
The only thing lacking I can think that would be useful (in terms of recapitulating the pyrodigal-gv output and generally) is ‘genetic_code’ in the description (it’s between rbs_spacer and gc_cont).
prodigal-gv v2.11 output
pyrodigal-gv output
the code:
George
I think it would be very useful quite soon for lots of users in the gut metagenome/giant virus etc space.
Also more generally speaking as sort of mentioned in https://github.com/althonos/pyrodigal/issues/24, not sure what you have in mind but there’s some scope to build upon your methods for alternative coding prediction and within-genome stop codon reassignment (see e.g. https://github.com/gatech-genemark/Mgcod ).
Happy to help out and contribute whatever I can to help make this happen (and I’m sure @asierFernandezP and @apcamargo would be keen too by the looks).
George