pharokka: Error calling tRNAscan-SE

phrokka version: v1.4.1 Python version: Python 3.10.12 Operating System: Computing cluster

Description

I have run prodigal-gv on my viral genomes to get the ORFs (it also outputs a genbank formatted CSV file). Using the predicted ORFs as input I want to run Pharroka to annotate them (without running prodigal or Phannotate again).

What I Did

I run the following command:

/home2/p304845/Conda_envs/Pharokka_conda/bin/pharokka.py \
	-i Genbank_formatted_CSV \
	 --genbank \
	-o $output_dir \
	-d /home2/p304845/Conda_envs/Pharokka_conda/databases \
	-t ${SLURM_CPUS_PER_TASK}

And I got the following error:

2023-09-04 17:06:08.757 | INFO     | __main__:main:84 - Starting Pharokka v1.4.1
2023-09-04 17:06:08.759 | INFO     | __main__:main:85 - Command executed: Namespace(infile='/scratch/p304845/Annotation/Pharokka_annotation/viruses_present_output.csv', outdir='/scratch/p304845/Annotation/Pharokka_annotation/Results', database='/home2/p304845/Conda_envs/Pharokka_conda/databases', threads='24', force=False, prefix='Default', locustag='Default', gene_predictor='phanotate', meta=False, split=False, coding_table='11', evalue='1E-05', fast=False, mmseqs2_only=False, meta_hmm=False, dnaapler=False, custom_hmm='', genbank=True, terminase=False, terminase_strand='nothing', terminase_start='nothing', citation=False)
2023-09-04 17:06:08.759 | INFO     | __main__:main:86 - Repository homepage is https://github.com/gbouras13/pharokka
2023-09-04 17:06:08.759 | INFO     | __main__:main:87 - Written by George Bouras: george.bouras@adelaide.edu.au
2023-09-04 17:06:08.759 | INFO     | __main__:main:89 - Checking database installation in /home2/p304845/Conda_envs/Pharokka_conda/databases.
2023-09-04 17:06:08.767 | INFO     | __main__:main:92 - All databases have been successfully checked.
2023-09-04 17:06:08.767 | INFO     | __main__:main:108 - Checking dependencies.
2023-09-04 17:06:09.540 | INFO     | input_commands:check_dependencies:322 - Phanotate version found is v1.5.1
2023-09-04 17:06:09.541 | INFO     | input_commands:check_dependencies:331 - Phanotate version is ok.
2023-09-04 17:06:09.695 | INFO     | input_commands:check_dependencies:353 - MMseqs2 version found is v13.45111
2023-09-04 17:06:09.695 | INFO     | input_commands:check_dependencies:362 - MMseqs2 version is ok.
2023-09-04 17:06:10.410 | INFO     | input_commands:check_dependencies:386 - tRNAscan-SE version found is v2.0.12
2023-09-04 17:06:10.411 | INFO     | input_commands:check_dependencies:397 - tRNAscan-SE version is ok.
2023-09-04 17:06:11.627 | INFO     | input_commands:check_dependencies:421 - MinCED version found is v0.4.2
2023-09-04 17:06:11.628 | INFO     | input_commands:check_dependencies:432 - MinCED version is ok.
2023-09-04 17:06:11.675 | INFO     | input_commands:check_dependencies:458 - ARAGORN version found is v1.2.41
2023-09-04 17:06:11.676 | INFO     | input_commands:check_dependencies:469 - ARAGORN version is ok.
2023-09-04 17:06:12.063 | INFO     | input_commands:check_dependencies:485 - mash version found is v2.3
2023-09-04 17:06:12.064 | INFO     | input_commands:check_dependencies:492 - mash version is ok.
2023-09-04 17:06:12.843 | INFO     | input_commands:check_dependencies:511 - Dnaapler version found is v0.3.1
2023-09-04 17:06:12.843 | INFO     | input_commands:check_dependencies:518 - Dnaapler version is ok.
2023-09-04 17:06:12.843 | INFO     | input_commands:check_dependencies:529 - Pyrodigal version is v2.3.0
2023-09-04 17:06:12.843 | INFO     | input_commands:check_dependencies:530 - Pyrodigal version is ok.
2023-09-04 17:06:12.844 | INFO     | __main__:main:113 - You have specified --genbank.
2023-09-04 17:06:12.844 | INFO     | __main__:main:114 - Therefore, /scratch/p304845/Annotation/Pharokka_annotation/viruses_present_output.csv is a genbank file instead of a FASTA file.
2023-09-04 17:06:12.844 | INFO     | __main__:main:117 - Your custom CDS calls in this genbank file will be preserved.
2023-09-04 17:06:14.652 | INFO     | __main__:main:280 - Extracting CDS information from your genbank file.
2023-09-04 17:06:17.836 | INFO     | __main__:main:291 - Starting tRNA-scanSE.
2023-09-04 17:06:17.840 | INFO     | external_tools:run:50 - Started running tRNAscan-SE --thread 24 -G -Q -j /scratch/p304845/Annotation/Pharokka_annotation/Results/trnascan_out.gff /scratch/p304845/Annotation/Pharokka_annotation/Results/genbank.fasta ...
2023-09-04 17:06:18.037 | ERROR    | external_tools:run_tool:94 - Error calling tRNAscan-SE --thread 24 -G -Q -j /scratch/p304845/Annotation/Pharokka_annotation/Results/trnascan_out.gff /scratch/p304845/Annotation/Pharokka_annotation/Results/genbank.fasta (return code 2)
2023-09-04 17:06:18.037 | ERROR    | processes:run_trna_scan:532 - Error: tRNAscan-SE not found

Thanks in advance!

About this issue

  • Original URL
  • State: closed
  • Created 10 months ago
  • Comments: 19 (15 by maintainers)

Most upvoted comments

Thank you guys @althonos @gbouras13 @apcamargo! That’s called efficiency! Just tested it on a set of ~2000 metagenomic viral contigs and no problems observed!

Easiest commit I have made, sorry about that @althonos but fixed now 😃

@althonos @apcamargo I’m releasing v1.5.0 today. I’ve added a citation to the citations section for pyrodigal-gv as this:

Larradle M. and Camargo A., (2023) Pyrodigal-gv: A Pyrodigal extension to predict genes in giant viruses and viruses with alternative genetic code. https://github.com/althonos/pyrodigal-gv.

If you want to change it just let me know.

George

I’ve tested it out today on a number of single phages + a 673 contig gut phage test set (from Yutin et al), around 50MB fasta file.

The results look great, I get exactly what is expected and the output is easily parsable for me.

With pyrodigal v3.0.0-alpha2 & pyrodigal v0.1.0a1 implemented in Pharokka v1.5.0 (dev branch):

Running on Intel Core i7-10700K CPU @ 3.80GHz on a machine running Ubuntu 20.04.6 LTS, it took 305 seconds to run the pyrodigal-gv step (I didn’t do proper measurements with time, this is just coming from the Pharokka output log).

This compares to 184 seconds for regular pyrodigal on the same dataset.

For single phage contigs runtime is negligible for both (sub 1 second).

Overall, I am happy with the output for my purposes, when you release pyrodigal v3 and pyrodigal_gv as a stable release, I will soon after release Pharokka v1.5.0 with support for pyrodigal-gv.

Thanks @althonos and @apcamargo amazing stuff!

George

@gbouras13 : pip install -U --pre pyrodigal-gv if you wanna test 😉

Also more generally speaking as sort of mentioned in https://github.com/althonos/pyrodigal/issues/24, not sure what you have in mind but there’s some scope to build upon your methods for alternative coding prediction and within-genome stop codon reassignment (see e.g. https://github.com/gatech-genemark/Mgcod ).

On that note, I think within-genome stop codon reassignment is probably out of scope for Prodigal, because the dynamic programming needs to have the codons from the entire genome to start finding the highest scoring gene path. However, alternative coding prediction could be feasible, similarly to the meta mode, trying to nodes with several different genetic codes before extracting genes for the highest scoring one.

Work with @althonos to get prodigal-gv’s models added to pyrodigal https://github.com/althonos/pyrodigal/issues/24. But this seems like a lot of work to do when prodigal-gv already exists, plus it feels like more of a fork-related feature I think than a core pyrodigal feature (it’s up to Martin anyway after all).

I don’t think I will integrate it to “stock” Pyrodigal, but i can probably figure out a way to get that into a different dedicated package that has the prodigal-gv models, especially if you’d find it useful right away 👍

I’m still testing it out. But no problems so far!

MZ130495.1	pyrodigal_v3.0.0-alpha2	CDS	98636	101128	474.7	+	0	ID=MZ130495.1_100;partial=01;start_type=ATG;rbs_motif=TAA;rbs_spacer=12bp;gc_cont=0.361;transl_table=15;conf=99.99;score=474.69;cscore=467.02;sscore=7.67;rscore=5.99;uscore=1.68;tscore=0.00;

Beautiful, outputs look great to me.

More than happy to do any testing that you would find useful Martin.

George

@gbouras13 : I have made a new Pyrodigal pre-release that supports that (pip install -U --pre pyrodigal to get 3.0.0-alpha2), this way you get a new include_translation_table keyword argument among other things that outputs the translation table. (I’m using translation_table to remain consistent with Prodigal).

@althonos I’ve given it a test run (with the pyrodigal v3 alpha installed from source), it looks phenomenal mate.

The only thing lacking I can think that would be useful (in terms of recapitulating the pyrodigal-gv output and generally) is ‘genetic_code’ in the description (it’s between rbs_spacer and gc_cont).

prodigal-gv v2.11 output


MZ130495.1      Prodigal_v2.11.0-gv     CDS     98636   101128  474.7   +       0       ID=1_100;partial=01;start_type=ATG;rbs_motif=TAA;rbs_spacer=12bp;genetic_code=15;gc_cont=0.361;conf=99.99;score=474.69;cscore=467.02;sscore=7.67;rscore=5.99;uscore=1.68;tscore=0.00;

pyrodigal-gv output


MZ130495.1	pyrodigal_v3.0.0-alpha1	CDS	98636	101128	474.7	+	0	ID=MZ130495.1_100;partial=01;start_type=ATG;rbs_motif=TAA;rbs_spacer=12bp;gc_cont=0.361;conf=99.99;score=474.69;cscore=467.02;sscore=7.67;rscore=5.99;uscore=1.68;tscore=0.00;

the code:

def run_pyrodigal_gv(filepath_in, out_dir, coding_table):

    orf_finder = pyrodigal_gv.ViralGeneFinder(meta=True)

    with open(os.path.join(out_dir, "prodigal_out.gff"), "w") as dst:
        for i, record in enumerate(SeqIO.parse(filepath_in, "fasta")):
            genes = orf_finder.find_genes(str(record.seq))
            genes.write_gff(dst, sequence_id=record.id)

George

I think it would be very useful quite soon for lots of users in the gut metagenome/giant virus etc space.

Also more generally speaking as sort of mentioned in https://github.com/althonos/pyrodigal/issues/24, not sure what you have in mind but there’s some scope to build upon your methods for alternative coding prediction and within-genome stop codon reassignment (see e.g. https://github.com/gatech-genemark/Mgcod ).

Happy to help out and contribute whatever I can to help make this happen (and I’m sure @asierFernandezP and @apcamargo would be keen too by the looks).

George