funannotate: eggnog v2 output is not being parsed correctly

I am getting the following error when running annotate on my isolates. The same problem occurs even when I provide pre-processed results via the --antismash and --eggnog flags.

-------------------------------------------------------
[05:15 PM]: OS: Ubuntu 20.04, 12 cores, ~ 33 GB RAM. Python: 3.8.5
[05:15 PM]: Running 1.8.4
[05:15 PM]: Found existing output directory funannotate_output/isolate1. Warning, will re-use any intermediate files found.
[05:15 PM]: Parsing input files
[05:15 PM]: Existing tbl found: funannotate_output/isolate1/predict_results/isolate1.tbl
[05:15 PM]: Adding Functional Annotation to Ascochyta rabiei, NCBI accession: None
[05:15 PM]: Annotation consists of: 10,616 gene models
[05:15 PM]: 10,364 protein records loaded
[05:15 PM]: Existing Pfam-A results found: funannotate_output/isolate1/annotate_misc/annotations.pfam.txt
[05:15 PM]: 11,652 annotations added
[05:15 PM]: Running Diamond blastp search of UniProt DB version 2021_01
[05:15 PM]: 725 valid gene/product annotations from 1,061 total
[05:15 PM]: Existing Eggnog-mapper results found: funannotate_output/isolate1/annotate_misc/eggnog.emapper.annotations
[05:15 PM]: Parsing EggNog Annotations
[05:15 PM]: 0 COG and EggNog annotations added
[05:15 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.65
[05:15 PM]: 725 gene name and product description annotations added
[05:15 PM]: Existing MEROPS results found: funannotate_output/isolate1/annotate_misc/annotations.merops.txt
[05:15 PM]: 361 annotations added
[05:15 PM]: Existing CAZYme results found: funannotate_output/isolate1/annotate_misc/annotations.dbCAN.txt
[05:15 PM]: 511 annotations added
[05:15 PM]: Existing BUSCO2 results found: funannotate_output/isolate1/annotate_misc/annotations.busco.txt
[05:15 PM]: 1,279 annotations added
[05:15 PM]: Skipping phobius predictions, try funannotate remote -m phobius
[05:15 PM]: Existing SignalP results found: funannotate_output/isolate1/annotate_misc/signalp.results.txt
[05:15 PM]: 1,060 secretome and 0 transmembane annotations added
[05:15 PM]: Parsing InterProScan5 XML file
[05:15 PM]: Now parsing antiSMASH v6 results, finding SM clusters
[05:15 PM]: Found 0 clusters, 0 biosynthetic enyzmes, and 0 smCOGs predicted by antiSMASH
[05:15 PM]: Found 0 duplicated annotations, adding 44,471 valid annotations
[05:15 PM]: Converting to final Genbank format, good luck!
[05:16 PM]: Creating AGP file and corresponding contigs file
[05:16 PM]: Cross referencing SM cluster hits with MIBiG database version 1.4
[05:16 PM]: CMD ERROR: diamond blastp --sensitive --query funannotate_output/isolate1/annotate_misc/antismash/smcluster.proteins.fasta --threads 12 --out funannotate_output/isolate1/annotate_misc/antismash/smcluster.MIBiG.blast.txt --db /home/fredrick/funannotate_db/mibig.dmnd --max-hsps 1 --evalue 0.001 --max-target-seqs 1 --outfmt 6
b'diamond v0.9.26.127 | by Benjamin Buchfink <buchfink@gmail.com>\nLicensed under the GNU GPL <https://www.gnu.org/licenses/gpl.txt>\nCheck http://github.com/bbuchfink/diamond for updates.\n\n#CPU threads: 12\nScoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)\nTemporary directory: funannotate_output/isolate1/annotate_misc/antismash\nOpening the database...  [0.000575s]\n#Target sequences to report alignments for: 1\nOpening the input file...  [5.7e-05s]\nError: Error detecting input file format. First line seems to be blank.\n'

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 30 (22 by maintainers)

Most upvoted comments

Perhaps it’s on pip?

Okay, well that’s not great for several reasons, but at least it picked the v2 parser. The problem is that a commit hash is always going to evaluate as greater than the version string it is compared against, even if the hash comes from a version that is not there. But the real problem here is the altered headers again. So I’ll try to fix it for this one, but I’ll open an issue: I do not have the time to update this every few weeks.

>>> vers = 've6ac7f2'
>>> vers < ('2.0.0')
False
>>> vers > ('2.0.0')
True
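
To illustrate the comparison above: Python compares strings character by character, so 'v' sorts after '2' and the hash always looks "newer". A minimal sketch of one way around this, assuming the version is either a dotted release like 2.1.2 or something else (such as a commit hash), is to parse releases into integer tuples and treat anything unparseable as unknown. The helper names parse_release and is_at_least are made up for this example and are not part of funannotate.

```python
import re


def parse_release(version):
    """Return (major, minor, patch) as ints, or None if `version`
    is not a plain dotted release string (e.g. a commit hash)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_at_least(version, minimum):
    """True/False when `version` can be compared with `minimum`,
    None when `version` is not a recognisable release."""
    found = parse_release(version)
    if found is None:
        return None  # caller decides what to do with an unknown version
    return found >= parse_release(minimum)


print(is_at_least("2.1.2", "2.0.0"))     # a normal release
print(is_at_least("ve6ac7f2", "2.0.0"))  # a hash is reported as unknown
print(is_at_least("2.0.10", "2.0.9"))    # numeric, not alphabetical, ordering
```

Returning None for a hash forces the caller to choose a fallback explicitly (for example, inspecting the file header) instead of silently treating the hash as the newest version.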

Okay, well, this max_annot_lvl is definitely a new header. It really sucks that this keeps changing… I’m not sure I have the patience to constantly update this…

Hi there, I just ran funannotate annotate (v1.8.7) with EggNOG-mapper results generated externally (emapper.py v2.1.2), and funannotate fails to parse every single annotation. Apparently some changes were made to the emapper output files in the latest release (https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.2#v212)
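
Since the breakage comes from columns being added or renamed between emapper releases, one option is to read the annotations by header name rather than by fixed column position. The following is only a rough sketch of that idea, not how funannotate actually parses the file: read_emapper is a hypothetical helper, the sample rows are invented, and the column names shown are assumptions that should be checked against the header line of your own eggnog.emapper.annotations file.

```python
import io


def read_emapper(handle):
    """Yield one dict per annotation row, keyed by the column names
    found in the file's own '#query...' header line."""
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Lines starting with '#' are comments; the one beginning
            # with '#query' is assumed to carry the column names.
            if line.startswith("#query"):
                header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("no '#query' header line before first data row")
        yield dict(zip(header, line.split("\t")))


# Invented two-line example; real files have more columns.
sample = (
    "## example comment line\n"
    "#query\tseed_ortholog\tevalue\tscore\teggNOG_OGs\tmax_annot_lvl\t"
    "COG_category\tDescription\tPreferred_name\n"
    "gene1-T1\t5057.EXAMPLE0001\t1e-50\t200.0\tKOG0001@2759|Eukaryota\t"
    "4751|Fungi\tS\tubiquitin\tUBI4\n"
)

for row in read_emapper(io.StringIO(sample)):
    # .get() tolerates columns that are missing in older or newer releases
    print(row.get("Preferred_name"), row.get("COG_category"))
```

With this approach a new column such as max_annot_lvl does not shift the others, and a column that disappears shows up as a missing key rather than as silently wrong data.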

Thanks @nextgenusfs. Everything works okay now.

Here is the progress from a few of the isolates I was analysing:

-------------------------------------------------------
[Mar 12 01:03 PM]: OS: Ubuntu 20.10, 12 cores, ~ 33 GB RAM. Python: 3.8.6
[Mar 12 01:03 PM]: Running 1.8.4
[Mar 12 01:03 PM]: Found existing output directory funannotate_output/TR6417. Warning, will re-use any intermediate files found.
[Mar 12 01:03 PM]: Parsing input files
[Mar 12 01:03 PM]: Existing tbl found: funannotate_output/TR6417/predict_results/Ascochyta_rabiei_TR6417.tbl
[Mar 12 01:03 PM]: Adding Functional Annotation to Ascochyta rabiei, NCBI accession: None
[Mar 12 01:03 PM]: Annotation consists of: 9,766 gene models
[Mar 12 01:03 PM]: 9,633 protein records loaded
[Mar 12 01:03 PM]: Existing Pfam-A results found: funannotate_output/TR6417/annotate_misc/annotations.pfam.txt
[Mar 12 01:03 PM]: 10,843 annotations added
[Mar 12 01:03 PM]: Running Diamond blastp search of UniProt DB version 2021_01
[Mar 12 01:03 PM]: 683 valid gene/product annotations from 984 total
[Mar 12 01:03 PM]: Existing Eggnog-mapper results found: funannotate_output/TR6417/annotate_misc/eggnog.emapper.annotations
[Mar 12 01:03 PM]: Parsing EggNog Annotations
[Mar 12 01:03 PM]: 18,165 COG and EggNog annotations added
[Mar 12 01:03 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.66
[Mar 12 01:03 PM]: 2,451 gene name and product description annotations added
[Mar 12 01:03 PM]: Existing MEROPS results found: funannotate_output/TR6417/annotate_misc/annotations.merops.txt
[Mar 12 01:03 PM]: 331 annotations added
[Mar 12 01:03 PM]: Existing CAZYme results found: funannotate_output/TR6417/annotate_misc/annotations.dbCAN.txt
[Mar 12 01:03 PM]: 465 annotations added
[Mar 12 01:03 PM]: Existing BUSCO2 results found: funannotate_output/TR6417/annotate_misc/annotations.busco.txt
[Mar 12 01:03 PM]: 1,231 annotations added
[Mar 12 01:03 PM]: Existing Phobius results found: funannotate_output/TR6417/annotate_misc/phobius.results.txt
[Mar 12 01:03 PM]: Existing SignalP results found: funannotate_output/TR6417/annotate_misc/signalp.results.txt
[Mar 12 01:03 PM]: 973 secretome and 2,129 transmembane annotations added
[Mar 12 01:03 PM]: Parsing InterProScan5 XML file
[Mar 12 01:03 PM]: Now parsing antiSMASH v6 results, finding SM clusters
[Mar 12 01:03 PM]: Found 25 clusters, 60 biosynthetic enyzmes, and 73 smCOGs predicted by antiSMASH
[Mar 12 01:03 PM]: Found 0 duplicated annotations, adding 65,558 valid annotations
[Mar 12 01:03 PM]: Converting to final Genbank format, good luck!
[Mar 12 01:04 PM]: Creating AGP file and corresponding contigs file
[Mar 12 01:04 PM]: Cross referencing SM cluster hits with MIBiG database version 1.4
[Mar 12 01:04 PM]: Creating tab-delimited SM cluster output
[Mar 12 01:04 PM]: Writing genome annotation table.
[Mar 12 01:05 PM]: Funannotate annotate has completed successfully!

        We need YOUR help to improve gene names/product descriptions:
           0 gene/products names MUST be fixed, see funannotate_output/TR6417/annotate_results/Gene2Products.must-fix.txt
           2 gene/product names need to be curated, see funannotate_output/TR6417/annotate_results/Gene2Products.need-curating.txt
           100 gene/product names passed but are not in Database, see funannotate_output/TR6417/annotate_results/Gene2Products.new-names-passed.txt

        Please consider contributing a PR at https://github.com/nextgenusfs/gene2product

-------------------------------------------------------
-------------------------------------------------------
[Mar 12 01:05 PM]: OS: Ubuntu 20.10, 12 cores, ~ 33 GB RAM. Python: 3.8.6
[Mar 12 01:05 PM]: Running 1.8.4
[Mar 12 01:05 PM]: Found existing output directory funannotate_output/TR9544. Warning, will re-use any intermediate files found.
[Mar 12 01:05 PM]: Parsing input files
[Mar 12 01:05 PM]: Existing tbl found: funannotate_output/TR9544/predict_results/Ascochyta_rabiei_TR9544.tbl
[Mar 12 01:05 PM]: Adding Functional Annotation to Ascochyta rabiei, NCBI accession: None
[Mar 12 01:05 PM]: Annotation consists of: 10,408 gene models
[Mar 12 01:05 PM]: 10,262 protein records loaded
[Mar 12 01:05 PM]: Existing Pfam-A results found: funannotate_output/TR9544/annotate_misc/annotations.pfam.txt
[Mar 12 01:05 PM]: 11,616 annotations added
[Mar 12 01:05 PM]: Running Diamond blastp search of UniProt DB version 2021_01
[Mar 12 01:05 PM]: 718 valid gene/product annotations from 1,047 total
[Mar 12 01:05 PM]: Existing Eggnog-mapper results found: funannotate_output/TR9544/annotate_misc/eggnog.emapper.annotations
[Mar 12 01:05 PM]: Parsing EggNog Annotations
[Mar 12 01:05 PM]: 19,382 COG and EggNog annotations added
[Mar 12 01:05 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.66
[Mar 12 01:05 PM]: 2,589 gene name and product description annotations added
[Mar 12 01:05 PM]: Existing MEROPS results found: funannotate_output/TR9544/annotate_misc/annotations.merops.txt
[Mar 12 01:05 PM]: 344 annotations added
[Mar 12 01:05 PM]: Existing CAZYme results found: funannotate_output/TR9544/annotate_misc/annotations.dbCAN.txt
[Mar 12 01:05 PM]: 496 annotations added
[Mar 12 01:05 PM]: Existing BUSCO2 results found: funannotate_output/TR9544/annotate_misc/annotations.busco.txt
[Mar 12 01:05 PM]: 1,265 annotations added
[Mar 12 01:05 PM]: Existing Phobius results found: funannotate_output/TR9544/annotate_misc/phobius.results.txt
[Mar 12 01:05 PM]: Existing SignalP results found: funannotate_output/TR9544/annotate_misc/signalp.results.txt
[Mar 12 01:05 PM]: 1,065 secretome and 2,300 transmembane annotations added
[Mar 12 01:05 PM]: Parsing InterProScan5 XML file
[Mar 12 01:05 PM]: Now parsing antiSMASH v6 results, finding SM clusters
[Mar 12 01:05 PM]: Found 30 clusters, 73 biosynthetic enyzmes, and 83 smCOGs predicted by antiSMASH
[Mar 12 01:05 PM]: Found 0 duplicated annotations, adding 69,929 valid annotations
[Mar 12 01:05 PM]: Converting to final Genbank format, good luck!
[Mar 12 01:06 PM]: Creating AGP file and corresponding contigs file
[Mar 12 01:06 PM]: Cross referencing SM cluster hits with MIBiG database version 1.4
[Mar 12 01:06 PM]: Creating tab-delimited SM cluster output
[Mar 12 01:06 PM]: Writing genome annotation table.
[Mar 12 01:06 PM]: Funannotate annotate has completed successfully!

        We need YOUR help to improve gene names/product descriptions:
           0 gene/products names MUST be fixed, see funannotate_output/TR9544/annotate_results/Gene2Products.must-fix.txt
           3 gene/product names need to be curated, see funannotate_output/TR9544/annotate_results/Gene2Products.need-curating.txt
           97 gene/product names passed but are not in Database, see funannotate_output/TR9544/annotate_results/Gene2Products.new-names-passed.txt

        Please consider contributing a PR at https://github.com/nextgenusfs/gene2product

-------------------------------------------------------
-------------------------------------------------------
[Mar 12 01:06 PM]: OS: Ubuntu 20.10, 12 cores, ~ 33 GB RAM. Python: 3.8.6
[Mar 12 01:06 PM]: Running 1.8.4
[Mar 12 01:06 PM]: Found existing output directory funannotate_output/TR9571. Warning, will re-use any intermediate files found.
[Mar 12 01:06 PM]: Parsing input files
[Mar 12 01:06 PM]: Existing tbl found: funannotate_output/TR9571/predict_results/Ascochyta_rabiei_TR9571.tbl
[Mar 12 01:07 PM]: Adding Functional Annotation to Ascochyta rabiei, NCBI accession: None
[Mar 12 01:07 PM]: Annotation consists of: 11,226 gene models
[Mar 12 01:07 PM]: 10,946 protein records loaded
[Mar 12 01:07 PM]: Existing Pfam-A results found: funannotate_output/TR9571/annotate_misc/annotations.pfam.txt
[Mar 12 01:07 PM]: 12,652 annotations added
[Mar 12 01:07 PM]: Running Diamond blastp search of UniProt DB version 2021_01
[Mar 12 01:07 PM]: 774 valid gene/product annotations from 1,265 total
[Mar 12 01:07 PM]: Existing Eggnog-mapper results found: funannotate_output/TR9571/annotate_misc/eggnog.emapper.annotations
[Mar 12 01:07 PM]: Parsing EggNog Annotations
[Mar 12 01:07 PM]: 21,724 COG and EggNog annotations added
[Mar 12 01:07 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.66
[Mar 12 01:07 PM]: 2,778 gene name and product description annotations added
[Mar 12 01:07 PM]: Existing MEROPS results found: funannotate_output/TR9571/annotate_misc/annotations.merops.txt
[Mar 12 01:07 PM]: 386 annotations added
[Mar 12 01:07 PM]: Existing CAZYme results found: funannotate_output/TR9571/annotate_misc/annotations.dbCAN.txt
[Mar 12 01:07 PM]: 516 annotations added
[Mar 12 01:07 PM]: Existing BUSCO2 results found: funannotate_output/TR9571/annotate_misc/annotations.busco.txt
[Mar 12 01:07 PM]: 1,336 annotations added
[Mar 12 01:07 PM]: Existing Phobius results found: funannotate_output/TR9571/annotate_misc/phobius.results.txt
[Mar 12 01:07 PM]: Existing SignalP results found: funannotate_output/TR9571/annotate_misc/signalp.results.txt
[Mar 12 01:07 PM]: 1,087 secretome and 2,377 transmembane annotations added
[Mar 12 01:07 PM]: Parsing InterProScan5 XML file
[Mar 12 01:07 PM]: Now parsing antiSMASH v6 results, finding SM clusters
[Mar 12 01:07 PM]: Found 27 clusters, 67 biosynthetic enyzmes, and 83 smCOGs predicted by antiSMASH
[Mar 12 01:07 PM]: Found 11,678 duplicated annotations, adding 76,672 valid annotations
[Mar 12 01:07 PM]: Converting to final Genbank format, good luck!
[Mar 12 01:08 PM]: Creating AGP file and corresponding contigs file
[Mar 12 01:08 PM]: Cross referencing SM cluster hits with MIBiG database version 1.4
[Mar 12 01:08 PM]: Creating tab-delimited SM cluster output
[Mar 12 01:08 PM]: Writing genome annotation table.
[Mar 12 01:09 PM]: Funannotate annotate has completed successfully!

        We need YOUR help to improve gene names/product descriptions:
           0 gene/products names MUST be fixed, see funannotate_output/TR9571/annotate_results/Gene2Products.must-fix.txt
           3 gene/product names need to be curated, see funannotate_output/TR9571/annotate_results/Gene2Products.need-curating.txt
           103 gene/product names passed but are not in Database, see funannotate_output/TR9571/annotate_results/Gene2Products.new-names-passed.txt

        Please consider contributing a PR at https://github.com/nextgenusfs/gene2product

With this update, funannotate compare also runs without any hiccups.