vamb: ValueError: Length of TNFs and length of RPKM does not match. Verify the inputs
(vamb_env) -bash-4.1$ vamb --fasta mage_output/M-1507-133.A/intermediate/assembly_output/scaffolds.fasta --jgi coverage_output/coverage_metabat2.tsv --outdir vamb_output
Traceback (most recent call last):
  File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/vamb_env/bin/vamb", line 11, in <module>
    sys.exit(main())
  File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/vamb_env/lib/python3.6/site-packages/vamb/__main__.py", line 528, in main
    logfile=logfile)
  File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/vamb_env/lib/python3.6/site-packages/vamb/__main__.py", line 247, in run
    len(tnfs), minalignscore, minid, subprocesses, logfile)
  File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/vamb_env/lib/python3.6/site-packages/vamb/__main__.py", line 121, in calc_rpkm
    raise ValueError("Length of TNFs and length of RPKM does not match. Verify the inputs")
ValueError: Length of TNFs and length of RPKM does not match. Verify the inputs
Here’s what the output of jgi_summarize_bam_contig_depths looks like:
(vamb_env) -bash-4.1$ head coverage_output/coverage_metabat2.tsv
contigName contigLen totalAvgDepth sorted.bam sorted.bam-var
NODE_1_length_581408_cov_10.671907 581408 17.0497 17.0497 24.4313
NODE_2_length_212490_cov_11.140151 212490 17.7493 17.7493 26.3056
NODE_3_length_56611_cov_10.039571 56611 16.0747 16.0747 24.7309
NODE_4_length_52215_cov_10.245380 52215 16.4059 16.4059 20.9325
NODE_5_length_49788_cov_11.464963 49788 18.376 18.376 28.3959
NODE_6_length_44487_cov_9.390124 44487 15.069 15.069 20.5564
NODE_7_length_41442_cov_10.399425 41442 16.6383 16.6384 22.4833
NODE_8_length_37801_cov_9.536534 37801 15.3226 15.3226 25.3435
NODE_9_length_28654_cov_10.767824 28654 17.234 17.234 22.4427
It has the right number of rows too (25729 lines, minus 1 for the header, gives 25728 contigs):
(vamb_env) -bash-4.1$ grep -c "^>" mage_output/M-1507-133.A/intermediate/assembly_output/scaffolds.fasta
25728
(vamb_env) -bash-4.1$ wc -l coverage_output/coverage_metabat2.tsv
25729 coverage_output/coverage_metabat2.tsv
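Matching counts only show that the two files have the same number of records, not that they name the same contigs. A minimal sketch of a name-level comparison follows. The helper names (`fasta_names`, `depth_names`, `compare_names`) are hypothetical and not part of Vamb, and the sketch assumes that a contig's identifier is the FASTA header text up to the first whitespace and that the depths file is tab-separated with a single header line.

```python
import csv


def fasta_names(path):
    """Collect sequence identifiers: header text after '>' up to the first whitespace."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def depth_names(path):
    """Collect the first column of a tab-separated depths file, skipping the header."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # assumption: exactly one header line
        return {row[0] for row in reader if row}


def compare_names(fasta_path, depth_path):
    """Return (names only in the FASTA, names only in the depths file), both sorted."""
    fasta = fasta_names(fasta_path)
    depth = depth_names(depth_path)
    return sorted(fasta - depth), sorted(depth - fasta)
```

If both returned lists are empty, the two files agree on contig names, and any remaining mismatch would have to come from somewhere else, for example filtering applied after the files are read.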
Here’s the version:
(mage_env) -bash-4.1$ conda list | grep "vamb"
vamb 3.0.2 py36hc5360cc_1 bioconda
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (3 by maintainers)
Dear @HaraldBrolin
The error comes because each contig needs both a TNF (which is obtained from the FASTA file), and an RPKM (which is obtained from the JGI input file). To fix the problem, you need to remove the sequences in the FASTA file for which you don’t have entries in the JGI depths file.
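One way to do that filtering, as a rough sketch rather than anything shipped with Vamb: the function names `read_depth_names` and `filter_fasta` are hypothetical, and the code assumes the depths file is tab-separated with one header line and that contig names are the FASTA header text up to the first whitespace.

```python
def read_depth_names(depth_path):
    """Return the contig names found in the first column of the depths file."""
    names = set()
    with open(depth_path) as handle:
        next(handle, None)  # assumption: the first line is the column header
        for line in handle:
            line = line.rstrip("\n")
            if line:
                names.add(line.split("\t")[0])
    return names


def filter_fasta(fasta_path, depth_path, out_path):
    """Copy only the FASTA records whose name has a depth entry.

    Returns the number of records written.
    """
    keep = read_depth_names(depth_path)
    written = 0
    writing = False
    with open(fasta_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                fields = line[1:].split()
                # Start or stop copying depending on whether this record is wanted.
                writing = bool(fields) and fields[0] in keep
                if writing:
                    written += 1
            if writing:
                dst.write(line)
    return written
```

The filtered FASTA can then be passed to `--fasta` in place of the original file. Records are streamed line by line, so large assemblies do not need to fit in memory.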
The JGI file does not seem to be correctly formatted, either. It should look like this.
Sort of. It’s not stored, but the final bin is named e.g. sample1_1, depending on the names of your contigs. So given the name of an output bin, you can always get the sample and the original bin.
Thanks for using Vamb.
Multi-split is really dirt simple. After assembling individual samples, they are binned together. We then simply split each bin by sample: literally, we just take all the contigs in bin 1 from sample 1 and put them in bin_1_1, the contigs in bin 1 from sample 2 in bin_1_2, etc. So there is no reduction of redundancy; you get the same genomes duplicated if they are present in multiple samples.
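The splitting step described above can be sketched in a few lines. This is an illustration of the idea, not Vamb's actual code: the function name `split_bins_by_sample` is hypothetical, and it assumes each contig name starts with its sample name followed by a separator (here `"C"`, chosen for the example), and that the separator does not occur inside the sample name.

```python
from collections import defaultdict


def split_bins_by_sample(bins, separator="C"):
    """Split each bin into one sub-bin per sample.

    bins maps a bin name to an iterable of contig names of the form
    <sample><separator><rest> (an assumed naming scheme).
    Returns a dict mapping "bin_<bin>_<sample>" to a set of contig names.
    """
    split = defaultdict(set)
    for bin_name, contigs in bins.items():
        for contig in contigs:
            # partition() splits at the first occurrence of the separator.
            sample, sep, _rest = contig.partition(separator)
            if not sep:
                raise ValueError(f"no separator {separator!r} in contig name {contig!r}")
            split[f"bin_{bin_name}_{sample}"].add(contig)
    return dict(split)


# Bin 1 contains contigs from samples s1 and s2; bin 2 only from s2.
bins = {
    "1": ["s1Cctg1", "s2Cctg5", "s1Cctg2"],
    "2": ["s2Cctg9"],
}
result = split_bins_by_sample(bins)
```

Here `result` has three entries, `bin_1_s1`, `bin_1_s2` and `bin_2_s2`. Nothing is merged or dereplicated, which matches the point above that a genome present in several samples shows up once per sample.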