bcbio-nextgen: vcf2db "sqlalchemy.exc.InterfaceError (sqlite3.InterfaceError) Error binding parameter 72 - probably unsupported type." - caused by a specific variant record in the vcf file
Hello,
I was running bcbio v1.1.0 on a set of ~90 samples and noticed that 3 of them failed at the same step (vcf2db) with exactly the same error. On further inspection, I found that the error is caused by the same variant in all 3 samples, and removing that variant from their VCF files bypasses the error (i.e. vcf2db finishes successfully). I am attaching the error log and a slice of the VCF file that includes the variant (line 171 in the attached VCF file).
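For context, Python's sqlite3 module reports "Error binding parameter N - probably unsupported type" when a value it cannot adapt, such as a list, is passed as a bound parameter. The sketch below reproduces that class of error. It assumes, without confirmation from the log, that the offending annotation reaches the database as a list-like value. The table and column names are hypothetical and are not vcf2db's schema.

```python
import sqlite3


def try_insert(value):
    """Insert one value into a throwaway in-memory table.

    Returns None on success, or the exception type and message on failure.
    The table/column names are hypothetical, not vcf2db's real schema.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE variants (gnomad_af REAL)")
        conn.execute("INSERT INTO variants VALUES (?)", (value,))
        return None
    except sqlite3.Error as exc:
        # Older Pythons raise InterfaceError ("Error binding parameter 0 -
        # probably unsupported type."); newer ones raise ProgrammingError
        # with different wording. Both derive from sqlite3.Error.
        return f"{type(exc).__name__}: {exc}"
    finally:
        conn.close()


print(try_insert(0.0123))          # a scalar float binds without error
print(try_insert([0.0123, 0.02]))  # a list cannot be bound as a parameter
```

The exact exception class and wording depend on the Python version, which is why the sketch catches the common base class `sqlite3.Error`.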
Command used:
vcf2db.py C3_4-gatk-haplotype-nomultiallelic-annotated-gemini.vcf.gz C3_4-gatk-haplotype-nomultiallelic.ped C3_4-gatk-haplotype.db
BCBIO LOG: bcbio-nextgen.log
CONFIG FILE: C3_4.yaml.txt
VCF FILE: (variant causing the error is on line 171) C3_4-gatk-haplotype-nomultiallelic-annotated-gemini.vcf.txt
PED FILE: C3_4-gatk-haplotype-nomultiallelic.ped.txt
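The workaround described above (removing the offending record) can be sketched in plain Python as below. The function names `filter_lines` and `drop_records` are hypothetical helpers, not part of bcbio or vcf2db. The caller supplies the CHROM and POS of the record on line 171 of the attached file. The output is written uncompressed, so it may need to be recompressed with bgzip (and indexed with tabix) if the next step expects a `.vcf.gz`.

```python
import gzip


def filter_lines(lines, chrom, pos):
    """Yield VCF lines, skipping data records whose CHROM and POS match.

    Header lines (starting with '#') and blank lines are always kept.
    """
    target = (chrom, str(pos))
    for line in lines:
        if not line.startswith("#") and line.strip():
            fields = line.split("\t", 2)
            if tuple(fields[:2]) == target:
                continue  # drop the offending record
        yield line


def drop_records(in_path, out_path, chrom, pos):
    """Copy a plain or gzipped VCF to out_path without records at chrom:pos."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(filter_lines(src, chrom, pos))
```

This matches on CHROM and POS only, so it removes every record at that position. Add REF/ALT checks if other alleles at the same site must be kept.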
Could you please help me with this error?
Thank you, Teja.
About this issue
- State: closed
- Created 6 years ago
- Comments: 43 (32 by maintainers)
Hello everyone!
Thanks for the discussion, it helped me to learn a lot more about the issue!
There is no issue with the duplicated chr10 variant in it:
Ensembl’s gnomad_exome is v2.0.1: it is not sorted, decomposed, or normalized, and it contains duplicated variants (bcftools stats for chr10):
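As a complement to bcftools stats, exact duplicates can be listed with a short script. This is a minimal sketch, assuming plain-text VCF lines and treating records as duplicates when CHROM, POS, REF and ALT all match. The helper name `duplicated_sites` is hypothetical.

```python
from collections import Counter


def duplicated_sites(lines):
    """Return {(CHROM, POS, REF, ALT): count} for keys seen on more than one data line."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        counts[(chrom, int(pos), ref, alt)] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

Because ALT is compared as a raw string, multiallelic records that overlap only partially are not detected until they are decomposed.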
Issue with duplicated variant:
Hopefully, Ensembl will soon sync their VCF with gnomAD 2.1, and we will no longer need to process the VCF during the cloudbiolinux/bcbio installation. In the meantime, we do need that processing, at least for users who are running bcbio for the first time. They just need the pipeline to run smoothly, i.e. to call, annotate, and prioritize variants, without going into the details of small issues in gnomAD 2.0.1 as reprocessed by Ensembl.
One of the duplicated variants has PASS in the FILTER field and the other does not. That gave me the idea of prioritizing PASS variants, rather than simply the first occurrence, when selecting unique ones.
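The PASS-first selection rule can be sketched as follows. This is an illustration of the idea, not the code used in cloudbiolinux/bcbio. It assumes plain-text VCF lines and keys records on CHROM, POS, REF and ALT. The function name is hypothetical.

```python
def dedupe_prefer_pass(lines):
    """Keep one data line per (CHROM, POS, REF, ALT), preferring FILTER == PASS.

    Header lines pass through unchanged. Each site keeps the position at
    which it was first seen, even if a later PASS duplicate replaces it.
    """
    header, chosen = [], {}
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], fields[1], fields[3], fields[4])
        if key not in chosen:
            chosen[key] = line
        elif fields[6] == "PASS" and chosen[key].rstrip("\n").split("\t")[6] != "PASS":
            # Replace an earlier non-PASS duplicate with this PASS record
            chosen[key] = line
    return header + list(chosen.values())
```

Python dictionaries preserve insertion order, and reassigning an existing key does not move it. The output therefore keeps the original site order while swapping in the PASS record.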
The following processing chain works: chr name conversion - sort - PASS prioritization - decompose - normalize - uniq:
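The processing steps above can be sketched on in-memory records to show why their order matters. PASS records are moved ahead of their duplicates before decomposition, so the final uniq step, which keeps the first occurrence, retains the PASS copy. This is an illustration only, not the cloudbiolinux/bcbio implementation. The `Rec` type, the chromosome `mapping` and the `contig_order` list are hypothetical. The normalize step only trims shared leading and trailing bases. Real normalizers such as `vt normalize` or `bcftools norm` also left-align indels against the reference FASTA.

```python
from itertools import groupby
from typing import NamedTuple


class Rec(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str   # may be comma-separated before decomposition
    filt: str


def rename(recs, mapping):
    # Chromosome name conversion, e.g. {"10": "chr10"} (mapping is hypothetical)
    return [r._replace(chrom=mapping.get(r.chrom, r.chrom)) for r in recs]


def sort_recs(recs, contig_order):
    rank = {c: i for i, c in enumerate(contig_order)}
    return sorted(recs, key=lambda r: (rank[r.chrom], r.pos))


def pass_first(recs):
    # Within each (chrom, pos) group, move PASS records ahead (stable sort)
    out = []
    for _, group in groupby(recs, key=lambda r: (r.chrom, r.pos)):
        out.extend(sorted(group, key=lambda r: r.filt != "PASS"))
    return out


def decompose(recs):
    # Split multiallelic ALT into one biallelic record per allele
    return [r._replace(alt=a) for r in recs for a in r.alt.split(",")]


def trim(rec):
    # Simplified normalization: trim shared bases, no left-alignment
    ref, alt, pos = rec.ref, rec.alt, rec.pos
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return rec._replace(pos=pos, ref=ref, alt=alt)


def uniq(recs):
    # Keep the first record per site; PASS copies come first after pass_first
    seen, out = set(), []
    for r in recs:
        key = (r.chrom, r.pos, r.ref, r.alt)
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def process(recs, mapping, contig_order):
    recs = rename(recs, mapping)
    recs = sort_recs(recs, contig_order)
    recs = pass_first(recs)
    recs = decompose(recs)
    recs = [trim(r) for r in recs]  # may shift POS; re-sort if order matters
    return uniq(recs)
```

Doing uniq last means duplicates that only become identical after decomposition or trimming are also caught.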
The duplicated chr10 variant is gone:
Sergey