pandora: Solve incompatible genotypes through a dynamic programming algorithm

Suppose we have two records, R1_gt_0 and R2_gt_1. R1_gt_0 is genotyped as the reference (allele 0) and R2_gt_1 as the first alternative allele (allele 1). For simplicity, assume there is only one sample. These records overlap, so their genotypes conflict and the conflict has to be resolved (in pandora this is done by making the genotypes compatible).

Right now, in the code, it works like this:

  • If the likelihood of R1_gt_0 is higher than that of R2_gt_1, then the genotype of R2_gt_1 is changed from 1 to 0.
  • If the opposite happens, i.e. the likelihood of R2_gt_1 is higher than that of R1_gt_0, then the genotype of R1_gt_0 is changed from 0 to . (missing).
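
To make the asymmetry concrete, here is a minimal sketch of the rule as described in the two bullets above. It is not pandora's actual code: `Record` and `resolve_pair_current` are hypothetical names, `None` stands in for the . genotype, and the tie-breaking and the extension to "any non-ref winner" are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical stand-in for a single-sample VCF record.
    name: str
    genotype: Optional[int]  # 0 = ref, 1 = first alt, None = "."
    likelihood: float        # likelihood of the called genotype


def resolve_pair_current(a: Record, b: Record) -> None:
    """Resolve two overlapping records in place, following the described rule."""
    # Assumption: on a tie, b is treated as the winner.
    winner, loser = (a, b) if a.likelihood > b.likelihood else (b, a)
    if winner.genotype == 0:
        # The ref call wins: the other record is also set to ref.
        loser.genotype = 0
    else:
        # An alt call wins: the other record becomes "." (missing).
        loser.genotype = None


r1 = Record("R1_gt_0", genotype=0, likelihood=-1.0)
r2 = Record("R2_gt_1", genotype=1, likelihood=-5.0)
resolve_pair_current(r1, r2)
print(r1.genotype, r2.genotype)  # ref wins, so both end up as 0
```

With the likelihoods swapped, R2_gt_1 keeps genotype 1 and R1_gt_0 becomes `None`, which is the second bullet.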

I don’t understand why the behaviour differs depending on whether the highest-likelihood allele is ref or alt. If we have conflicting records, it makes more sense to me to change the genotype of the lowest-likelihood one to . (we choose one record, and all the others become .). However, it seems that the behaviour described above is intended (i.e. not a bug). Could you shed some light on this?
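
For comparison, the symmetric alternative I have in mind could be sketched like this. Again, this is only an illustration under assumptions: `resolve_conflict_proposed` is a hypothetical name, each record is reduced to a `(genotype, likelihood)` pair, `None` stands in for ., the input is assumed non-empty, and ties go to the first record.

```python
from typing import List, Optional, Tuple


def resolve_conflict_proposed(
    records: List[Tuple[Optional[int], float]]
) -> List[Optional[int]]:
    """Keep the genotype of the highest-likelihood record; set all others to None."""
    # Index of the record with the highest likelihood (first one on ties).
    best = max(range(len(records)), key=lambda i: records[i][1])
    return [gt if i == best else None for i, (gt, _) in enumerate(records)]


# R1_gt_0 more likely: it keeps 0, R2_gt_1 becomes "."
print(resolve_conflict_proposed([(0, -1.0), (1, -5.0)]))
# R2_gt_1 more likely: it keeps 1, R1_gt_0 becomes "."
print(resolve_conflict_proposed([(0, -5.0), (1, -1.0)]))
```

Here the outcome does not depend on whether the winning allele is ref or alt: the losing record always becomes ..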

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 31 (3 by maintainers)

Most upvoted comments

christ

We might be missing something here, and we might need to think more about this problem… but I guess we are at least sure it is a hard problem, and the vast majority of cases are solved by what we have already implemented. So I guess we leave this as a post-paper enhancement?