bifrost: Tutorial on k-mer color API, my current use results in corruption?

Hi,

Do you have any resources on how to use the k-mer/unitig color API in Bifrost? I have been playing around with it, and I think I understand it, but I’m encountering an issue where some unitigs have no colors associated with them anymore, or worse, the whole colorset is a nullptr.

For context: say I have a graph constructed from both a reference genome and WGS data from a different strain. I want to perform some graph cleaning, and have identified a number of unitigs whose coverage in the sample is too low. I want these unitigs removed from the graph, or at least no longer associated with the sample color.

I’ve constructed the following example to do that: https://github.com/broadinstitute/pyfrost/blob/master/tests/test_node_removal.cpp

This example reads a file to_remove.txt which contains the head k-mer of a unitig to be removed from the sample on each line. First, I discard the sample color ID from that unitig, and if no colors remain, I queue it to be fully removed from the graph.
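
The bookkeeping in that step can be sketched without Bifrost at all. The following is a toy model only, not the linked test file and not Bifrost's API: the `ColorMap` type and the functions `discard_color`, `remove_unitigs` and `all_colored` are hypothetical names invented for illustration, and a plain `std::map` stands in for the colored graph. It assumes removal is deferred until every color has been discarded, as described above.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a colored graph: head k-mer of a unitig -> set of color IDs.
// This is NOT Bifrost's data model; it only mirrors the bookkeeping.
using ColorMap = std::map<std::string, std::set<std::size_t>>;

// Discard `sample_color` from every listed unitig and return the head k-mers
// of the unitigs that are left without any color.
std::vector<std::string> discard_color(ColorMap& colors,
                                       const std::vector<std::string>& to_remove,
                                       std::size_t sample_color) {
    std::vector<std::string> queued;
    for (const auto& head : to_remove) {
        auto it = colors.find(head);
        if (it == colors.end()) continue;                    // k-mer not in the graph
        if (it->second.erase(sample_color) == 0) continue;   // color was not set
        if (it->second.empty()) queued.push_back(head);      // nothing left: queue it
    }
    return queued;
}

// Remove the queued unitigs in a second pass, after all colors were updated.
void remove_unitigs(ColorMap& colors, const std::vector<std::string>& queued) {
    for (const auto& head : queued) colors.erase(head);
}

// Check that no remaining unitig has an empty colorset.
bool all_colored(const ColorMap& colors) {
    for (const auto& entry : colors) {
        if (entry.second.empty()) return false;
    }
    return true;
}
```

In this model, `all_colored` returns false between the two passes and should return true once `remove_unitigs` has run, so a check like it before saving and after reloading would help narrow down where an empty colorset first appears.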

I save the cleaned graph to a file, and then read it again. Most nodes still have the correct colors associated with them. For some nodes, however, the colorset is a nullptr, resulting in a crash when trying to do any operation on it, while for others the colorset is not a nullptr but doesn’t contain any colors (which shouldn’t happen, because those unitigs should’ve been removed).

Am I using the API incorrectly? Could the cause be a custom function I added to Bifrost in my fork, which transforms any UnitigMapping into a mapping representing the whole unitig? Or is it a bug in Bifrost?

Any help would be much appreciated, thanks!

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15

Most upvoted comments

Amazing work!! I’ve successfully run all my scripts without errors. Thanks a lot!