spoa: Unexpected multiple sequence alignment

I am getting an unexpected MSA and consensus sequence from spoa. By eye, I can see a more parsimonious alignment of the same sequences (shown under Expected Output). See below for details. Any insight would be appreciated.

Actual Output
Consensus (142)
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCTGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTGAGCTCCTTACCAACAGCCAGGCTATCCCCAGCCCTTACCGGCGTGT
Multiple sequence alignment
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCATCCACCAGGCTG-CCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGCGGGCGCTGTGGACAGCGCTCCTTACCACC------------------------------------
CCCGCCCCTGAAAGCCTTCGCGCCCGCTGCCCCTTCCTCCAGGCTGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGG------------------C-----------------------------------
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCGGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGG------------------CCTG--------------------------------
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCTG-CCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTGAGCTCCTTACCAAC----AGCCAGGCTATCCCCAGCCCTTACCGGCGTGT
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCTGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTGAGCTCCTTACCAAC----AGCCAGGCTATCCCCAGCCCTTACCGGCGTGT
Expected Output
Consensus (142)
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCTGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTGAGCTCCTTACCAACAGGC----TATCCCCAGCCCTTACCGGCGTGT
Multiple sequence alignment
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCATCCACCAGGCTG-CCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGCGGGCGCTGTGGACAGCGCTCCTTACCACC--------------------------------
CCCGCCCCTGAAAGCCTTCGCGCCCGCTGCCCCTTCCTCCAGGCTGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGC-------------------------------------------------
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCGGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTG----------------------------------------------
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCTG-CCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTGAGCTCCTTACCAACAGCCAGGCTATCCCCAGCCCTTACCGGCGTGT
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCTGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTGAGCTCCTTACCAACAGCCAGGCTATCCCCAGCCCTTACCGGCGTGT
Source: example.cpp
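#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

#include "spoa/spoa.hpp"

int main(int argc, char** argv) {

	// Four arguments are read below; exit early rather than dereference a missing argv entry.
	if (argc < 5) {
		fprintf(stderr, "usage: %s <alignment type> <match> <mismatch> <gap>\n", argv[0]);
		return 1;
	}
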
	std::vector<std::string> sequences = {
		"CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCATCCACCAGGCTGCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGCGGGCGCTGTGGACAGCGCTCCTTACCACC",
		"CCCGCCCCTGAAAGCCTTCGCGCCCGCTGCCCCTTCCTCCAGGCTGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGC",
		"CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCGGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTG",
		"CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCTGCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTGAGCTCCTTACCAACAGCCAGGCTATCCCCAGCCCTTACCGGCGTGT"
	};
	
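	// argv[1] is the alignment type (an integer cast to spoa::AlignmentType);
	// argv[2], argv[3] and argv[4] are the match, mismatch and gap scores.
	auto alignment_engine = spoa::createAlignmentEngine(static_cast<spoa::AlignmentType>(atoi(argv[1])), atoi(argv[2]), atoi(argv[3]), atoi(argv[4]));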

	auto graph = spoa::createGraph();

	for (const auto& it: sequences) {
		auto alignment = alignment_engine->align_sequence_with_graph(it, graph);
		graph->add_alignment(alignment, it);
	}

	std::string consensus = graph->generate_consensus();

	fprintf(stderr, "Consensus (%zu)\n", consensus.size());
	fprintf(stderr, "%s\n", consensus.c_str());

	std::vector<std::string> msa;
	graph->generate_multiple_sequence_alignment(msa, true);

	fprintf(stderr, "Multiple sequence alignment\n");
	for (const auto& it: msa) {
		fprintf(stderr, "%s\n", it.c_str());
	}

	return 0;
}

Spoa commit: 783d7b6925375c8fe486c03d2f271bf65b7dc6f5

Compiled with g++ example.cpp -std=c++11 -Iinclude/ -Lbuild/lib/ -lspoa -o example

Run with ./example 0 5 -4 -8
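
To compare the actual and expected alignments by more than eye, a small helper could report the alignment width and total gap count, and check that a hand-edited row still spells its input sequence once gaps are removed. This is only a sketch: `MsaSummary`, `summarize_msa` and `ungap` are hypothetical names introduced here, not part of spoa, and the code assumes every MSA row has the same length and uses `-` as the gap character, as in the output above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type; not part of spoa.
struct MsaSummary {
	std::size_t columns;  // width of the alignment (length of the first row)
	std::size_t gaps;     // total number of '-' characters over all rows
};

// Count the alignment width and the gap characters in an MSA.
// Returns {0, 0} for an empty MSA. Assumes all rows have equal length.
MsaSummary summarize_msa(const std::vector<std::string>& msa) {
	MsaSummary summary{0, 0};
	if (msa.empty()) {
		return summary;
	}
	summary.columns = msa.front().size();
	for (const auto& row : msa) {
		for (char c : row) {
			if (c == '-') {
				++summary.gaps;
			}
		}
	}
	return summary;
}

// Strip gap characters from one MSA row, recovering the underlying sequence.
std::string ungap(const std::string& row) {
	std::string sequence;
	for (char c : row) {
		if (c != '-') {
			sequence.push_back(c);
		}
	}
	return sequence;
}
```

In example.cpp this could be called as `summarize_msa(msa)` right after `generate_multiple_sequence_alignment`, and `ungap(msa[i]) == sequences[i]` could be checked for each input row to confirm that a manually constructed expected alignment is still a valid alignment of the same inputs.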

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 17 (9 by maintainers)

Most upvoted comments

@rvaser awesome!