ProbCons

ProbCons is open-source software for probabilistic consistency-based multiple alignment of amino acid sequences. It is an efficient protein multiple sequence alignment program that has demonstrated a statistically significant improvement in accuracy compared to several leading alignment tools.[1][2]

Algorithm

The following describes the basic outline of the ProbCons algorithm.[3]

Step 1: Reliability of an alignment edge

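For every pair of sequences $x, y$, compute the probability that letters $x_i$ and $y_j$ are paired in an alignment $a$ that is generated by the model:

$$P(x_i \sim y_j \mid x, y) = \sum_{a} \mathbf{1}\{x_i \sim y_j \in a\} \, \Pr[a \mid x, y]$$

(Where $\mathbf{1}\{x_i \sim y_j \in a\}$ is equal to 1 if $x_i$ and $y_j$ are paired in the alignment $a$ and 0 otherwise.)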
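
As a rough illustration of the sum in Step 1, the sketch below enumerates a tiny, made-up distribution over alignments and adds up the probability of every alignment containing a given pair. The alignments, their probabilities, and the function name `pairing_posteriors` are assumptions for illustration only. A real implementation would not enumerate alignments; the ProbCons paper computes these posteriors with a pair hidden Markov model and the forward-backward algorithm.

```python
from collections import defaultdict

def pairing_posteriors(alignments):
    """Sum Pr[a | x, y] over every alignment a that contains the pair (i, j)."""
    total = sum(prob for _, prob in alignments)  # normalise, in case weights do not sum to 1
    posterior = defaultdict(float)
    for pairs, prob in alignments:
        for i, j in pairs:
            # The indicator 1{x_i ~ y_j in a} is 1 exactly for the pairs listed in a.
            posterior[(i, j)] += prob / total
    return dict(posterior)

# Hypothetical distribution over three alignments of two short sequences.
# Each alignment is a set of index pairs (i, j), meaning x_i is paired with y_j.
toy_alignments = [
    ({(0, 0), (1, 1)}, 0.6),
    ({(0, 0), (1, 2)}, 0.3),
    ({(0, 1), (1, 2)}, 0.1),
]

posteriors = pairing_posteriors(toy_alignments)
```

In this toy distribution the pair (0, 0) appears in the first two alignments, so its posterior is the sum of their probabilities.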

Step 2: Maximum expected accuracy

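The accuracy of an alignment $a^*$ with respect to another alignment $a$ is defined as the number of common aligned pairs divided by the length of the shorter sequence:

$$\operatorname{acc}(a^*, a) = \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a^*} \mathbf{1}\{x_i \sim y_j \in a\}$$

Calculate the expected accuracy of a candidate alignment $a^*$ for each pair of sequences, where $P(x_i \sim y_j \mid x, y)$ is the pairing probability from Step 1:

$$E_{\Pr[a \mid x, y]}\big[\operatorname{acc}(a^*, a)\big] = \sum_{a} \Pr[a \mid x, y] \, \operatorname{acc}(a^*, a) = \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a^*} P(x_i \sim y_j \mid x, y)$$

This yields a maximum expected accuracy (MEA) alignment:

$$\hat{a}(x, y) = \arg\max_{a^*} E_{\Pr[a \mid x, y]}\big[\operatorname{acc}(a^*, a)\big]$$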
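
Because the expected accuracy is a sum of pairing probabilities over the aligned pairs, an MEA alignment can be found with a Needleman-Wunsch-style dynamic program that has no gap penalties. The following is a minimal sketch under that assumption; the function name `mea_alignment` and the example matrix are hypothetical, and the matrix `P` is assumed to hold the Step 1 posteriors with rows indexed by positions of x and columns by positions of y.

```python
def mea_alignment(P):
    """Return (expected_accuracy, pairs) for a maximum expected accuracy alignment."""
    n, m = len(P), len(P[0])
    # best[i][j]: largest summed posterior achievable when aligning x[:i] with y[:j]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + P[i - 1][j - 1],  # pair x_i with y_j
                best[i - 1][j],                        # leave x_i unpaired
                best[i][j - 1],                        # leave y_j unpaired
            )
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + P[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    # Normalise by the length of the shorter sequence, as in the accuracy definition.
    return best[n][m] / min(n, m), pairs

# Hypothetical posterior matrix for a length-2 sequence x and a length-3 sequence y.
P_example = [
    [0.9, 0.1, 0.0],
    [0.0, 0.6, 0.4],
]
mea_score, mea_pairs = mea_alignment(P_example)
```

For this example matrix the sketch pairs position 0 with 0 and position 1 with 1, giving an expected accuracy of about 0.75.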

Step 3: Probabilistic Consistency Transformation

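All pairs of sequences $x, y$ from the set of all sequences $\mathcal{S}$ are now re-estimated using all intermediate sequences $z$:

$$P'(x_i \sim y_j \mid x, y) = \frac{1}{|\mathcal{S}|} \sum_{z \in \mathcal{S}} \sum_{1 \le k \le |z|} P(x_i \sim z_k \mid x, z) \, P(z_k \sim y_j \mid z, y)$$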

This step can be iterated.
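
The re-estimation can be read as an average of matrix products: for each intermediate sequence z, multiply the x-to-z posterior matrix by the z-to-y posterior matrix, then average over all sequences. The sketch below assumes posteriors are stored as nested lists keyed by sequence-name pairs, and that a sequence paired with itself uses the identity matrix. All names (`consistency_transform`, `get_posterior`, the sequences "a", "b", "c") are hypothetical.

```python
def identity(n):
    return [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def get_posterior(post, lengths, x, z):
    """Posterior matrix for (x, z); identity if x == z, transposed if stored as (z, x)."""
    if x == z:
        return identity(lengths[x])
    if (x, z) in post:
        return post[(x, z)]
    return transpose(post[(z, x)])

def consistency_transform(post, lengths):
    """One round of re-estimation: average P_xz * P_zy over all sequences z."""
    names = sorted(lengths)
    new_post = {}
    for (x, y) in post:
        acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in names:
            prod = matmul(get_posterior(post, lengths, x, z),
                          get_posterior(post, lengths, z, y))
            for i, row in enumerate(prod):
                for j, value in enumerate(row):
                    acc[i][j] += value
        new_post[(x, y)] = [[value / len(names) for value in row] for row in acc]
    return new_post

# Hypothetical input: a and b are only weakly paired, but both pair strongly with c.
lengths = {"a": 2, "b": 2, "c": 2}
post = {
    ("a", "b"): [[0.5, 0.0], [0.0, 0.5]],
    ("a", "c"): [[1.0, 0.0], [0.0, 1.0]],
    ("b", "c"): [[1.0, 0.0], [0.0, 1.0]],
}
post_round1 = consistency_transform(post, lengths)
```

In this toy input the agreement through c raises the diagonal a-to-b posteriors from 0.5 to 2/3. Calling `consistency_transform` again on its own output corresponds to iterating the step.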

Step 4: Computation of guide tree

Construct a guide tree by hierarchical clustering, using the MEA score as the sequence similarity score. Cluster similarity is defined as a weighted average over pairwise sequence similarities.
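
A minimal sketch of such a clustering is shown below. It greedily merges the two most similar clusters and weights the averaged similarity by cluster size, which is an assumption made here for illustration; the exact weighting used by ProbCons may differ. The function name `build_guide_tree` and the example scores are hypothetical, and the score dictionary is assumed to be keyed by alphabetically ordered name pairs.

```python
from itertools import combinations

def build_guide_tree(names, score):
    """Greedy agglomerative clustering; returns the guide tree as nested tuples."""
    # Each cluster id maps to (subtree, number_of_leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    sim = {}
    for i, j in combinations(range(len(names)), 2):
        a, b = sorted((names[i], names[j]))
        sim[(i, j)] = score[(a, b)]
    next_id = len(names)
    while len(clusters) > 1:
        i, j = max(sim, key=sim.get)  # most similar pair of clusters
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Similarity of the merged cluster to each remaining cluster:
        # size-weighted average of the two old similarities.
        for k in clusters:
            s_ik = sim.pop((min(i, k), max(i, k)))
            s_jk = sim.pop((min(j, k), max(j, k)))
            sim[(k, next_id)] = (n_i * s_ik + n_j * s_jk) / (n_i + n_j)
        del sim[(i, j)]
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree

# Hypothetical MEA scores for three sequences.
example_scores = {("A", "B"): 0.9, ("A", "C"): 0.4, ("B", "C"): 0.5}
guide_tree = build_guide_tree(["A", "B", "C"], example_scores)
```

Here A and B are merged first because they have the highest score, and C joins the resulting cluster afterwards.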

Step 5: Compute MSA

Finally, compute the multiple sequence alignment (MSA) using progressive alignment or iterative alignment.
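
In progressive alignment, the guide tree fixes the order in which groups of sequences are merged: the children of a node are aligned before the node itself. The sketch below only derives that merge order from a guide tree given as nested tuples with string leaves; it does not perform the profile alignment. The function name `progressive_order` and the tree representation are assumptions for illustration.

```python
def progressive_order(tree, merges=None):
    """Post-order walk of a nested-tuple guide tree.

    Returns (leaves_under_this_node, merges), where merges lists the
    (left_group, right_group) pairs in the order they would be aligned.
    """
    if merges is None:
        merges = []
    if isinstance(tree, str):  # a leaf is a single sequence name
        return [tree], merges
    left, _ = progressive_order(tree[0], merges)
    right, _ = progressive_order(tree[1], merges)
    merges.append((left, right))  # align the two child groups at this node
    return left + right, merges

# Hypothetical guide tree: A and B are merged first, then C joins them.
all_leaves, merge_order = progressive_order(("C", ("A", "B")))
```

Each entry of `merge_order` names two groups whose existing alignments would be combined at that step.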

References

  1. Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005). "PROBCONS: Probabilistic Consistency-based Multiple Sequence Alignment". Genome Research. 15 (2): 330–340. doi:10.1101/gr.2821705. PMC 546535. PMID 15687296.
  2. Roshan, Usman (2014-01-01). "Multiple Sequence Alignment Using Probcons and Probalign". In Russell, David J (ed.). Multiple Sequence Alignment Methods. Methods in Molecular Biology. 1079. Humana Press. pp. 147–153. doi:10.1007/978-1-62703-646-7_9. ISBN 9781627036450. PMID 24170400.
  3. Lecture "Bioinformatics II" at University of Freiburg