T-Coffee

Developer(s)	Cédric Notredame, Centre de Regulació Genòmica (CRG) - Barcelona
Stable release	11.00.8cbe486 / 13 August 2014
Preview release	11.00.d27cadf / 11 June 2015
Repository	github.com/cbcrg/tcoffee
Operating system	UNIX, Linux, MS-Windows, Mac OS X
Type	Bioinformatics tool
Licence	GPL
Website	www.tcoffee.org

T-Coffee (Tree-based Consistency Objective Function for Alignment Evaluation) is a multiple sequence alignment program that uses a progressive approach.[1] It generates a library of pairwise alignments to guide the multiple sequence alignment. It can also combine previously computed multiple sequence alignments and, in the latest versions, can use structural information from PDB files (3D-Coffee). It has advanced features for evaluating alignment quality and some capacity for identifying occurrences of motifs (Mocca). By default it writes alignments in the aln (Clustal) format, but it can also produce PIR, MSF, and FASTA output. The most common input formats (FASTA, PIR) are supported.
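The role of the pairwise library can be illustrated with a small Python sketch of the library extension step described in the original T-Coffee paper,[1] in which the weight of a residue pair is increased by the support it receives through residues of a third sequence. This is a simplified illustration, not the program's actual code: the function name extend_library, the data layout (residues as (sequence, position) tuples) and the weights are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def extend_library(primary):
    """Return an extended library from a primary pairwise library.

    primary maps (residue_a, residue_b) -> weight, where each residue is a
    hypothetical (sequence_name, position) tuple and each pair is listed once.
    """
    # Symmetric lookup: for each residue, the residues it is paired with.
    neighbours = defaultdict(dict)
    for (a, b), weight in primary.items():
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    extended = defaultdict(float)
    # Start from the direct (primary) weights.
    for (a, b), weight in primary.items():
        extended[frozenset((a, b))] += weight

    # Add indirect support: if residue c is paired with both a and b,
    # the pair (a, b) gains min(w(a, c), w(c, b)).
    for c, linked in neighbours.items():
        for a, b in combinations(sorted(linked), 2):
            if a[0] == b[0]:
                continue  # two residues of the same sequence are never aligned
            extended[frozenset((a, b))] += min(linked[a], linked[b])
    return dict(extended)


# Toy library for the first residue of three sequences A, B and C
# (weights are made up for illustration).
library = {
    (("A", 0), ("B", 0)): 88,
    (("A", 0), ("C", 0)): 77,
    (("C", 0), ("B", 0)): 100,
}
extended = extend_library(library)
print(extended[frozenset((("A", 0), ("B", 0)))])  # 88 + min(77, 100) = 165.0
```

In this toy case the pair A0/B0 is supported both directly and through C0, so its extended weight is higher than its primary weight; the progressive alignment then uses such extended weights in place of a substitution matrix.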

Comparisons with other alignment software

Although the default output is a Clustal-like format, it differs enough from the output of ClustalW/X that many programs supporting the Clustal format cannot read it. ClustalX can import T-Coffee output, so the simplest workaround is usually to import the alignment into ClustalX and re-export it. Alternatively, the strict ClustalW output format can be requested with the option "-output=clustalw_aln".
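As a minimal sketch of how the "-output=clustalw_aln" option might be passed from a script, the following Python helper assembles a command line. The helper name tcoffee_command, the input file name sample.fasta and the executable name t_coffee are assumptions for illustration; only the option itself comes from the text above.

```python
import shlex


def tcoffee_command(infile, output_format="clustalw_aln", executable="t_coffee"):
    """Build an argument list requesting a given T-Coffee output format.

    The executable name and the positional input file are assumptions;
    adjust them to match the local installation.
    """
    return [executable, infile, f"-output={output_format}"]


cmd = tcoffee_command("sample.fasta")  # hypothetical input file
print(shlex.join(cmd))  # t_coffee sample.fasta -output=clustalw_aln
```

The list can be handed to subprocess.run(cmd, check=True) on a machine where T-Coffee is installed; building it as a list rather than a single string avoids shell quoting problems with unusual file names.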

A distinctive feature of T-Coffee is its ability to combine different methods and different data types. In its latest version, T-Coffee can combine protein sequences and structures as well as RNA sequences and structures. It can also run and combine the output of the most common sequence and structure alignment packages. For a complete list, see tclinkdb.txt.

T-Coffee ships with a sophisticated sequence reformatting utility named seq_reformat. Extensive documentation is available in t_coffee_technical.htm, along with a tutorial in t_coffee_tutorial.htm.

Variations

M-Coffee: a special mode of T-Coffee that combines the output of the most common multiple sequence alignment packages (Muscle, ClustalW, Mafft, ProbCons, etc.). The resulting alignments are slightly better than the individual ones, but most importantly the program indicates the alignment regions on which the various packages agree. Regions of high agreement are usually well aligned.

Expresso and 3D-Coffee: special modes of T-Coffee that combine sequences and structures in an alignment. The structure-based alignments can be carried out using the most common structural aligners, such as TMalign, Mustang, and sap.

R-Coffee: a special mode of T-Coffee that aligns RNA sequences using secondary structure information.

PSI-Coffee: aligns distantly related proteins using homology extension (slow and accurate)[2][3]

TM-Coffee: aligns transmembrane proteins using homology extension[4]

Pro-Coffee: aligns homologous promoter regions[5]

Accurate: automatically combines the most accurate modes for DNA, RNA and proteins (experimental).

Combine: combines two (or more) multiple sequence alignments into a single one.[1][2]
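The idea behind the M-Coffee entry above, that regions where several aligners agree tend to be reliable, can be sketched in Python as a per-column agreement score. This is not M-Coffee's actual algorithm: the functions aligned_pairs and column_agreement, the dictionary layout and the toy sequences are assumptions chosen to illustrate the concept.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return (all aligned residue pairs, pairs grouped by column).

    alignment maps sequence name -> gapped row; all rows have equal length.
    A residue is identified as (sequence_name, ungapped_position).
    """
    names = sorted(alignment)
    position = {name: 0 for name in names}
    all_pairs, columns = set(), []
    for column in zip(*(alignment[name] for name in names)):
        residues = []
        for name, char in zip(names, column):
            if char != "-":
                residues.append((name, position[name]))
                position[name] += 1
        column_pairs = set(combinations(residues, 2))
        columns.append(column_pairs)
        all_pairs |= column_pairs
    return all_pairs, columns


def column_agreement(reference, others):
    """For each reference column, the fraction of its residue pairs that
    the other alignments also contain (None if the column has no pair)."""
    _, reference_columns = aligned_pairs(reference)
    other_pairs = [aligned_pairs(other)[0] for other in others]
    scores = []
    for column_pairs in reference_columns:
        if not column_pairs or not other_pairs:
            scores.append(None)
            continue
        supported = sum(pair in pairs for pair in column_pairs for pairs in other_pairs)
        scores.append(supported / (len(column_pairs) * len(other_pairs)))
    return scores


# Two hypothetical alignments of the same pair of sequences.
aligner_1 = {"s1": "AC-GT", "s2": "ACTGT"}
aligner_2 = {"s1": "ACG-T", "s2": "ACTGT"}
print(column_agreement(aligner_1, [aligner_2]))  # [1.0, 1.0, None, 0.0, 1.0]
```

Here the two alignments disagree only on where the G of s1 is placed, so the fourth column of the first alignment receives a score of 0 while the others are fully supported.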

Evaluation

TCS (Transitive Consistency Score): an extended version of the T-Coffee scoring scheme.[6] It uses T-Coffee libraries of pairwise alignments to evaluate any third-party MSA. Pairwise projections can be produced using fast or slow methods, allowing a trade-off between speed and accuracy. TCS has been shown to give significantly better estimates of structural accuracy and more accurate phylogenetic trees than Heads-or-Tails, GUIDANCE, Gblocks, and trimAl.[7]

References

[1] Notredame C, Higgins DG, Heringa J (2000-09-08). "T-Coffee: A novel method for fast and accurate multiple sequence alignment". J Mol Biol. 302 (1): 205–217. doi:10.1006/jmbi.2000.4042. PMID 10964570.
[2] Di Tommaso P, Moretti S, Xenarios I, Orobitg M, Montanyola A, Chang JM, Taly JF, Notredame C (Jul 2011). "T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension". Nucleic Acids Res. 39 (Web Server issue): W13–7. doi:10.1093/nar/gkr245. PMC 3125728. PMID 21558174.
[3] Kemena C, Notredame C (2009-10-01). "Upcoming challenges for multiple sequence alignment methods in the high-throughput era". Bioinformatics. 25 (19): 2455–65. doi:10.1093/bioinformatics/btp452. PMC 2752613. PMID 19648142.
[4] Chang JM, Di Tommaso P, Taly JF, Notredame C (2012-03-28). "Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee". BMC Bioinformatics. 13: S1. doi:10.1186/1471-2105-13-S4-S1. PMC 3303701. PMID 22536955.
[5] Erb I, González-Vallinas JR, Bussotti G, Blanco E, Eyras E, Notredame C (Apr 2012). "Use of ChIP-Seq data for the design of a multiple promoter-alignment method". Nucleic Acids Res. 40 (7): e52. doi:10.1093/nar/gkr1292. PMC 3326335. PMID 22230796.
[6] Chang JM, Di Tommaso P, Lefort V, Gascuel O, Notredame C (1 July 2015). "TCS: a web server for multiple sequence alignment evaluation and phylogenetic reconstruction". Nucleic Acids Res. 43 (W1): W3–6. doi:10.1093/nar/gkv310. PMC 4489230. PMID 25855806.
[7] Chang JM, Di Tommaso P, Notredame C (Jun 2014). "TCS: A New Multiple Sequence Alignment Reliability Measure to Estimate Alignment Accuracy and Improve Phylogenetic Tree Reconstruction". Mol Biol Evol. 31 (6): 1625–37. doi:10.1093/molbev/msu117. PMID 24694831.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
