| Abstract |
Multiple alignment of DNA sequences is an important step in various molecular biological analyses. As large amounts of sequence
data become available through genome and other large-scale sequencing projects, a multiple sequence alignment (MSA) program
must now be scalable as well as accurate. In this chapter, we outline the algorithms of the MSA program
MAFFT and provide practical advice, focusing on several typical situations a biologist may face. For genome alignment,
which is beyond the scope of MAFFT, we introduce two tools: TBA and MAUVE.
Affiliation(s): (2) Digital Medicine Initiative, Kyushu University, 812-8582 Fukuoka, Japan
(3) Department of Computer Science, Stanford University, Stanford, CA, USA
(4) Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
Book Title: Bioinformatics for DNA Sequence Analysis
Series: Methods in Molecular Biology | Volume: 537 | Pub. Date: Jan-01-2009 | Page Range: 39-64 | DOI: 10.1007/978-1-59745-251-9_3
Subject: Bioinformatics
Key Words: Multiple sequence alignment - progressive method - iterative refinement method - consistency objective function - genome comparison






















