A multiple sequence alignment program, MAFFT, has been developed. The CPU time is
drastically reduced as compared with existing methods. MAFFT includes two novel techniques.
(i) Homologous regions are rapidly identified by the fast Fourier transform (FFT),
in which an amino acid sequence is converted to a sequence composed of volume and
polarity values of each amino acid residue. (ii) We propose a simplified scoring system
that both reduces CPU time and increases the accuracy of alignments,
even for sequences having large insertions or extensions as well as distantly related
sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2)
and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performance
of FFT-NS-2 and FFT-NS-i was compared with that of other methods by computer simulations
and benchmark tests; FFT-NS-2 requires drastically less CPU time than
CLUSTALW while achieving comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE
when the number of input sequences exceeds 60, without sacrificing accuracy.
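
To make technique (i) concrete, the following is a minimal sketch of how a correlation between two sequences, each encoded as volume and polarity values, might be computed with the FFT. It is not the MAFFT implementation. The property table `TOY_PROPERTIES` contains hypothetical placeholder values, the function names are invented for illustration, and summing the volume and polarity correlations into one score is an assumption made here. A peak in the correlation at lag k suggests that the two sequences share a similar region when one is shifted by k positions relative to the other.

```python
import cmath

# Hypothetical placeholder (volume, polarity) values for a few residues.
# These are illustrative only and are NOT the table used by MAFFT.
TOY_PROPERTIES = {
    "A": (-1.0, 0.0),
    "G": (-1.2, 0.2),
    "L": (1.0, -1.0),
    "K": (1.2, 1.0),
    "D": (-0.5, 1.3),
    "W": (2.0, -0.8),
}


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def cross_correlation(a, b):
    """Return {lag: sum_n a[n] * b[n + lag]} for every overlapping lag."""
    # Zero-pad to a power of two large enough to avoid circular wrap-around.
    size = 1
    while size < len(a) + len(b):
        size *= 2
    fa = fft([complex(v) for v in a] + [0j] * (size - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (size - len(b)))
    # Correlation theorem: conj(F[a]) * F[b] is the transform of the correlation.
    product = [x.conjugate() * y for x, y in zip(fa, fb)]
    c = [v.real / size for v in fft(product, inverse=True)]
    # Index k holds lag k; negative lags are stored at the end of the array.
    return {lag: c[lag % size] for lag in range(-(len(a) - 1), len(b))}


def property_correlation(seq_a, seq_b, table=TOY_PROPERTIES):
    """Sum the volume and polarity correlations of two residue strings."""
    total = {}
    for component in (0, 1):  # 0 = volume, 1 = polarity
        a = [table[r][component] for r in seq_a]
        b = [table[r][component] for r in seq_b]
        for lag, value in cross_correlation(a, b).items():
            total[lag] = total.get(lag, 0.0) + value
    return total


# "LKDWLK" occurs in the second sequence starting at position 3,
# so the correlation is largest at lag 3.
scores = property_correlation("LKDWLK", "AGALKDWLKG")
best_lag = max(scores, key=scores.get)
print(best_lag)
```

The FFT computes the correlation for all lags at once in O(N log N) time, whereas evaluating each lag directly costs O(N^2); this is the source of the speed-up in locating candidate homologous regions. The pure-Python transform above is chosen only to keep the sketch self-contained; a production implementation would use an optimized FFT library.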