In recent years, improvements to existing programs and the introduction of new iterative
algorithms have changed the state of the art in protein sequence alignment. This paper
presents the first systematic study of the most commonly used alignment programs using
BAliBASE benchmark alignments as test cases. Even below the 'twilight zone' at 10-20%
residue identity, the best programs were capable of correctly aligning on average
47% of the residues. We show that iterative algorithms often offer improved alignment
accuracy, though at the expense of computation time. A notable exception was the effect
of introducing a single divergent sequence into a set of closely related sequences,
causing the iteration to diverge from the best alignment. Global alignment programs
generally performed better than local methods, except in the presence of large N/C-terminal
extensions and internal insertions. In these cases, a local algorithm was more successful
in identifying the most conserved motifs. This study enables us to propose appropriate
alignment strategies, depending on the nature of a particular set of sequences. Using
more than one program based on different alignment techniques should
significantly improve the quality of automatic protein sequence alignment methods.
The results also suggest guidelines for improving alignment algorithms.
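
The two quantities quoted above, percent residue identity and the fraction of correctly aligned residues, can be illustrated with a minimal sketch. The abstract does not define either measure, so the definitions here are assumptions: identity is counted over columns where both sequences have a residue, and accuracy is a sum-of-pairs style ratio of residue pairs shared between a test alignment and a reference alignment. The function names are hypothetical and are not taken from any of the programs compared.

```python
from itertools import combinations


def percent_identity(row_a, row_b):
    """Percent identity over columns where both rows hold a residue.

    Assumption: gap-containing columns are excluded from the denominator;
    other conventions exist.
    """
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    identical = sum(1 for a, b in aligned if a == b)
    return 100.0 * identical / len(aligned)


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings ('-' is a gap).
    A residue is identified by (sequence index, ungapped position).
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    next_pos = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for col in range(length):
        present = []
        for seq, row in enumerate(alignment):
            if row[col] != "-":
                present.append((seq, next_pos[seq]))
                next_pos[seq] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment contains no aligned pairs")
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)


# Toy data, invented for illustration only.
reference = ["ACDE", "ACDE", "A-DE"]
test = ["ACDE", "ACDE", "-ADE"]
score = sum_of_pairs_score(test, reference)
identity = percent_identity("ACDE", "ACDF")
```

In this toy case the test alignment misplaces one residue of the third sequence, so it recovers 8 of the 10 reference residue pairs. A score computed this way rewards partially correct alignments, which is why a single figure such as an average percentage of correctly aligned residues can summarise performance across many test cases.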