Faster DP for Multiple Alignment

Carrillo and Lipman [1] found a heuristic method for accelerating the search for the best multiple alignment. The method is based on the observation that if the strings are relatively similar, the alignment path stays close to the main diagonal, so not all the values in the multi-dimensional cube need to be calculated. We now detail this algorithm.

Assuming an upper bound on the cost of the best alignment, we discard alignments that are known a priori to be more expensive than this bound. Let $A$ be an alignment of the strings $x_1, \ldots, x_r$. Denote by $A_{i,j}$ the pair of rows of $A$ containing only $x_i$ and $x_j$, and by $c(A_{i,j})$ the cost of this pairwise alignment. Denote by $c(A)$ the total cost of $A$, and suppose we define $c(A) = \sum_{i<j} c(A_{i,j})$. Let $A^*$ be the optimal alignment (the one with the minimal cost), and suppose we know that $c(A^*) \le d$. Then, for any pair $u < v$,

\[ d \ge c(A^*) = \sum_{i<j} c(A^*_{i,j}) \ge c(A^*_{u,v}) + \sum_{i<j,\ (i,j) \ne (u,v)} D(x_i, x_j) \]

where $D(x,y)$ is the optimal score for aligning strings $x$ and $y$. The last inequality holds because each induced pairwise alignment $A^*_{i,j}$ costs at least as much as the optimal pairwise alignment of $x_i$ and $x_j$. It follows that

\[ c(A^*_{u,v}) \le B(u,v), \qquad B(u,v) = d - \sum_{i<j,\ (i,j) \ne (u,v)} D(x_i, x_j) \]

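Here $A^*_{u,v}$ is the projection of $A^*$ on the $uv$-plane. By calculating $D(x_i,x_j)$ for each $i$ and $j$, we can find $B(u,v)$. Now consider a cell $(i_1, \ldots, i_r)$ whose projection onto the $uv$-plane is $(s,t)$, that is, $i_u = s$ and $i_v = t$. If the best alignment $A^*$ passes through this cell, then its projection $A^*_{u,v}$ passes through $(s,t)$, and its cost satisfies $c(A^*_{u,v}) \ge best^{(u,v)}_{s,t}$, where $best^{(u,v)}_{s,t}$ is the cost of the best pairwise alignment of $x_u$ and $x_v$ through $(s,t)$, and hence a lower bound on the cost of any alignment through $(s,t)$ in the $uv$-plane. We can compute this bound as:

\[ best^{(u,v)}_{s,t} = D(x_u[1..s-1],\ x_v[1..t-1]) + \sigma(x_u[s], x_v[t]) + D(x_u[s+1..n_u],\ x_v[t+1..n_v]) \]

where $\sigma(x_u[s], x_v[t])$ is the cost of matching the characters $x_u[s]$ and $x_v[t]$, and $n_u$, $n_v$ are the lengths of $x_u$ and $x_v$. Therefore, if $best^{(u,v)}_{s,t} > B(u,v)$, then the best alignment $A^*$ cannot pass through the cell $(i_1, \ldots, i_r)$ for any values of the remaining coordinates $i_k$, $k \ne u, v$, and these cells can be discarded from the computation.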
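
The pruning test can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes unit costs (0 for a match, 1 for a mismatch or a gap), takes the upper bound d on the optimal cost as an input, and takes the best cost through (s,t) to be the prefix cost plus the match cost plus the suffix cost, as in the description of the match-cost term above. All function names (prefix_costs, suffix_costs, pair_bounds, surviving_cells) are hypothetical.

```python
from itertools import combinations

GAP = 1  # assumed gap cost


def sigma(a, b):
    """Assumed substitution cost: 0 for a match, 1 for a mismatch."""
    return 0 if a == b else 1


def prefix_costs(x, y):
    """F[i][j] = optimal cost of aligning x[:i] with y[:j]."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * GAP
    for j in range(1, m + 1):
        F[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = min(F[i - 1][j - 1] + sigma(x[i - 1], y[j - 1]),
                          F[i - 1][j] + GAP,
                          F[i][j - 1] + GAP)
    return F


def suffix_costs(x, y):
    """R[i][j] = optimal cost of aligning x[i:] with y[j:]."""
    n, m = len(x), len(y)
    # Aligning two suffixes costs the same as aligning their reversals.
    F = prefix_costs(x[::-1], y[::-1])
    return [[F[n - i][m - j] for j in range(m + 1)] for i in range(n + 1)]


def pair_bounds(strings, d):
    """B[(u, v)] = d minus the optimal pairwise costs of all other pairs."""
    D = {(i, j): prefix_costs(strings[i], strings[j])[-1][-1]
         for i, j in combinations(range(len(strings)), 2)}
    total = sum(D.values())
    return {pair: d - (total - cost) for pair, cost in D.items()}


def surviving_cells(x_u, x_v, bound):
    """1-based (s, t) pairs whose best pairwise cost does not exceed bound."""
    F = prefix_costs(x_u, x_v)
    R = suffix_costs(x_u, x_v)
    keep = set()
    for s in range(1, len(x_u) + 1):
        for t in range(1, len(x_v) + 1):
            best = F[s - 1][t - 1] + sigma(x_u[s - 1], x_v[t - 1]) + R[s][t]
            if best <= bound:
                keep.add((s, t))
    return keep


strings = ["AC", "AC", "AG"]
# d = 2 is the cost of the gap-free alignment of these strings, so it is
# a valid upper bound on the optimal sum-of-pairs cost.
B = pair_bounds(strings, d=2)
print(B)
print(sorted(surviving_cells(strings[0], strings[1], B[(0, 1)])))
```

Only cells of the multi-dimensional cube whose projection survives this test for every pair (u,v) need to be evaluated. The tighter the bound d, the fewer cells remain.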

Peer Itsik
2000-12-06