next up previous
Next: Common Multiple Alignments Methods Up: Multiple Alignment to a Previous: Multiple Alignment to a

Lifted Alignment tree - a Heuristic for Phylogenetic Alignment

Definition 0.10   A phylogenetic alignment is called a lifted alignment if, for every internal node v, the string assigned to v is also assigned to one of v's children (see Figure 4.4). It follows that every internal node is labeled by one of the leaf strings.
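The definition can be checked mechanically. The following is a minimal sketch, assuming the tree is given as a dictionary mapping each node to the list of its children and the labeling as a dictionary from nodes to strings; the names `is_lifted`, `children` and `label` and the toy tree are hypothetical and do not come from the lecture.

```python
def is_lifted(children, label):
    """Return True if every internal node shares its label with one of its children."""
    return all(
        label[v] in {label[c] for c in kids}
        for v, kids in children.items()
        if kids  # leaves have no children, so the condition is vacuous for them
    )


# Hypothetical toy tree: root r has children u and c; u has the leaves a and b.
children = {"r": ["u", "c"], "u": ["a", "b"], "a": [], "b": [], "c": []}
lifted = {"a": "ACGT", "b": "ACGA", "c": "TCGA", "u": "ACGA", "r": "TCGA"}
not_lifted = dict(lifted, u="ACGG")  # "ACGG" labels neither child of u

print(is_lifted(children, lifted), is_lifted(children, not_lifted))
```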

Let $T^*$ be the optimal alignment for tree $T$. We will construct a lifted alignment $T^L = \mathrm{Lift}(T^*)$, based on $T^*$, that increases the alignment distance by only a limited amount. Note that this construction is only conceptual, since we usually do not know $T^*$.
  
Figure 4.4: A phylogenetic tree with lifted alignment. Each internal node is labeled by one of the strings labeling its children.

For each node $v$, let its label in $T^*$ be $S^*_v$. We shall assign every node $v$ a lifted label $S^L_v$. Initially only the leaves are labeled, and by definition $S^L_v = S^*_v$ for each leaf $v$. The labeling process visits the internal nodes in any order in which no node is visited before all of its children. Upon visiting a node, we lift it, i.e. we label it by one of the labels of its children; in the standard construction this is the child label closest to $S^*_v$. Thus, the resulting phylogenetic alignment is lifted (see Figure 4.5).
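The lifting process can be sketched in a few lines. This is only an illustration under stated assumptions: the labels of $T^*$ are taken as given (in practice they are unknown), each node is lifted to the child label closest to its $T^*$ label, and a Hamming distance on equal-length strings stands in for the alignment distance. The names `lift`, `hamming`, `children` and `star` are hypothetical.

```python
def hamming(s, t):
    """Toy distance for equal-length strings; a stand-in for the alignment distance."""
    return sum(a != b for a, b in zip(s, t))


def lift(children, star, root, dist=hamming):
    """Build lifted labels from the labels `star` of a known alignment."""
    lifted = {}

    def visit(v):
        kids = children[v]
        if not kids:
            lifted[v] = star[v]  # a leaf keeps its own string
            return
        for c in kids:
            visit(c)  # children are always labeled before their parent
        # Assumption: lift the child label that is closest to the T* label of v.
        closest = min(kids, key=lambda c: dist(star[v], lifted[c]))
        lifted[v] = lifted[closest]

    visit(root)
    return lifted


# Hypothetical toy tree and T* labels (internal labels need not be leaf strings).
children = {"r": ["u", "c"], "u": ["a", "b"], "a": [], "b": [], "c": []}
star = {"a": "ACGT", "b": "ACGA", "c": "TCGA", "u": "AAGA", "r": "TCGG"}
lifted = lift(children, star, "r")
print(lifted["u"], lifted["r"])
```

After the lift, every internal node carries a leaf string, and the edge to the child it was lifted from has distance 0.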

  
Figure 4.5: The lifting construction at node $v$. The numbers on the edges are the distances from $S^*_v$ to the lifted strings labeling its children. The tree before lifting is shown on the left, and the result of the lift on the right. After the lift, one edge has distance 0.



  
Figure 4.6: The lifted tree T*L. The dashed edges show the paths along which a leaf string has been lifted to some internal node, thus their distance is 0. Solid edges are blue edges in T*L. The path P(a,b) for example, is the path b,d, S4 along which the string labeling b was lifted. Edge (a,b) has distance in T*L at most twice the distance of path P(a,b) in T*.

We now describe a dynamic programming algorithm that finds the optimal lifted alignment. We first need a definition:

Definition 0.11   Let $T_v$ be the subtree of $T$ rooted at node $v$, and let $S$ be a string labeling a leaf of $T_v$. Let $d(v,S)$ denote the distance of the best lifted alignment of $T_v$ under the requirement that string $S$ is assigned to node $v$.

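The algorithm computes $d(v,S)$ for every node $v$ and every string $S$ labeling a leaf of $T_v$, working its way from the leaves up using the following recursion:

$$d(v,S) = \sum_{v' \text{ child of } v} \; \min_{S' \in T_{v'}} \left[ D(S,S') + d(v',S') \right]$$

Here $D(S,S')$ is the pairwise distance between $S$ and $S'$, the string $S'$ ranges over the strings labeling leaves of $T_{v'}$, and $d(v,S) = 0$ for a leaf $v$ labeled $S$.

Time analysis: We perform a preprocessing stage in which we compute all the $\binom{k}{2}$ pairwise distances between the $k$ input strings. This takes $O(N^2)$ time, where $N$ is the total length of all the strings. The work at any internal node is $O(k^2)$, and the tree has $O(k)$ internal nodes, so the overall work of the algorithm is $O(N^2 + k^3)$.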
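The bottom-up computation of $d(v,S)$ can be sketched as follows. This is a minimal sketch under stated assumptions, not the lecture's own listing: each child $v'$ contributes the minimum over $S'$ of $D(S,S') + d(v',S')$, a unit-cost edit distance stands in for the pairwise distance $D$, and the tree is a dictionary from nodes to child lists. One detail is an addition of this sketch: for each candidate $S$, one child whose subtree contains $S$ is forced to carry $S$ itself, so that the returned labeling is lifted in the sense of the definition. All function and variable names are hypothetical.

```python
from itertools import combinations


def edit_distance(s, t):
    """Unit-cost edit distance; a stand-in for the pairwise distance D."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def best_lifted_alignment(children, leaf_string, root, dist=edit_distance):
    """Return (distance, labels) of a best lifted alignment of the tree."""
    # Preprocessing: all pairwise distances between the distinct leaf strings.
    strings = sorted(set(leaf_string.values()))
    D = {(s, s): 0 for s in strings}
    for s, t in combinations(strings, 2):
        D[s, t] = D[t, s] = dist(s, t)

    d = {}     # d[v][S]: best lifted distance of the subtree at v with S at v
    pick = {}  # pick[v][S][c]: label given to child c when v is labeled S

    def solve(v):
        kids = children[v]
        if not kids:
            d[v] = {leaf_string[v]: 0}
            return
        for c in kids:
            solve(c)
        d[v], pick[v] = {}, {}
        for S in sorted({s for c in kids for s in d[c]}):
            # Cheapest label for each child, given S at v.
            best = {c: min(d[c], key=lambda Sp: D[S, Sp] + d[c][Sp]) for c in kids}
            cost = {c: D[S, best[c]] + d[c][best[c]] for c in kids}
            # Lifted requirement: some child must carry S itself (edge distance 0).
            holder = min((c for c in kids if S in d[c]),
                         key=lambda c: d[c][S] - cost[c])
            best[holder], cost[holder] = S, d[holder][S]
            d[v][S], pick[v][S] = sum(cost.values()), best

    solve(root)

    # Trace the choices back from the cheapest root label.
    root_label = min(d[root], key=d[root].get)
    labels, stack = {}, [(root, root_label)]
    while stack:
        v, S = stack.pop()
        labels[v] = S
        stack.extend((c, pick[v][S][c]) for c in children[v])
    return d[root][root_label], labels


# Hypothetical toy input: three leaves a, b, c under internal nodes u and r.
children = {"r": ["u", "c"], "u": ["a", "b"], "a": [], "b": [], "c": []}
leaf_string = {"a": "ACGT", "b": "ACGA", "c": "TCGA"}
cost, labels = best_lifted_alignment(children, leaf_string, "r")
print(cost, labels)
```

For every candidate $S$ at a node, the inner loops scan each leaf string of each child subtree once, which matches the $O(k^2)$ work per internal node in the analysis above.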
Peer Itsik
2000-12-06