Lifted Alignment tree - a Heuristic for Phylogenetic Alignment

Definition 0.10 A phylogenetic alignment is called a lifted alignment if, for every internal node v, the string assigned to v is also assigned to at least one of v's children (see Figure 4.4). Consequently, working up from the leaves, every internal node is labeled by one of the leaf strings.
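As a small illustration of Definition 0.10, the check below tests whether a labeled tree is lifted. It is a sketch under assumed representations: the tree is a hypothetical `children` dictionary mapping each node name to the list of its children (empty for leaves), and `labels` maps each node name to its string.

```python
from typing import Dict, List


def is_lifted(children: Dict[str, List[str]], labels: Dict[str, str]) -> bool:
    """Return True if every internal node's string also labels one of its children."""
    for v, kids in children.items():
        if kids and labels[v] not in {labels[w] for w in kids}:
            return False
    return True


# Hypothetical tree: root r has children u and c; u has leaf children a and b.
children = {"r": ["u", "c"], "u": ["a", "b"], "a": [], "b": [], "c": []}
labels = {"a": "ACGT", "b": "ACTT", "c": "AGGT", "u": "ACGT", "r": "ACGT"}
print(is_lifted(children, labels))                   # True
print(is_lifted(children, {**labels, "r": "AAAA"}))  # False: r's string is on no child
```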

Let T^* be an optimal phylogenetic alignment for tree T. Based on T^*, we will construct a lifted alignment T^L = Lift(T^*) whose distance is at most twice that of T^*, provided the distance function satisfies the triangle inequality. Note that this construction is only conceptual, since we usually do not know T^*.

**Figure 4.4:** A phylogenetic tree with lifted alignment. Each internal node is labeled by one of the strings labeling its children.

For each node v, let S^*_v be its label in T^*. We assign every node v a label S^L_v. Initially only the leaves are labeled, and by definition S^L_v = S^*_v for each leaf v. The labeling process then visits the internal nodes in any order in which a node is visited only after all of its children (for example, postorder). Upon visiting a node v, it is lifted, i.e., labeled by the lifted label S^L_w of one of its children w, namely the one closest to S^*_v (see Figure 4.5). Since every internal node receives the label of one of its children, the resulting phylogenetic alignment is lifted.
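The lifting construction can be sketched as a postorder traversal. This is only a sketch: T^* is not known in practice, so `star` (the labels S^*_v) is hypothetical input, the tree uses an assumed `children` adjacency dictionary, and unit-cost edit distance stands in for the string distance, which is an assumption rather than the distance fixed by these notes.

```python
from typing import Callable, Dict, List


def edit_distance(s: str, t: str) -> int:
    """Unit-cost edit (Levenshtein) distance; an assumed choice of string distance."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def lift(children: Dict[str, List[str]], root: str, star: Dict[str, str],
         dist: Callable[[str, str], int] = edit_distance) -> Dict[str, str]:
    """Compute the labels S^L_v of Lift(T*) from the labels S^*_v of T*."""
    lifted: Dict[str, str] = {}

    def visit(v: str) -> None:
        kids = children[v]
        if not kids:                  # leaf: S^L_v = S^*_v
            lifted[v] = star[v]
            return
        for w in kids:                # postorder: label all children first
            visit(w)
        # Lift v: take the child label closest to S^*_v (the first one on ties).
        lifted[v] = min((lifted[w] for w in kids), key=lambda s: dist(star[v], s))

    visit(root)
    return lifted


children = {"r": ["u", "c"], "u": ["a", "b"], "a": [], "b": [], "c": []}
star = {"a": "ACGT", "b": "ACTT", "c": "AGGT", "u": "ACGA", "r": "AGGA"}
print(lift(children, "r", star))
# {'a': 'ACGT', 'b': 'ACTT', 'u': 'ACGT', 'c': 'AGGT', 'r': 'AGGT'}
```

Visiting children before their parent guarantees that every child w already carries its lifted label S^L_w when v is processed.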

**Figure 4.5:** The lifting construction at node v. The numbers on the edges are the distances from S^*_v to the lifted strings labeling its children. On the left is the tree before lifting; on the right, the result of the lift. After the lift, the edge to the chosen child has distance 0.

**Figure 4.6:** The lifted tree T^L. The dashed edges show the paths along which a leaf string has been lifted to some internal node, so their distance is 0. Solid edges are the blue edges in T^L. For example, the path P_(a,b) is the path b, d, S₄, along which the string labeling b was lifted. The distance of edge (a,b) in T^L is at most twice the distance of path P_(a,b) in T^*.

We now describe a dynamic programming algorithm that finds an optimal lifted alignment. First we define:

Definition 0.11 Let T_v be the subtree of T rooted at node v, and let S be a string labeling one of the leaves of T_v. Let d(v,S) denote the distance of the best lifted alignment of T_v under the requirement that string S is assigned to node v.

The algorithm computes d(v,S) for every node v and every leaf string S of T_v, working its way from the leaves up, using the following recursion:

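If v is an internal node all of whose children are leaves, then $d(v,S) = \sum_{w} D(S, S_w)$, where the sum ranges over the children w of v, S_w is the label of w, and D denotes the pairwise distance between two strings. (The term for the child labeled S is 0.)
Otherwise, let v_S be the child of v whose subtree contains the leaf labeled S. Since v must share its label with one of its children, and only v_S has S among its leaf strings (assuming the leaf strings are distinct), v_S is also labeled S, and $d(v,S) = d(v_S, S) + \sum_{v' \neq v_S} \min_{S'} \left[ D(S, S') + d(v', S') \right]$, where v' ranges over the other children of v and S' is a label of one of the leaves of T_{v'}. The distance of the optimal lifted alignment of T is then $\min_S d(r, S)$, where r is the root.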
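A minimal Python sketch of this dynamic program follows. It assumes distinct leaf strings, a hypothetical `children` adjacency dictionary, and unit-cost edit distance as D; `optimal_lifted_cost` is an illustrative name, and d(v, S) is stored per leaf name rather than per string. Caching `edit_distance` plays the role of the pairwise preprocessing stage.

```python
from functools import lru_cache
from typing import Dict, List


@lru_cache(maxsize=None)
def edit_distance(s: str, t: str) -> int:
    """Unit-cost edit distance, assumed here as the pairwise distance D (cached)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def optimal_lifted_cost(children: Dict[str, List[str]], root: str,
                        leaf_label: Dict[str, str]) -> int:
    """Return min over S of d(root, S), the distance of an optimal lifted alignment."""
    # d[v][x] = d(v, S_x): best cost of T_v when v is labeled with the string of leaf x.
    d: Dict[str, Dict[str, int]] = {}

    def visit(v: str) -> None:
        kids = children[v]
        if not kids:
            d[v] = {v: 0}             # a leaf carries its own string at no cost
            return
        for c in kids:
            visit(c)
        d[v] = {}
        for c_s in kids:              # c_s plays the role of v_S: its subtree holds leaf x
            for x, cost in d[c_s].items():
                total = cost          # c_s is also labeled S_x, so edge (v, c_s) costs 0
                for c in kids:
                    if c != c_s:      # any leaf string S' below c may be lifted to c
                        total += min(edit_distance(leaf_label[x], leaf_label[y]) + d_cy
                                     for y, d_cy in d[c].items())
                d[v][x] = total

    visit(root)
    return min(d[root].values())


children = {"r": ["u", "c"], "u": ["a", "b"], "a": [], "b": [], "c": []}
leaf_label = {"a": "ACGT", "b": "ACTT", "c": "AGGT"}
print(optimal_lifted_cost(children, "r", leaf_label))   # 2
```

When all children of v are leaves, d(w, S_w) = 0 for each child w, so the loop reduces to the first case of the recursion.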

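Time analysis: In a preprocessing stage we compute all $\binom{k}{2}$ pairwise distances between the k input strings. Aligning strings of lengths n_i and n_j takes O(n_i n_j) time, so this stage takes $\sum_{i<j} O(n_i n_j) = O(N^2)$ time, where N is the total length of all the strings. Given these distances, the work at any internal node is O(k²): there are at most k choices of S, and for each of them at most k candidate strings S' over all children. Since there are O(k) internal nodes (at most k-1 when every internal node has at least two children), the overall running time of the algorithm is O(N² + k³).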
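To make the preprocessing bound concrete, the sketch below fills one standard O(n_i n_j) dynamic programming table per pair, using unit-cost edit distance as an assumed stand-in for the pairwise distance, and counts the table cells. The count equals the sum over pairs of n_i n_j, which is at most N²/2 ≤ N². The function name `pairwise_distances` is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def pairwise_distances(strings: List[str]) -> Tuple[Dict[Tuple[int, int], int], int]:
    """Edit distance for each of the k-choose-2 pairs, plus the number of DP cells filled."""
    table: Dict[Tuple[int, int], int] = {}
    cells = 0
    for i, j in combinations(range(len(strings)), 2):
        s, t = strings[i], strings[j]
        prev = list(range(len(t) + 1))
        for p, a in enumerate(s, 1):
            cur = [p]
            for q, b in enumerate(t, 1):
                cur.append(min(prev[q] + 1, cur[q - 1] + 1, prev[q - 1] + (a != b)))
            prev = cur
        table[(i, j)] = prev[-1]
        cells += len(s) * len(t)      # n_i * n_j cells for this pair
    return table, cells


strings = ["ACGT", "ACTT", "AGGT"]
table, cells = pairwise_distances(strings)
N = sum(len(s) for s in strings)
print(table)          # {(0, 1): 1, (0, 2): 1, (1, 2): 2}
print(cells, N * N)   # 48 144
```

With this table precomputed, each D(S, S') lookup in the recursion costs O(1), which is what the O(k²) per-node bound relies on.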

Peer Itsik
2000-12-06