
   
Lifted Alignment tree - a Heuristic for Phylogenetic Alignment

Definition 5.13   A phylogenetic alignment is called a lifted alignment if, for every internal node v, the string assigned to v is also assigned to one of v's children (see figure 5.2). Consequently, in a lifted alignment every internal node is labeled by one of the leaf strings.
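The definition can be checked mechanically on a labeled tree. The following is a minimal sketch; the dictionary-based tree representation and the names `is_lifted`, `children` and `label` are assumptions made for illustration and do not come from the lecture.

```python
def is_lifted(children, label):
    """Return True if every internal node shares its label with one of its children.

    children: dict mapping each internal node to the list of its children
              (hypothetical representation; leaves do not appear as keys)
    label:    dict mapping every node to the string assigned to it
    """
    return all(
        any(label[v] == label[c] for c in kids)
        for v, kids in children.items()
        if kids  # nodes without children impose no constraint
    )


# Toy tree: root r has children a and leaf z; a has leaves x and y.
children = {"r": ["a", "z"], "a": ["x", "y"]}
leaf_labels = {"x": "ACGT", "y": "ACGA", "z": "TCGA"}

lifted = {**leaf_labels, "a": "ACGA", "r": "ACGA"}
# Here r carries a leaf string, but not the string of any of its children.
not_lifted = {**leaf_labels, "a": "ACGA", "r": "ACGT"}
```

The second labeling illustrates that using only leaf strings at internal nodes is necessary but not sufficient for an alignment to be lifted.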

Let $T^{*}$ be the optimal phylogenetic alignment for the tree T. We will construct a lifted alignment $T^{L} = Lift(T^{*})$, which is based on $T^{*}$ and increases the alignment distance only by a bounded factor. Note that this construction is only conceptual, since we usually do not know $T^{*}$. For each node v, let its label in $T^{*}$ be $S^{*}_{v}$. We shall assign every node v a label $S^{L}_{v}$. Initially only the leaves are labeled, and by definition $S^{L}_{v} = S^{*}_{v}$ for each leaf v. The labeling process then traverses the internal nodes in any order in which no node is visited before all of its children have been visited. Upon visiting a node, it is lifted, i.e. labeled by one of the labels of its children. Thus the resulting phylogenetic alignment is lifted (see figure 5.3).


                            Procedure Lift(T : Tree)

begin
  while there exists an unlifted node v, all of whose children have been lifted, do:
    Find a child j of v whose label $S_j$ is the closest to $S^{*}_{v}$, namely,
    such that for every child i of v:
      $D(S^{*}_{v}, S_j) \leq D(S^{*}_{v}, S_i)$
    Label v with $S_j$
  end while
end
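The procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the tree is a hypothetical dictionary from internal nodes to child lists, `star_label` plays the role of the labels $S^{*}_{v}$ (which in practice are unknown), and the Hamming distance is used as a stand-in for the distance D.

```python
def hamming(s, t):
    """Stand-in for the distance D (assumption): number of mismatching positions."""
    assert len(s) == len(t)
    return sum(a != b for a, b in zip(s, t))


def lift(children, star_label, root, dist=hamming):
    """Return the lifted labeling built from star_label, which plays the role of T*.

    children:   dict mapping each internal node to the list of its children
    star_label: dict mapping every node v to its label S*_v
    """
    lifted = {}

    def visit(v):
        kids = children.get(v, [])
        if not kids:                      # leaf: it keeps its own string
            lifted[v] = star_label[v]
            return
        for c in kids:                    # all children are lifted before v
            visit(c)
        # lift the child label that is closest to S*_v (first one on ties)
        lifted[v] = min((lifted[c] for c in kids),
                        key=lambda s: dist(star_label[v], s))

    visit(root)
    return lifted


def tree_distance(children, label, dist=hamming):
    """Sum of the distances over all edges of the labeled tree."""
    return sum(dist(label[v], label[c])
               for v, kids in children.items() for c in kids)


# Toy input: internal labels of `star` are arbitrary strings, not leaf strings.
children = {"r": ["a", "z"], "a": ["x", "y"]}
star = {"x": "AAAA", "y": "AATT", "z": "TTTT", "a": "AAAT", "r": "AATT"}
lifted = lift(children, star, "r")
```

A post-order traversal is one valid visiting order; any order that handles children before their parent would serve equally well.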




  
Figure 5.3: The lifting construction at node v. The numbers on the edges are the distances from $S^{*}_{v}$ to the lifted strings labeling its children. On the left is the tree before lifting, and on the right is the result of the lift. After the lift, one edge has distance 0.





Theorem 5.7 (Jiang, Wang and Lawler, 1996 [5])   The distance of the phylogenetic alignment $T^{L} = Lift(T^{*})$ is at most twice the distance of the optimal phylogenetic alignment $T^{*}$.

Proof: Let e = (v,w) be an edge in T, where w is a child of v. Suppose that in $T^{L}$, $S_j$ is the label of v and $S_i$ is the label of w. If i = j then $D(S_j, S_i) = 0$. Otherwise:

 \begin{displaymath}
D(S_i, S_j) \leq D(S_j, S^{*}_{v}) + D(S^{*}_{v}, S_i) \leq 2\cdot D(S^{*}_{v}, S_i) \qquad (5.8)
\end{displaymath}

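The first inequality is the triangle inequality. The second follows from the labeling algorithm, which chose for v the child label closest to $S^{*}_{v}$, so that $D(S_j, S^{*}_{v}) \leq D(S^{*}_{v}, S_i)$. For an edge e = (v,w) with w labeled $S_i$ in $T^{L}$, let $P_e$ be the path in T from v to the leaf labeled $S_i$. In $T^{*}$ the endpoints of $P_e$ are labeled $S^{*}_{v}$ and $S_i$, so applying the triangle inequality along the path gives: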

 \begin{displaymath}
D(S^{*}_{v}, S_i) \leq \mbox{the total length of $P_e$ in } T^{*} \qquad (5.9)
\end{displaymath}

We say that the edge e = (v,w) is blue in $T^{L}$ if $S_i \neq S_j$. Every other edge has distance 0, so the distance of the lifted alignment $T^{L}$ is equal to the sum of the edge distances over all the blue edges in the tree.

For a blue edge e = (v,w), observe that the definition of lifted alignment implies that along the path $P_e$ every node except v is labeled $S_i$, and no node outside $P_e$ is labeled $S_i$. Hence, if e' = (v',w') is any other blue edge, then $P_e$ and $P_{e'}$ have no edges in common. This defines a mapping from every blue edge e in $T^{L}$ to a path $P_e$ in $T^{*}$ such that, by (5.8) and (5.9), the distance of e in $T^{L}$ is at most twice the total length of $P_e$ in $T^{*}$, and the paths of distinct blue edges are edge-disjoint. Therefore the total distance of $T^{L}$ = the total distance on blue edges $\leq$ $2 \cdot$ the sum of the total lengths of the $P_e$ paths in $T^{*}$ $\leq$ $2 \cdot$ the total distance of $T^{*}$ (see figure 5.4).
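The concluding argument can be written as a single chain of inequalities. In this sketch B denotes the set of blue edges, $D(T^{L})$ and $D(T^{*})$ denote the total distances of the two alignments, and $S^{L}_{v}$ denotes the label of v in $T^{L}$; the symbols B and $D(\cdot)$ applied to a tree are shorthand introduced here and are not part of the original notation.

```latex
\begin{displaymath}
D(T^{L}) = \sum_{(v,w) \in B} D(S^{L}_{v}, S^{L}_{w})
\leq 2 \sum_{(v,w) \in B} D(S^{*}_{v}, S^{L}_{w})
\leq 2 \sum_{e \in B} \mbox{length of $P_e$ in } T^{*}
\leq 2 \cdot D(T^{*})
\end{displaymath}
```

The equality holds because non-blue edges contribute 0, the first inequality is (5.8) applied to each blue edge, the second is (5.9), and the last uses the fact that the paths $P_e$ are edge-disjoint, so no edge of $T^{*}$ is counted more than once.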
  
Figure 5.4: The lifted tree $T^{L}$. The dashed edges show the paths along which a leaf string has been lifted to some internal node; each of these dashed edges has distance 0. Solid edges are the blue edges of $T^{L}$. The path $P_{(a,b)}$, for example, is the path b, d, $S_4$ along which the string labeling b was lifted. Edge (a,b) has distance in $T^{L}$ at most twice the distance of the path $P_{(a,b)}$ in $T^{*}$.





We now describe how to find the optimal lifted alignment using a dynamic programming algorithm as listed below. But first we define:

Definition 5.14   Let $T_v$ be the subtree of T rooted at node v, and let $S \in \mathcal{S}$, where $\mathcal{S}$ is the set of leaf strings. Let d(v,S) denote the distance of the best lifted alignment of $T_v$ under the requirement that string S is assigned to node v.

The algorithm computes d(v,S) for every $S \in \mathcal{S}$, working its way from the leaves up by a recursion over the children of v, so that the values for the children are available when v is processed.

Time analysis: We perform a preprocessing stage, in which we compute all the ${k \choose 2}$ pairwise distances between the k input strings. This takes $O(N^2)$ time, where N is the total length of all the strings. The work at any internal node is $O(k^2)$, and the overall work of the algorithm is $O(N^2 + k^3)$.
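The following Python sketch shows one way to realize the dynamic program. It is written to be consistent with Definition 5.14, but it is an assumption rather than a transcription of the recursion in the lecture. For a leaf, d(v,S) = 0. For an internal node, the child whose subtree contains the leaf of S must itself carry S, while every other child v' contributes the minimum of D(S,S') + d(v',S') over the leaf strings S' below v'. The tree representation, the names `best_lifted`, `below` and `assign`, and the use of the Hamming distance in place of D are all hypothetical.

```python
from functools import lru_cache


def hamming(s, t):
    """Stand-in for the pairwise distance D (assumption)."""
    return sum(a != b for a, b in zip(s, t))


def best_lifted(children, leaf_label, root, dist=hamming):
    """Return (distance, labeling) of a best lifted alignment of the tree.

    children:   dict mapping each internal node to the list of its children
    leaf_label: dict mapping each leaf to its string
    """
    below = {}  # leaves below each node: the only candidate labels for it

    def collect(v):
        kids = children.get(v, [])
        below[v] = [v] if not kids else [l for c in kids for l in collect(c)]
        return below[v]

    collect(root)

    @lru_cache(maxsize=None)
    def d(v, leaf):
        """d(v, S) for S = leaf_label[leaf], where leaf lies below v."""
        total = 0
        for c in children.get(v, []):
            if leaf in below[c]:
                total += d(c, leaf)  # this child carries S up to v at cost 0
            else:
                total += min(dist(leaf_label[leaf], leaf_label[l]) + d(c, l)
                             for l in below[c])
        return total

    labeling = {}

    def assign(v, leaf):
        """Trace back the choices that realize d(v, leaf)."""
        labeling[v] = leaf_label[leaf]
        for c in children.get(v, []):
            if leaf in below[c]:
                assign(c, leaf)
            else:
                assign(c, min(below[c], key=lambda l:
                              dist(leaf_label[leaf], leaf_label[l]) + d(c, l)))

    best = min(below[root], key=lambda l: d(root, l))
    assign(root, best)
    return d(root, best), labeling


children = {"r": ["a", "z"], "a": ["x", "y"]}
leaf_label = {"x": "AAAA", "y": "AATT", "z": "TTTT"}
cost, labeling = best_lifted(children, leaf_label, "r")
```

Candidates are identified by leaf rather than by string so that duplicate leaf strings cause no ambiguity. The sketch calls `dist` on demand; tabulating the pairwise distances once, as in the preprocessing stage described above, would avoid recomputing them.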
Itshack Pe`er
1999-03-16