
     
Small Parsimony

Problem 9.3   Small Parsimony.
INPUT: The topology of a rooted phylogenetic tree with labeled leaves.
QUESTION:
1. What is the minimum number of changes for this topology?
2. What is the optimal labeling of the internal nodes?

This problem is relatively easy to solve. First, we can solve for each character separately: the characters are mutually independent, so the total number of changes is the sum of the per-character minima. For a single character, we present the following algorithm:

Fitch's algorithm [5]:

Input: A phylogenetic tree $T$ with $n$ nodes, and a single character $c$ with a set $A$ of $k$ possible values. Denote the value of the character for node $v$ by $v_c$.

Step 1: We will assign to each node v a set $S_v \subseteq A$, in the following fashion:

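\begin{displaymath}\begin{array}{ll}
\mbox{For each leaf } v: & S_v = \{v_c\}.\\
\mbox{For each internal node } v \mbox{ with children } u, w: & S_v = \left\{
\begin{array}{ll}
S_u \cap S_w & \mbox{if } S_u \cap S_w \neq \emptyset\\
S_u \cup S_w & \mbox{otherwise}
\end{array}
\right.
\end{array}
\end{displaymath}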

To compute $S_v$ we traverse the tree in postorder, starting with the leaves and working our way toward the root, so that the sets of both children are available when a node is processed (this is in fact a dynamic programming algorithm).
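Step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree is binary, a leaf is encoded as a string and an internal node as a pair of subtrees, and the names `fitch_sets`, `example_tree` and `example_values` are hypothetical, not part of the original notes. The example tree is made up and is not the one in the figure.

```python
from typing import Dict, FrozenSet, Tuple, Union

# Assumed encoding: a leaf is its name (str); an internal node is a pair (u, w).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_sets(tree: Tree, leaf_value: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Postorder pass of step 1 for one character.

    Returns the set S_v of the subtree root and the number of times
    the intersection of the two child sets was empty.
    """
    if isinstance(tree, str):                 # leaf: S_v = {v_c}
        return frozenset([leaf_value[tree]]), 0
    u, w = tree
    s_u, empty_u = fitch_sets(u, leaf_value)  # children first (postorder)
    s_w, empty_w = fitch_sets(w, leaf_value)
    common = s_u & s_w
    if common:                                # non-empty intersection
        return common, empty_u + empty_w
    return s_u | s_w, empty_u + empty_w + 1   # empty: take the union, count it


# Made-up 5-leaf example.
example_tree = ((("a", "b"), "c"), ("d", "e"))
example_values = {"a": "A", "b": "C", "c": "A", "d": "G", "e": "G"}
root_set, empty_count = fitch_sets(example_tree, example_values)
print(sorted(root_set), empty_count)  # ['A', 'G'] 2
```

The recursion mirrors the postorder traversal directly; for very deep trees an explicit stack would avoid Python's recursion limit.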

Step 2: Given the sets $S_v$, we now determine the value $v_c$ to assign to the character $c$ at each internal node $v$. This time we traverse the tree in preorder, i.e., from the root toward the leaves. For each internal node $v$, if $v$ has a parent $u$ satisfying $u_c \in S_v$, set $v_c \leftarrow u_c$; otherwise (including for the root node), assign an arbitrary $t \in S_v$ to $v_c$. The result of this algorithm is a fully labeled tree. The number of changes in this tree equals the number of times $S_u \cap S_w$ was empty in step 1.
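The two passes together might look as follows in Python. This is a hedged sketch, not the lecture's own code: it assumes a binary tree given as a hypothetical `children` dictionary mapping each internal node to its two children, and it resolves the "arbitrary" choice in step 2 with `min` purely to make the output deterministic. The function name `fitch_label` and the example data are likewise made up.

```python
def fitch_label(children, root, leaf_value):
    """Run both steps for one character; return (labels, changes).

    children:   dict, internal node -> (u, w)   (assumed binary tree)
    leaf_value: dict, leaf -> observed value of the character
    """
    # Preorder listing: every node appears after its parent.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, ()))

    # Step 1: visit children before parents by walking the list backwards.
    sets, changes = {}, 0
    for v in reversed(order):
        if v not in children:                  # leaf
            sets[v] = {leaf_value[v]}
            continue
        u, w = children[v]
        common = sets[u] & sets[w]
        if common:
            sets[v] = common
        else:
            sets[v] = sets[u] | sets[w]
            changes += 1

    # Step 2: visit parents before children and copy the parent's value if allowed.
    parent = {c: v for v, kids in children.items() for c in kids}
    labels = {}
    for v in order:
        p = parent.get(v)
        if p is not None and labels[p] in sets[v]:
            labels[v] = labels[p]
        else:
            labels[v] = min(sets[v])           # any element of S_v would do
    return labels, changes


# Made-up example: internal nodes r, x, y, z; leaves a..e.
children = {"r": ("x", "y"), "x": ("z", "c"), "z": ("a", "b"), "y": ("d", "e")}
leaf_value = {"a": "A", "b": "C", "c": "A", "d": "G", "e": "G"}
labels, changes = fitch_label(children, "r", leaf_value)
print(labels["r"], labels["x"], labels["y"], labels["z"], changes)  # A A G A 2
```

Counting the edges whose endpoints carry different labels in the result gives the same number as `changes`, which is the claim made at the end of step 2.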

Complexity: For each node $v$ we spend $O(k)$ time to compute $S_v$, and again $O(k)$ to compute $v_c$, for a total of $O(n \cdot k)$ time (step 2 can be performed in only $O(n)$ total time in the average case). The above algorithm works with a single character. To obtain the optimal score and labeling for the entire data set, simply run the algorithm once for each character and sum the scores. With $m$ characters, this leads to a total complexity of $O(m \cdot n \cdot k)$.
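Running the algorithm once per character can be sketched as below. The sketch assumes that each leaf carries a string of $m$ aligned characters and that the tree is encoded as nested pairs with string leaves; the names `single_character_cost` and `total_parsimony` and the sequences are hypothetical.

```python
def single_character_cost(tree, value):
    """Step 1 for one character: return (S_v, number of empty intersections)."""
    if isinstance(tree, str):                       # leaf
        return {value[tree]}, 0
    (s_u, c_u), (s_w, c_w) = (single_character_cost(t, value) for t in tree)
    common = s_u & s_w
    return (common, c_u + c_w) if common else (s_u | s_w, c_u + c_w + 1)


def total_parsimony(tree, sequences):
    """Sum the per-character minima over all m characters."""
    m = len(next(iter(sequences.values())))         # assumes equal-length sequences
    total = 0
    for i in range(m):                              # characters are independent
        column = {leaf: seq[i] for leaf, seq in sequences.items()}
        total += single_character_cost(tree, column)[1]
    return total


# Made-up data: 5 leaves, m = 3 characters.
tree = ((("a", "b"), "c"), ("d", "e"))
sequences = {"a": "AAG", "b": "CAG", "c": "AAT", "d": "GAT", "e": "GAT"}
print(total_parsimony(tree, sequences))  # 3
```

Here the three columns cost 2, 0 and 1 changes respectively, so the total is 3; the loop over `range(m)` is what contributes the factor $m$ to the running time.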

Example 9.4   Figure 9.4 shows the result of performing step 1 of Fitch's algorithm on a 5-species phylogeny for a single character. The asterisks mark the nodes where $S_u \cap S_w$ was empty; each such node contributes one change, which gives a minimum total cost of 3 for the tree.


  
Figure 9.4: An example of step 1 of Fitch's algorithm for a 5-species phylogeny. Nodes marked by an asterisk (*) require a change along one of the edges to their children, adding 1 to the parsimony score.

\fbox{\epsfig{figure=lec09_figs/fitch.ps}}





It is not obvious at first sight why this algorithm works. We will next present a generalization of Fitch's algorithm that is perhaps easier to understand.

 
Itshack Pe`er
1999-02-18