Weighted Parsimony

In this version of the problem the cost of a change is not constant. Instead, denote by $C^c_{ij}$ the cost of character c changing from state i to state j. The goal is still to minimize the total cost of the tree, given the topology and the leaf labels.

Problem: Weighted Small Parsimony.
INPUT: ...
... What is the optimal labeling of the internal nodes?

We present an algorithm by Sankoff [15], which generalizes the Fitch algorithm.

Sankoff's algorithm:

Step 1: For each character c, each node v and each state t, we compute a quantity $S^c_t(v)$: the minimum cost of the subtree rooted at v, given that $v_c = t$. As in step 1 of the Fitch algorithm, the nodes are processed in postorder. For each leaf v:

$\begin{displaymath}S^c_t(v) = \left\{ \begin{array}{ll} 0 & v_c = t \\ \infty & \rm {otherwise} \end{array}\right. \end{displaymath}$

(2)

For an internal node v with children u and w, the two subtrees can be optimized independently once the state t of v is fixed, so:

$\begin{displaymath}S^c_t(v) = \min_i\left\{C^c_{ti} + S^c_i(u)\right\} + \min_j\left\{C^c_{tj} + S^c_j(w)\right\} \end{displaymath}$

(3)

The minimum total cost of a tree with root r is:

$\begin{displaymath} S(T) = \sum_{c=1}^m \min_t S^c_t(r) \end{displaymath}$

(4)
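
As an illustration of step 1, the following Python sketch computes the table $S^c_t(v)$ for a single character on a small binary tree. The tree encoding (nested pairs with state strings at the leaves), the function names and the transition/transversion costs are assumptions made for this example; they are not part of Sankoff's description.

```python
import math

STATES = "ACGT"

def make_cost(transition=1.0, transversion=2.0):
    """Hypothetical cost matrix: transitions cheaper than transversions."""
    purines = {"A", "G"}
    cost = {}
    for i in STATES:
        for j in STATES:
            if i == j:
                cost[i, j] = 0.0
            elif (i in purines) == (j in purines):
                cost[i, j] = transition      # A<->G or C<->T
            else:
                cost[i, j] = transversion    # purine <-> pyrimidine
    return cost

def sankoff_up(node, cost, states=STATES):
    """Return {t: S_t(node)} for one character, computed in postorder.

    A leaf is a single state (str); an internal node is a pair (left, right).
    """
    if isinstance(node, str):
        # Leaf: cost 0 for the observed state, infinity otherwise.
        return {t: (0.0 if t == node else math.inf) for t in states}
    left, right = node
    s_u = sankoff_up(left, cost, states)
    s_w = sankoff_up(right, cost, states)
    # Internal node: each child contributes its own minimum independently.
    return {
        t: min(cost[t, i] + s_u[i] for i in states)
           + min(cost[t, j] + s_w[j] for j in states)
        for t in states
    }

tree = (("A", "C"), ("G", "A"))
table = sankoff_up(tree, make_cost())
best = min(table.values())   # this character's term in the sum over c
```

For this tree and these assumed costs the cheapest root state is A with cost 3: one transversion (A to C) and one transition (A to G).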

Step 2: Using the values $S^c_t(v)$ computed in step 1, we now determine an optimal state for each character c at each internal node. This time we traverse the tree in preorder:
For the root node r, we choose $r_c = \arg\min_t S^c_t(r)$ .
For any other node v, whose parent u has already been assigned the state $u_c$, set:

$\begin{displaymath}v_c = \arg\min_t(C^c_{u_ct} + S^c_t(v)) \end{displaymath}$
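
The two passes can be sketched together in Python for a single character. This is a minimal illustration under assumed conventions: leaves are state strings, internal nodes are tuples of children, `cost[i][j]` is a nested dictionary, and the names `up` and `down` are hypothetical. Ties in the arg min are broken by the order of `states`.

```python
import math

def up(node, cost, states):
    """Postorder pass: return (table, kids) with table[t] = S_t(node)."""
    if isinstance(node, str):                       # leaf: observed state
        return {t: (0 if t == node else math.inf) for t in states}, None
    kids = [up(child, cost, states) for child in node]
    table = {t: sum(min(cost[t][i] + tab[i] for i in states) for tab, _ in kids)
             for t in states}
    return table, kids

def down(annotated, cost, states, parent_state=None):
    """Preorder pass: return the tree with a chosen state at every node."""
    table, kids = annotated
    if parent_state is None:                        # root: argmin_t S_t(r)
        state = min(states, key=lambda t: table[t])
    else:                                           # argmin_t C[u_c][t] + S_t(v)
        state = min(states, key=lambda t: cost[parent_state][t] + table[t])
    if kids is None:
        return state                                # leaves keep their own state
    return (state, [down(k, cost, states, state) for k in kids])

states = "ACGT"
purines = "AG"
# Assumed costs: 0 for no change, 1 for a transition, 2 for a transversion.
cost = {i: {j: 0 if i == j else 1 if (i in purines) == (j in purines) else 2
            for j in states} for i in states}

tree = (("A", "C"), ("G", "A"))
labeled = down(up(tree, cost, states), cost, states)
```

Here `labeled` is a nested structure of the form `(state, [children])`; for this example both internal nodes and the root are assigned A.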

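Complexity: In step 1, for every node and every state t we minimize over the k possible states of each child, which is $O(k^2)$ work per node; step 2 needs only $O(k)$ work per node. This gives $O(n \cdot k^2)$ per character. The algorithm is applied once for each character, for a total complexity of $O(m \cdot n \cdot k^2)$ .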

Weighted Characters
It is possible to assign weights not only to state changes but also to the characters themselves. Technically, this means assigning a number $W_c$ to each character c and rewriting equation (4) to read:

$\begin{displaymath}S(T) = \sum_{c=1}^{m}W_c\cdot\min_t S^c_t(r) \end{displaymath}$

(5)

Where do the weights $W_c$ come from? For instance, if we are working with a DNA sequence and know the reading frame, we can use the fact that changes in the third codon position are more frequent, since in many cases they do not change the encoded amino acid. Such positions can therefore be given a lower weight.
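
A short Python sketch of the weighted sum follows. It assumes unit change costs, a tree given as nested tuples of leaf names, and a purely hypothetical weight of 0.5 for the third codon position; the function names are illustrative only.

```python
import math

def root_cost(tree, column, cost, states):
    """min_t S_t(root) for one character; `column` maps leaf name -> state."""
    def table(v):
        if isinstance(v, str):                      # leaf, looked up by name
            return {t: (0 if column[v] == t else math.inf) for t in states}
        tabs = [table(child) for child in v]
        return {t: sum(min(cost[t][i] + tab[i] for i in states) for tab in tabs)
                for t in states}
    return min(table(tree).values())

def weighted_tree_cost(tree, seqs, weights, cost, states):
    """S(T) = sum over characters c of W_c * min_t S^c_t(r)."""
    total = 0
    for c, w in enumerate(weights):
        column = {name: seq[c] for name, seq in seqs.items()}
        total += w * root_cost(tree, column, cost, states)
    return total

states = "ACGT"
cost = {i: {j: int(i != j) for j in states} for i in states}   # unit costs
tree = (("x", "y"), ("z", "w"))
seqs = {"x": "AAC", "y": "AAT", "z": "ACC", "w": "ACT"}        # one codon each

unweighted = weighted_tree_cost(tree, seqs, [1, 1, 1], cost, states)
weighted = weighted_tree_cost(tree, seqs, [1, 1, 0.5], cost, states)
```

In this toy alignment the third position needs two changes and the second position one, so halving the weight of the third position lowers the total from 3 to 2.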

In section 8.2.2 we will see another possible source for weights: compatible characters. In short, we will give more weight to characters that fit the tree well than to characters that fit it poorly.

Peer Itsik
2001-01-01