
Weighted Parsimony

In this version of the problem the cost of a change is not constant. Instead, denote by $C^c_{ij}$ the cost of character $c$ changing from state $i$ to state $j$. The problem is still to minimize the total cost of the tree, given the topology and the leaf labels.


\begin{problem}Weighted Small Parsimony.\\ *
{\bf INPUT:}
\begin{itemize}
\item ...
...What is the optimal labeling of the internal nodes?
\end{enumerate}\end{problem}
We now present an algorithm by Sankoff [15], which is a generalization of the Fitch algorithm.

Sankoff's algorithm:

Step 1: For each node $v$ and each state $t$ we compute a quantity $S^c_t(v)$, the minimum cost of the subtree rooted at $v$ given that $v_c = t$. As in step 1 of the Fitch algorithm, the computation proceeds in postorder. For each leaf $v$:

\begin{displaymath}S^c_t(v) = \left\{
\begin{array}{ll}
0 & v_c = t \\
\infty & \rm {otherwise}
\end{array}\right.
\end{displaymath} (2)

For an internal node $v$ with children $u$ and $w$, the best state of each child can be chosen independently once $v_c = t$ is fixed, so:

\begin{displaymath}S^c_t(v) = \min_i\left\{C^c_{ti} + S^c_i(u)\right\} + \min_j\left\{C^c_{tj} + S^c_j(w)\right\}
\end{displaymath} (3)

The minimum total cost of a tree with root r is:

 \begin{displaymath}
S(T) = \sum_{c=1}^m \min_t S^c_t(r)
\end{displaymath} (4)
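
Step 1 for a single character can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the dictionary-based tree encoding, the function name sankoff_costs, the node names and the example cost values (1 for a transition, 2 for a transversion) are all illustrative choices that do not come from the text.

```python
import math

def sankoff_costs(tree, root, leaf_state, states, cost):
    """Postorder pass of Sankoff's algorithm for one character.

    tree:       dict mapping each internal node to its (left, right) children
                (hypothetical encoding; any node not in `tree` is a leaf)
    leaf_state: dict mapping each leaf to its observed state
    cost[i][j]: cost of a change from state i to state j
    Returns S with S[v][t] = minimum cost of the subtree rooted at v
    given that v is labeled t.
    """
    S = {}

    def visit(v):
        if v not in tree:
            # Leaf: cost 0 for the observed state, infinity otherwise (eq. 2).
            S[v] = {t: (0 if leaf_state[v] == t else math.inf) for t in states}
            return
        u, w = tree[v]
        visit(u)
        visit(w)
        # Internal node: best state for each child, chosen independently (eq. 3).
        S[v] = {
            t: min(cost[t][i] + S[u][i] for i in states)
               + min(cost[t][j] + S[w][j] for j in states)
            for t in states
        }

    visit(root)
    return S

# Example data (assumed): DNA states, transitions cost 1, transversions cost 2.
states = "ACGT"
transitions = {frozenset("AG"), frozenset("CT")}
cost = {
    i: {j: 0 if i == j else (1 if frozenset((i, j)) in transitions else 2)
        for j in states}
    for i in states
}
tree = {"r": ("x", "y"), "x": ("l1", "l2"), "y": ("l3", "l4")}
leaf_state = {"l1": "A", "l2": "G", "l3": "C", "l4": "A"}

S = sankoff_costs(tree, "r", leaf_state, states, cost)
print(min(S["r"].values()))  # 3, the inner term of eq. 4 for this character
```

The recursion is written for readability; for very deep trees an explicit stack would avoid Python's recursion limit.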

Step 2: Based on the values $S^c_t(v)$ computed in step 1, we now determine the optimal state of each character $c$ at the internal nodes. This time we traverse the tree in preorder:
For the root node r, we will choose $r_c = \arg\min_t S^c_t(r)$.
For any other node v, with parent node u, set:

\begin{displaymath}v_c = \arg\min_t(C^c_{u_ct} + S^c_t(v)) \end{displaymath}
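
The preorder pass of step 2 can be sketched in Python as follows. This is only an illustration under assumptions: the function name sankoff_labels, the dictionary-based tree encoding and the example numbers are hypothetical. The table S is entered by hand here; it holds the step 1 values for a four-leaf tree with leaf states A, G, C, A and assumed costs of 1 per transition and 2 per transversion.

```python
def sankoff_labels(tree, root, states, cost, S):
    """Preorder pass of Sankoff's algorithm for one character.

    tree:       dict mapping each internal node to its children
                (hypothetical encoding; any node not in `tree` is a leaf)
    cost[i][j]: cost of a change from state i to state j
    S[v][t]:    step 1 value for internal node v and state t
    Returns a dict assigning a state to every internal node.
    """
    # Root: the state with the smallest subtree cost.
    labels = {root: min(states, key=lambda t: S[root][t])}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in tree[u]:
            if v in tree:  # leaves keep their given labels
                # Child: best state given the state already chosen for its parent.
                labels[v] = min(states, key=lambda t: cost[labels[u]][t] + S[v][t])
                stack.append(v)
    return labels

# Example data (assumed): transitions cost 1, transversions cost 2.
states = "ACGT"
transitions = {frozenset("AG"), frozenset("CT")}
cost = {
    i: {j: 0 if i == j else (1 if frozenset((i, j)) in transitions else 2)
        for j in states}
    for i in states
}
tree = {"r": ("x", "y"), "x": ("l1", "l2"), "y": ("l3", "l4")}
# Step 1 tables for the internal nodes, entered by hand for this example.
S = {
    "r": {"A": 3, "C": 5, "G": 4, "T": 6},
    "x": {"A": 1, "C": 4, "G": 1, "T": 4},
    "y": {"A": 2, "C": 2, "G": 3, "T": 3},
}

labels = sankoff_labels(tree, "r", states, cost, S)
print(labels)  # {'r': 'A', 'x': 'A', 'y': 'A'}
```

Note that node x has two equally cheap states in its own table (A and G); the cost term involving the parent's state is what settles the choice. When ties remain, min returns the first minimizing state in the order given by states, which is one of possibly several optimal labelings.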

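Complexity: In step 1, each of the $k$ values $S^c_t(v)$ at an internal node requires a minimum over $k$ states for each child, so every node costs $O(k^2)$ work; step 2 needs only $O(k)$ work per node. This gives $O(n \cdot k^2)$ per character. The algorithm is applied once for each character, for a total complexity of $O(m \cdot n \cdot k^2)$.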

Weighted Characters
It is possible to assign weights not only to state changes, but also to the characters themselves. Technically, this means assigning a number $W_c$ to each character, and rewriting equation (4) to read:

\begin{displaymath}S(T) = \sum_{c=1}^{m}W_c\cdot\min_t S^c_t(r)
\end{displaymath} (5)
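
The weighted sum can be illustrated with a short Python sketch. The function name weighted_tree_cost, the root tables and the weight values below are assumptions made up for the example; in particular, the lower weight on the third character is only a placeholder, not a recommended value.

```python
def weighted_tree_cost(root_costs, weights):
    """Weighted total cost of a tree.

    root_costs[c][t]: value of S^c_t(r) for character c and state t
    weights[c]:       weight W_c of character c
    """
    if len(root_costs) != len(weights):
        raise ValueError("need exactly one weight per character")
    # For each character take the cheapest root state, then scale by W_c.
    return sum(w * min(costs.values()) for costs, w in zip(root_costs, weights))

# Example data (assumed): root tables for three characters.
root_costs = [
    {"A": 3, "C": 5, "G": 4, "T": 6},
    {"A": 0, "C": 2, "G": 1, "T": 2},
    {"A": 2, "C": 2, "G": 3, "T": 3},
]
weights = [1.0, 1.0, 0.5]  # hypothetical weights W_c

print(weighted_tree_cost(root_costs, weights))  # 4.0 = 1.0*3 + 1.0*0 + 0.5*2
```

With all weights equal to 1 the function reduces to the unweighted total cost.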

Where do we get the weights $W_c$? For instance, if we are working with a DNA sequence and we know the reading frame, we can make use of the fact that changes in the third codon position are more frequent, since in many cases they do not change the encoded amino acid. Such positions can therefore be given a lower weight than the first two codon positions.

In section 8.2.2 we will see another possible source of weights: compatible characters. In short, characters that seem to fit the tree well are given more weight than characters that fit it poorly.


Peer Itsik
2001-01-01