
     
Pairwise Compatibility

The first step in working with compatibility parallels the small parsimony problem (see 9.2.1): given a tree $T$ with labeled leaves, find the best compatibility score that can be achieved for that tree, i.e., the maximum number, over all possible labelings of internal nodes, of characters compatible with the fully labeled tree. This can be done easily using Fitch's algorithm (see 9.2.1). The more interesting problem is, of course, that of ``large compatibility'': finding the best phylogeny given only the data matrix $M$. We shall tackle this problem through the notions of pairwise compatibility and mutual compatibility.
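As a rough illustration of the fixed-tree scoring described above, the following sketch runs the bottom-up pass of Fitch's algorithm on each character separately and counts the characters that need at most one change. It assumes that a binary character is compatible with a tree exactly when it changes state at most once on it. The tree encoding (nested pairs of leaf names), the dictionary form of the matrix, and the function names are illustrative choices, not part of the original notes.

```python
def fitch_changes(tree, states):
    """Return (candidate state set, minimum number of changes) for one character.

    tree   -- a leaf name (str) or a pair (left, right) of subtrees (assumed encoding)
    states -- dict mapping each leaf name to its state (0 or 1)
    """
    if isinstance(tree, str):              # leaf: its state is fixed
        return {states[tree]}, 0
    left_set, left_cost = fitch_changes(tree[0], states)
    right_set, right_cost = fitch_changes(tree[1], states)
    common = left_set & right_set
    if common:                             # children can agree: no extra change
        return common, left_cost + right_cost
    # children disagree: one change is forced on an edge below this node
    return left_set | right_set, left_cost + right_cost + 1


def compatibility_score(tree, matrix):
    """Count the binary characters that need at most one change on the tree."""
    n_chars = len(next(iter(matrix.values())))
    score = 0
    for j in range(n_chars):
        states = {species: row[j] for species, row in matrix.items()}
        _, changes = fitch_changes(tree, states)
        if changes <= 1:                   # assumed compatibility criterion
            score += 1
    return score


# Hypothetical example: four species, two binary characters.
tree = (("a", "b"), ("c", "d"))
matrix = {"a": [0, 0], "b": [0, 1], "c": [1, 0], "d": [1, 1]}
print(compatibility_score(tree, matrix))
```

Because the internal labeling can be chosen independently for each character, scoring the characters one at a time is enough for this count.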
\begin{dfn}
Two characters $c_1$ and $c_2$ are said to be pairwise compatible (written $PC(c_1,c_2)$) if there exists a tree $T$ such that both $c_1$ and $c_2$ are compatible with $T$.
\end{dfn}
\begin{dfn}
Characters $c_1, \ldots, c_k$ are said to be mutually compatible if there exists a tree $T$ such that $\forall i$: $c_i$ is compatible with $T$.
\end{dfn}

We will present two theorems. The first, by Wilson [13], identifies pairwise compatible characters:

Theorem 9.8   Pairwise Compatibility Test:
  For characters $i,j$ define the set $S_{ij}$ to be $\left\{(x,y) : \exists \mbox{ species } k \mbox{ such that } M_{ki} = x \mbox{ and } M_{kj} = y\right\}$, where $M$ is the input matrix described in problem 9.1; then $PC(c,c')$ iff $S_{cc'} \ne \{0,1\}^2$.

Proof: Assume $S_{cc'} \ne \{0,1\}^2$. Then the set $S_{cc'}$ has at most 3 members. If $S_{cc'}$ had only a single member, then $c$ and $c'$ would each take a single state, which is impossible since both are binary characters. If $S_{cc'}$ has only 2 members, then, since each character takes both states, the two characters induce the same partition of the species and can be treated as a single binary character. Assume, then, that $\{0,1\}^2 \setminus S_{cc'} = \left\{(x,y)\right\}$, and write $\bar{x} = 1-x$ and $\bar{y} = 1-y$ for the complementary states. Figure 9.7 illustrates the basic structure of a tree that is compatible with two characters having 3 combined values: $(\bar{x},\bar{y})$, $(\bar{x},y)$, and $(x,\bar{y})$. Each triangle represents a subtree in which the values of both characters remain constant. The only mutations are along the two edges marked with bars, so each character changes state exactly once, proving this part of the theorem. The other direction is simple and is left as an exercise to the reader.


  
Figure 9.7: A schematic description of a tree that is compatible with two characters, having 3 combined values (see proof of theorem 9.8).

\fbox{\epsfig{figure=lec09_figs/paircomptree.ps}}
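The test of Theorem 9.8 is short enough to sketch directly. The code below builds the set $S_{ij}$ from the matrix and checks whether all four state pairs occur. It assumes that $M$ is stored as a list of rows, one per species, with entries 0 or 1; the function names and the example matrix are illustrative only.

```python
from itertools import product


def state_pairs(matrix, i, j):
    """Return S_ij: the set of (state of i, state of j) pairs seen in some species."""
    return {(row[i], row[j]) for row in matrix}


def pairwise_compatible(matrix, i, j):
    """Theorem 9.8: characters i and j are pairwise compatible iff S_ij != {0,1}^2."""
    all_pairs = set(product((0, 1), repeat=2))
    return state_pairs(matrix, i, j) != all_pairs


# Hypothetical matrix: rows are species, columns are binary characters.
M = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
]
print(pairwise_compatible(M, 0, 1))  # all four pairs occur for characters 0 and 1
print(pairwise_compatible(M, 0, 2))  # only (0,1) and (1,0) occur for characters 0 and 2
```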





The next, somewhat surprising, theorem by Estabrook [6] identifies mutually compatible sets of characters:

Theorem 9.9   Pairwise Compatibility Theorem:
All characters in a set S are mutually compatible iff $\forall c,c' \in S, PC(c,c')$.

We will not present a proof for this theorem.

So the problem of ``large compatibility'' reduces to finding the largest mutually compatible set of characters, which amounts to finding a maximum clique in the pairwise-compatibility graph, defined as:

\begin{displaymath}G=(V,E);\ V=\left\{v_1,\ldots,v_m\right\};\ E=\left\{(v_i,v_j): PC(c_i,c_j)\right\}\end{displaymath}


This seems to be of no great help because, as we know, finding a maximum clique in a graph is an NP-hard problem. However, there are algorithms, such as Bron and Kerbosch's [1] branch-and-bound clique-finding algorithm, which seem to work very well with biological data. All in all, compatibility methods usually run faster than parsimony methods on the same data.
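To make the reduction concrete, the sketch below builds the pairwise-compatibility graph from a binary matrix and enumerates its maximal cliques with the basic Bron-Kerbosch recursion (without pivoting), keeping the largest one. It is an illustration under the same assumptions as before (rows are species, entries are 0 or 1), not the exact branch-and-bound procedure of [1], and all names in it are hypothetical.

```python
from itertools import combinations


def pairwise_compatible(matrix, i, j):
    """Characters i and j are compatible iff fewer than 4 state pairs occur."""
    return len({(row[i], row[j]) for row in matrix}) < 4


def compatibility_graph(matrix):
    """Adjacency sets of G: vertex i is character i, edges join compatible pairs."""
    m = len(matrix[0])
    adj = {i: set() for i in range(m)}
    for i, j in combinations(range(m), 2):
        if pairwise_compatible(matrix, i, j):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj, r=frozenset(), p=None, x=None):
    """Yield every maximal clique (basic Bron-Kerbosch, no pivoting)."""
    if p is None:
        p, x = set(adj), set()
    if not p and not x:
        yield r                      # r cannot be extended: it is maximal
        return
    for v in list(p):
        yield from maximal_cliques(adj, r | {v}, p & adj[v], x & adj[v])
        p.remove(v)                  # v has been fully explored
        x.add(v)


# Hypothetical matrix: four species, four binary characters.
M = [
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
graph = compatibility_graph(M)
best = max(maximal_cliques(graph), key=len)
print(sorted(best))
```

By Theorem 9.9, the characters in the returned clique are mutually compatible. The enumeration is exponential in the worst case, which is consistent with the NP-hardness noted above.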
Itshack Pe`er
1999-02-18