Clone Pair Overlap Score



Let $C_{a}$ and $C_{b}$ be two clones, viewed as intervals of the same length $l$. Define $C_{\gamma} = C_{a} \cap C_{b}$ and $l_{\gamma} = \vert C_{\gamma} \vert$. The relative positions of $C_{a}$, $C_{b}$ and $C_{\gamma}$ are shown in Figure 9.10. The overlap score uses the hybridization vectors $\overrightarrow{B_{a}}, \overrightarrow{B_{b}}$ to produce a vector of probabilities, one for each possible length $l_{\gamma}$ of the overlap.

**Figure 9.10:** Clone pair overlap score

We first calculate the probability $Pr(\overrightarrow{B_{a}}, \overrightarrow{B_{b}} \vert l_{\gamma} = t)$. Let $C_{u} = C_{a} \backslash C_{b}$ and $C_{v} = C_{b} \backslash C_{a}$, and recall that $A_{i,j}$ is the number of occurrences of probe $j$ in $C_{i}$. For a single probe $j$ we can thus write the following equation:

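$\begin{eqnarray*}Pr(B_{a,j},B_{b,j} \vert l_{\gamma} = t) &= & \sum_{K_{u}}\sum_{K_{v}}\sum_{K_{\gamma}} Pr(B_{a,j} \vert A_{a,j} = K_{u}+K_{\gamma}) \cdot Pr(B_{b,j} \vert A_{b,j} = K_{v}+K_{\gamma}) \\ & & \cdot Pr(A_{u,j} = K_{u}\vert l_{\gamma} = t) \cdot Pr(A_{v,j} = K_{v}\vert l_{\gamma} = t) \\ & & \cdot Pr(A_{\gamma,j} = K_{\gamma}\vert l_{\gamma} = t) \end{eqnarray*}$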

The calculation of the probabilities inside the summation is straightforward using the statistical model. Since hybridization is a virtual certainty if a probe occurs many times inside a clone, we can limit the summation to small values of $K_{i}$ (say $0 \leq K_{i} \leq 5$), thereby making the score computation feasible while introducing only a negligible error. Considering each probe as an independent source of information, the conditional probability of the vector pair $(\overrightarrow{B_{a}}, \overrightarrow{B_{b}})$ is:

$\begin{displaymath}Pr(\overrightarrow{B_{a}},\overrightarrow{B_{b}} \vert l_{\gamma} = t) = \prod_{j=1}^{n} Pr(B_{a,j},B_{b,j} \vert l_{\gamma} = t) \end{displaymath}$

(8)
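The truncated per-probe sum can be sketched in Python as follows. This is only an illustration, not the lecture's own implementation: the statistical model is defined elsewhere in these notes, so the sketch assumes that probe occurrences in a region are Poisson with mean `lam` times the region length, and that a clone with $k$ occurrences gives a negative signal with probability `(1 - alpha) * beta**k`. The names `lam`, `alpha`, `beta`, `pr_signal` and `pair_prob` are all hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k occurrences under a Poisson(mean) model (assumed)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def pr_signal(b, k, alpha, beta):
    """Pr(B = b | A = k) under an assumed noise model.

    alpha: false-positive probability; beta: probability that a single
    occurrence fails to hybridize. Both are illustrative parameters.
    """
    p_zero = (1.0 - alpha) * beta ** k
    return p_zero if b == 0 else 1.0 - p_zero

def pair_prob(x, y, t, length, lam, alpha, beta, k_max=5):
    """Pr(B_a = x, B_b = y | l_gamma = t) for one probe, with every
    occurrence count truncated at k_max as suggested in the text."""
    rest = length - t  # length of C_u and of C_v
    total = 0.0
    for k_u in range(k_max + 1):
        p_u = poisson_pmf(k_u, lam * rest)
        for k_v in range(k_max + 1):
            p_v = poisson_pmf(k_v, lam * rest)
            for k_g in range(k_max + 1):
                p_g = poisson_pmf(k_g, lam * t)
                total += (p_u * p_v * p_g
                          * pr_signal(x, k_u + k_g, alpha, beta)
                          * pr_signal(y, k_v + k_g, alpha, beta))
    return total
```

With a small expected number of occurrences per clone, the four values `pair_prob(x, y, ...)` for $x, y \in \{0,1\}$ should sum to almost exactly 1, which gives a quick check that the truncation error is negligible for the chosen parameters.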

Assuming uniform parameters for the probes, the expression $Pr(B_{a,j},B_{b,j} \vert l_{\gamma} = t)$ inside the product is independent of $j$. Therefore, we can define $P_{x,y}[t] = Pr(B_{a,j} = x, B_{b,j} = y \vert l_{\gamma} = t)$. In practice, instead of computing $P_{x,y}[t]$ for every $t$ in the interval $[0,l]$, we quantize this interval and perform the computation only for representative values of $t$. Denoting by $S_{x,y}(a,b)$ the set of probes $j$, $1 \leq j \leq n$, such that $B_{a,j} = x$ and $B_{b,j} = y$, we can write:

$\begin{displaymath}Pr(\overrightarrow{B_{a}},\overrightarrow{B_{b}} \vert t) = \prod_{x=0}^{1}\prod_{y=0}^{1}P_{x,y}[t]^{\vert S_{x,y}(a,b)\vert} \end{displaymath}$

(9)
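Equation (9) reduces the likelihood to four counts and four table entries. A minimal sketch, assuming the hybridization vectors are given as 0/1 lists and the table $P_{x,y}[t]$ for one fixed $t$ as a nested list `p_table[x][y]` (the function names are hypothetical). It works with logarithms, which is a choice made here to avoid floating-point underflow when $n$ is large, not something the text prescribes.

```python
import math

def pair_counts(b_a, b_b):
    """Return counts[x][y] = |S_{x,y}(a,b)| for two 0/1 hybridization vectors."""
    if len(b_a) != len(b_b):
        raise ValueError("hybridization vectors must have the same length")
    counts = [[0, 0], [0, 0]]
    for x, y in zip(b_a, b_b):
        counts[x][y] += 1
    return counts

def log_likelihood(b_a, b_b, p_table):
    """log Pr(B_a, B_b | t) via equation (9), for the t that p_table describes."""
    counts = pair_counts(b_a, b_b)
    total = 0.0
    for x in (0, 1):
        for y in (0, 1):
            if counts[x][y] == 0:
                continue  # a factor raised to the power 0 contributes nothing
            if p_table[x][y] <= 0.0:
                return float("-inf")  # an observed pattern with probability 0
            total += counts[x][y] * math.log(p_table[x][y])
    return total
```

For example, `log_likelihood([1, 0, 1, 1], [1, 0, 0, 1], p_table)` uses the counts $|S_{1,1}| = 2$, $|S_{0,0}| = 1$, $|S_{1,0}| = 1$ and $|S_{0,1}| = 0$.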

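Having computed $Pr(\overrightarrow{B_{a}},\overrightarrow{B_{b}} \vert t)$ for each representative value of $t$, we can use Bayes' theorem to obtain the posterior probability of each overlap length $t_{0}$: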

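$\begin{displaymath}Pr(l_{\gamma} = t_{0} \vert \overrightarrow{B_{a}},\overrightarrow{B_{b}}) = \frac{Pr(\overrightarrow{B_{a}},\overrightarrow{B_{b}} \vert l_{\gamma} = t_{0})Pr(l_{\gamma} = t_{0})}{\sum_{t} Pr(\overrightarrow{B_{a}},\overrightarrow{B_{b}} \vert l_{\gamma} = t)Pr(l_{\gamma} =t)} \end{displaymath}$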

(10)
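The Bayes step over the quantized overlap lengths can be sketched as below. The sketch assumes the likelihoods are supplied as logarithms, one per representative value of $t$, together with a prior $Pr(l_{\gamma} = t)$ over the same values. The function name `overlap_posterior` is hypothetical, and subtracting the maximum log-likelihood before exponentiating is a standard numerical safeguard added here, not part of the text.

```python
import math

def overlap_posterior(log_likelihoods, prior):
    """Posterior Pr(l_gamma = t | B_a, B_b) for each representative t.

    log_likelihoods[i] is log Pr(B_a, B_b | l_gamma = t_i) and prior[i]
    is Pr(l_gamma = t_i); both lists are indexed by the same t values.
    """
    if len(log_likelihoods) != len(prior):
        raise ValueError("need one prior value per representative t")
    # Shifting by the maximum leaves the ratio in the Bayes formula
    # unchanged and keeps exp() away from underflow.
    shift = max(log_likelihoods)
    weights = [math.exp(ll - shift) * p
               for ll, p in zip(log_likelihoods, prior)]
    total = sum(weights)
    if not total > 0.0:
        raise ValueError("all overlap lengths have zero posterior weight")
    return [w / total for w in weights]
```

The returned list is the vector of probabilities, one per overlap length, that the overlap score is meant to produce for the clone pair.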


Peer Itsik
2001-01-09