next up previous
Next: Constructing Physical Maps from Up: No Title Previous: Solving the Unique Mapping

Probabilistic Models for Mapping

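Recall the following definition: a Poisson process of rate $\lambda$ is a counting process in which the numbers of events in disjoint time intervals are independent, and the number of events in any interval of length $t$ is Poisson distributed with mean $\lambda t$. Denote by $T_{i}$ the time between the $(i-1)$-th and the $i$-th event, and by $S_{i}$ the time of the $i$-th event. Recall that the inter-event times $T_{i}$ in a Poisson process are i.i.d. random variables, exponentially distributed with parameter $\lambda$, i.e.,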

\begin{displaymath}Pr(T_{i} > t) = e^{- \lambda t}
\end{displaymath} (3)
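As an illustration (not part of the original notes), the Python sketch below simulates a Poisson process of rate $\lambda$ by summing i.i.d. exponential inter-event times, and checks two consequences numerically: the mean number of events in $[0, t_{\max}]$ is close to $\lambda t_{\max}$, and the fraction of runs with no event before time $t$ is close to $e^{-\lambda t}$, because $T_{1} > t$ exactly when no event occurs in $[0, t]$. The helper name `poisson_event_times`, the seed and the parameter values are hypothetical choices for this sketch.

```python
import math
import random


def poisson_event_times(rate, t_max, rng):
    """Event times S_1 < S_2 < ... of a Poisson process with the given rate
    on [0, t_max], built by summing i.i.d. Exp(rate) inter-event times T_i."""
    times = []
    s = rng.expovariate(rate)          # S_1 = T_1
    while s <= t_max:
        times.append(s)
        s += rng.expovariate(rate)     # S_{i+1} = S_i + T_{i+1}
    return times


rng = random.Random(0)                 # fixed seed so runs are reproducible
lam, t_max, runs = 2.0, 10.0, 5_000
samples = [poisson_event_times(lam, t_max, rng) for _ in range(runs)]

# Mean number of events in [0, t_max]; should be close to lam * t_max.
mean_count = sum(len(s) for s in samples) / runs

# T_1 > t exactly when no event falls in [0, t]; by (3) this has probability e^{-lam t}.
t = 0.5
no_event = sum(1 for s in samples if not s or s[0] > t) / runs

print(f"mean count {mean_count:.2f} vs lam * t_max = {lam * t_max:.2f}")
print(f"Pr(T_1 > {t}) ~ {no_event:.3f} vs e^(-lam t) = {math.exp(-lam * t):.3f}")
```

Generating the process from its inter-event times, rather than from the Poisson counts directly, mirrors the characterization in (3).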

If it is known that exactly $n \geq 1$ events occurred in a Poisson process until time $t$, then the event times $\{S_{1},S_{2},\ldots,S_{n}\}$ are distributed as $n$ independent points chosen uniformly in $[0,t]$, listed in increasing order.

Now assume clone length $L$ and genome length $G$, and choose $N$ clones at random. What is the expected fraction of the genome covered by the clones? For a random point $b$ and an arbitrary clone $c$, the probability that $b$ is included in $c$ is given by:

\begin{displaymath}Pr(b \in c) = \frac {L} {G}
\end{displaymath} (4)

and therefore, since the $N$ clones are chosen independently, the probability that $b$ lies outside all the clones is given by:

\begin{displaymath}Pr(\forall c : b \notin c) = \left(1 - \frac{L}{G}\right)^{N} = \left(1 - \frac{L}{G}\right)^{G \frac{N}{G}} \approx e^{-\frac{NL}{G}}
\end{displaymath} (5)

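The last approximation uses $\ln(1 - x) \approx -x$ for small $x = L/G$, so that $(1 - \frac{L}{G})^{N} = e^{N \ln(1 - L/G)} \approx e^{-NL/G}$; it is valid when $L \ll G$ and $N \ll G$. The quantity

\begin{displaymath}R=\frac{NL}{G}\end{displaymath}

is called the redundancy of the clone set; since each of the $N$ clones covers a given point with probability $L/G$, $R$ is the expected number of clones covering that point. The expected fraction of the genome not covered by any clone equals the probability that a random point is uncovered, and is therefore given by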

\begin{displaymath}E(\mbox{fraction not covered}) = e^{-R}
\end{displaymath} (6)

where $R$ is the redundancy of the clone set. Table 9.1 shows that a redundancy factor of 2 to 5 yields an expected coverage of about 0.865 to 0.993 of the genome segment considered.
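To see how good the approximation in (5) is, the Python sketch below (illustrative only; the clone and genome sizes are hypothetical) compares the exact probability $(1 - L/G)^{N}$ with $e^{-R}$ for redundancies 1 to 5. Since $\ln(1-x) < -x$ for $0 < x < 1$, the exact value is always slightly smaller than $e^{-R}$; with $L/G \approx 0.013$ the two differ by less than 0.01.

```python
import math


def uncovered_exact(n_clones, clone_len, genome_len):
    """Pr(a fixed point lies in none of the clones) = (1 - L/G)^N, as in (5)."""
    return (1.0 - clone_len / genome_len) ** n_clones


def uncovered_approx(n_clones, clone_len, genome_len):
    """Approximation e^{-R} with redundancy R = N L / G, as in (6)."""
    return math.exp(-n_clones * clone_len / genome_len)


# Hypothetical sizes for illustration: a 3 Mb segment and 40 kb clones.
G, L = 3_000_000, 40_000
for R in range(1, 6):
    N = round(R * G / L)               # number of clones giving redundancy R
    exact, approx = uncovered_exact(N, L, G), uncovered_approx(N, L, G)
    print(f"R={R}: N={N}, exact {exact:.4f}, e^-R {approx:.4f}, coverage {1 - approx:.3f}")
```

The last column, $1 - e^{-R}$, is the expected covered fraction used in the coverage table.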






 
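Table 9.1: Coverage of a genome segment as a function of the redundancy factor $R$

\begin{tabular}{cc}
$R$ & Expected coverage $1 - e^{-R}$ \\
\hline
1 & 0.63 \\
2 & 0.865 \\
3 & 0.95 \\
4 & 0.98 \\
5 & 0.993 \\
\end{tabular}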
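A small Monte Carlo experiment, again a hedged Python sketch rather than part of the original notes, can reproduce these coverage values. To avoid edge effects it assumes a circular genome (an assumption not made in the text), on which $Pr(b \in c) = L/G$ holds exactly for uniformly placed clones; the covered length is then computed from the gaps between consecutive clone starts. The genome size, seed, number of repetitions and the function name `covered_fraction` are illustrative choices.

```python
import math
import random


def covered_fraction(n_clones, clone_len, genome_len, rng):
    """Fraction of a circular genome of length genome_len covered by n_clones
    clones of length clone_len whose starts are i.i.d. uniform on
    [0, genome_len). Assumes 0 < clone_len <= genome_len."""
    starts = sorted(rng.uniform(0, genome_len) for _ in range(n_clones))
    covered = 0.0
    for i, s in enumerate(starts):
        # Next start to the right, wrapping around the circle after the last one.
        nxt = starts[i + 1] if i + 1 < n_clones else starts[0] + genome_len
        # A point in [s, nxt) is covered iff it lies within clone_len of s,
        # the nearest start to its left, so this stretch contributes min(L, gap).
        covered += min(clone_len, nxt - s)
    return covered / genome_len


rng = random.Random(1)
G, L, reps = 1_000.0, 1.0, 20           # lengths in units of clone length
simulated = {}
for R in range(1, 6):
    N = int(R * G / L)
    simulated[R] = sum(covered_fraction(N, L, G, rng) for _ in range(reps)) / reps
    print(f"R={R}: simulated {simulated[R]:.3f}, 1 - e^-R = {1 - math.exp(-R):.3f}")
```

The simulated fractions should agree with $1 - e^{-R}$ to within sampling noise of a few thousandths.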



Assume clone length 1 (i.e., lengths are measured in units of the clone length), and denote by $N$ the number of clones and by $R$ the redundancy factor. Assume further that the clone starting positions follow a Poisson process with rate $\lambda$; with unit clone length, $\lambda = N/G = R$. We define a minimal overlap factor $\theta$, $0 \leq \theta < 1$, for detecting overlaps: two clones are defined to overlap only if they share a section of length at least $\theta$. A set of clones covering a contiguous segment of the genome, together with the physical distances between them, is called a contig. Contigs are sometimes referred to as islands.
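The overlap rule can be made concrete with a short Python sketch (an illustration under the stated assumptions, not the notes' own procedure). The hypothetical function `islands` takes the start positions of unit-length clones and a threshold $\theta$. Two clones starting at $s \leq s'$ share a section of length $1 - (s' - s)$ when $s' - s < 1$, so they are declared overlapping exactly when $s' - s \leq 1 - \theta$. After sorting, it suffices to compare each clone with its predecessor: if a clone overlaps some earlier clone by at least $\theta$, it also overlaps the one immediately before it.

```python
def islands(starts, theta):
    """Group unit-length clones, given by their start positions, into islands.

    Clones starting at s <= s2 share a section of length 1 - (s2 - s) when
    s2 - s < 1, so they are declared to overlap iff s2 - s <= 1 - theta.
    """
    if not 0 <= theta < 1:
        raise ValueError("theta must lie in [0, 1)")
    result = []
    for s in sorted(starts):
        if result and s - result[-1][-1] <= 1 - theta:
            result[-1].append(s)   # overlap with the previous clone is detected
        else:
            result.append([s])     # gap too large (or first clone): new island
    return result


print(islands([0.0, 0.5, 0.9, 3.0, 3.7, 10.0], theta=0.4))
# → [[0.0, 0.5, 0.9], [3.0], [3.7], [10.0]]
```

Here the clone at 3.7 starts 0.7 after the clone at 3.0, more than $1 - \theta = 0.6$, so their true overlap of 0.3 is not detected and they end up in different islands.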
  \begin{theorem}% latex2html id marker 236
Lander-Waterman 1988 \cite{lander...
...R(1-\theta)} - 1} {R} + \theta
\end{equation}
\end{enumerate}
\end{theorem}

\begin{proof}% latex2html id marker 258
We will prove the first item of the theo...
... islands}) = N \cdot J(\theta) =
Ne^{-R(1-\theta)}
\end{equation}
\end{proof}
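The proof concludes that the expected number of islands is $N e^{-R(1-\theta)}$. As a hedged numerical check (not from the original notes), the Python sketch below places $N = RG$ unit-length clones uniformly at random on a genome of length $G$, which approximates the Poisson model for large $G$, counts islands by the minimal-overlap rule, and compares the average with $N e^{-R(1-\theta)}$. The genome length, the $(R, \theta)$ pairs, the number of runs and the helper names are arbitrary assumptions; small discrepancies are expected from sampling noise and from edge effects at the end of the genome.

```python
import math
import random


def count_islands(starts, theta):
    """Number of islands among unit-length clones: one, plus the number of
    consecutive (sorted) start gaps larger than 1 - theta, i.e. places where
    no overlap of length >= theta is detected."""
    s = sorted(starts)
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if b - a > 1 - theta)


def mean_islands(R, theta, G, runs, rng):
    """Average island count over `runs` trials with N = R * G unit-length
    clones whose starts are i.i.d. uniform on [0, G); for large G this
    approximates a Poisson process of rate R."""
    N = int(R * G)
    total = sum(count_islands([rng.uniform(0, G) for _ in range(N)], theta)
                for _ in range(runs))
    return N, total / runs


rng = random.Random(2)
results = []
for R, theta in [(1.0, 0.0), (2.0, 0.2), (4.0, 0.5)]:
    N, sim = mean_islands(R, theta, G=2_000, runs=10, rng=rng)
    formula = N * math.exp(-R * (1 - theta))
    results.append((sim, formula))
    print(f"R={R}, theta={theta}: simulated {sim:.1f}, N e^(-R(1-theta)) = {formula:.1f}")
```

Counting islands as one plus the number of large gaps is equivalent to counting, for each clone, whether it is the last clone of its island.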

Peer Itsik
2001-01-09