
Probabilistic Models for Mapping

Recall the following definition: a Poisson process of rate $\lambda$ is described by:

A non-decreasing function $N: \mathbb{R}_{0}^{+} \rightarrow \mathbb{N}$, where $N(t)$ is the number of events up to time $t$
$N(0) = 0$
The numbers of events in disjoint intervals are independent

$\begin{displaymath}P(N(t+s) - N(s) = n) = e^{-\lambda t} \frac {({\lambda t})^{n}} { n! } \mbox{, for } n = 0,1,\ldots \mbox { and } s \geq 0 \end{displaymath}$ (2)

As a consequence, the distribution of the number of events in an interval is stationary, i.e., it depends only on the length of the interval. The expected number of events in an interval of length $t$ is $E(N(t)) = \lambda t$.
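As a quick numerical illustration of equation (2), the sketch below evaluates the Poisson probabilities and checks that they sum to 1 and have mean $\lambda t$. The helpers `poisson_pmf` and `expected_count` are hypothetical names, not part of the original notes. The probabilities are computed in log space so that large $n$ does not overflow the factorial.

```python
import math

def poisson_pmf(n: int, lam: float, t: float) -> float:
    """P(N(s+t) - N(s) = n) for a Poisson process of rate lam, as in equation (2).

    Only the product lam * t enters, which is the stationarity property."""
    mu = lam * t
    if mu == 0.0:
        return 1.0 if n == 0 else 0.0
    # exp(-mu) * mu**n / n!, evaluated in log space to avoid overflow for large n
    return math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))

def expected_count(lam: float, t: float, n_max: int = 200) -> float:
    """Truncated sum of n * P(N(t) = n); converges to lam * t as n_max grows."""
    return sum(n * poisson_pmf(n, lam, t) for n in range(n_max + 1))

if __name__ == "__main__":
    lam, t = 2.0, 1.5
    total = sum(poisson_pmf(n, lam, t) for n in range(200))
    print(f"sum of probabilities: {total:.6f}")
    print(f"E(N(t)) = {expected_count(lam, t):.6f}, lam * t = {lam * t}")
```

Two processes with different $\lambda$ and $t$ but the same product $\lambda t$ give identical probabilities, which is the stationarity statement in numerical form.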

Denote:

$T_n$ = time between event $n-1$ and event $n$
$S_0 = 0$
$S_{n} = \sum_{i=1}^{n} T_{i}$ = time of the $n$-th event

Recall that the inter-event times in a Poisson process are i.i.d. random variables, exponentially distributed with parameter $\lambda$, i.e.,

$\begin{displaymath}Pr(T_{i} > t) = e^{- \lambda t} \end{displaymath}$ (3)
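The inter-event characterization gives a direct way to simulate the process: draw i.i.d. exponential gaps $T_i$ and accumulate them into arrival times $S_n$. The sketch below relies on Python's `random.expovariate`, whose argument is the rate $\lambda$; the helper names `poisson_arrivals` and `mean_count` are illustrative only. Averaging the number of arrivals in $[0,t]$ over many runs should come out close to $\lambda t$.

```python
import random

def poisson_arrivals(lam: float, t: float, rng: random.Random) -> list[float]:
    """Arrival times S_1 < S_2 < ... <= t of a rate-lam Poisson process on [0, t].

    S_n is the running sum of i.i.d. gaps T_i with Pr(T_i > x) = exp(-lam * x)."""
    arrivals = []
    s = 0.0  # S_0 = 0
    while True:
        s += rng.expovariate(lam)  # next inter-event time T_i
        if s > t:
            return arrivals
        arrivals.append(s)

def mean_count(lam: float, t: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E(N(t)), which should be close to lam * t."""
    rng = random.Random(seed)
    return sum(len(poisson_arrivals(lam, t, rng)) for _ in range(trials)) / trials

if __name__ == "__main__":
    print(mean_count(lam=2.0, t=5.0, trials=20_000))  # should be near 10
```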

If it is known that exactly $n \geq 1$ events occurred in a Poisson process up to time $t$, then the arrival times $\{S_{1},S_{2},\ldots,S_{n}\}$, taken as an unordered set, are distributed as $n$ independent uniform points in $[0,t]$.

Now assume clone length $L$ and genome length $G$, and choose $N$ clones at random. What is the expected fraction of the genome covered by the clones? For a random point $b$ and an arbitrary clone $c$, the probability that $b$ is included in $c$ is given by:

$\begin{displaymath}Pr(b \in c) = \frac {L} {G} \end{displaymath}$ (4)

and therefore, since the $N$ clones are chosen independently, the probability that $b$ lies outside all the clones is given by:

$\begin{displaymath}Pr(\forall c : b \notin c) = \left(1 - \frac {L} {G}\right)^{N} = \left[\left(1 - \frac {L} {G}\right)^{G/L}\right]^{\frac {NL} {G}} \approx e^{-\frac {NL} {G}} \end{displaymath}$ (5)

with the last approximation being valid when $L \ll G$, since $(1 - L/G)^{G/L} \approx e^{-1}$ for small $L/G$. The fraction

$\begin{displaymath}R=\frac{NL}{G}\end{displaymath}$

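is called the redundancy of the clone set; it is the expected number of clones covering a given point. Since the expected fraction of the genome left uncovered equals the probability that a random point is uncovered, it is given by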

$\begin{displaymath}E(\mbox{fraction not covered}) = e^{-R} \end{displaymath}$ (6)

where $R$ is the redundancy of the clone set. Table 9.1 shows that a redundancy factor of 2 to 5 gives good coverage of the genome segment considered.

Table 9.1: Expected coverage of the genome segment as a function of the redundancy factor

R	Expected coverage $1 - e^{-R}$
1	0.63
2	0.865
3	0.95
4	0.98
5	0.993
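The table entries can be recomputed from equation (6) as $1 - e^{-R}$, and the approximation in (5) can be checked against the exact value $(1 - L/G)^N$ and against a simulation. The sketch below is illustrative: the helper names are hypothetical, and to sidestep edge effects it assumes a circular genome of length $G$, on which $Pr(b \in c) = L/G$ holds exactly for every point.

```python
import math
import random

def expected_coverage(R: float) -> float:
    """Expected covered fraction 1 - e^{-R}, from equation (6)."""
    return 1.0 - math.exp(-R)

def exact_uncovered(N: int, L: float, G: float) -> float:
    """Exact Pr(b lies in no clone) = (1 - L/G)^N, before the e^{-NL/G} approximation."""
    return (1.0 - L / G) ** N

def simulated_coverage(N: int, L: float, G: float, rng: random.Random) -> float:
    """Covered fraction of a circular genome of length G (assumes L <= G) by N clones
    of length L whose start positions are uniform in [0, G)."""
    starts = sorted(rng.uniform(0.0, G) for _ in range(N))
    covered = 0.0
    for i, x in enumerate(starts):
        # next start on the circle; the last clone wraps around to the first one
        nxt = starts[i + 1] if i + 1 < N else starts[0] + G
        # clone i adds only the part before the next start, since the next clone
        # has the same length and therefore covers everything beyond that point
        covered += min(L, nxt - x)
    return covered / G

if __name__ == "__main__":
    for R in range(1, 6):
        print(R, round(expected_coverage(R), 3))
    N, L, G = 3000, 1.0, 1000.0  # R = NL/G = 3
    print("exact uncovered:", exact_uncovered(N, L, G), "approx:", math.exp(-N * L / G))
    rng = random.Random(0)
    runs = [simulated_coverage(N, L, G, rng) for _ in range(50)]
    print("simulated coverage:", sum(runs) / len(runs))
```

With $L = 1$ and $G = 1000$ the exact and approximate uncovered fractions differ only in the fourth decimal place, which is consistent with the requirement $L \ll G$.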

Assume clone length $L = 1$, denote by $N$ the number of clones and by $R$ the redundancy factor, and assume that the clone starting positions follow a Poisson process with rate $\lambda$; with unit-length clones, $\lambda = N/G = R$. We define a minimal overlap factor $\theta$ for identifying overlaps: two clones are defined to overlap only if they share a section of length at least $\theta$. A set of clones covering a contiguous segment of the genome, together with their physical distances, is called a contig. Contigs are sometimes referred to as islands.
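To make the overlap rule concrete: with unit-length clones starting at $x < y$, the clones share a section of length $1 - (y - x)$ when $y - x < 1$, so they are declared overlapping exactly when $y - x \le 1 - \theta$. The sketch below groups clones into islands by chaining such detected overlaps; it assumes an island is a maximal chain of pairwise-detected overlaps, and `islands` is a hypothetical helper. Comparing only consecutive sorted starts suffices, because any earlier clone overlaps the next one by no more than the immediately preceding clone does.

```python
def islands(starts: list[float], theta: float) -> list[list[float]]:
    """Group unit-length clones, given by their start positions, into islands (contigs).

    Clones at x < y share 1 - (y - x) of their length, so the overlap is detected
    (at least theta long) iff y - x <= 1 - theta. An island is assumed to be a
    maximal chain of such detected overlaps."""
    if not starts:
        return []
    xs = sorted(starts)
    groups = [[xs[0]]]
    for prev, cur in zip(xs, xs[1:]):
        if cur - prev <= 1.0 - theta:
            groups[-1].append(cur)  # detected overlap with the previous clone: same island
        else:
            groups.append([cur])  # gap too large: this clone starts a new island
    return groups

if __name__ == "__main__":
    print(islands([0.0, 0.5, 0.9, 2.5, 3.0], theta=0.3))
    # [[0.0, 0.5, 0.9], [2.5, 3.0]]
```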
$\begin{theorem} Lander-Waterman 1988 \cite{lander... ...R(1-\theta)} - 1} {R} + \theta \end{equation} \end{enumerate} \end{theorem}$

$\begin{proof} We will prove the first item of the theo... ... islands}) = N \cdot J(\theta) = Ne^{-R(1-\theta)} \end{equation} \end{proof}$
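The last formula in the proof, $E(\mbox{number of islands}) = Ne^{-R(1-\theta)}$, can be checked by simulation. The sketch below approximates the Poisson process by placing $N$ unit-length clone starts uniformly on $[0, G]$ (so $R = N/G$) and counts islands as one plus the number of consecutive gaps longer than $1 - \theta$. Boundary effects make this only approximately equal to the formula, and all names are hypothetical.

```python
import math
import random

def count_islands(starts: list[float], theta: float) -> int:
    """Number of islands formed by unit-length clones under the minimal-overlap rule:
    a new island begins wherever consecutive starts are more than 1 - theta apart."""
    xs = sorted(starts)
    if not xs:
        return 0
    d = 1.0 - theta
    return 1 + sum(1 for a, b in zip(xs, xs[1:]) if b - a > d)

def simulate_mean_islands(N: int, G: float, theta: float, trials: int, seed: int = 0) -> float:
    """Average island count over independent placements of N uniform clone starts on [0, G]."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        starts = [rng.uniform(0.0, G) for _ in range(N)]
        total += count_islands(starts, theta)
    return total / trials

def predicted_islands(N: int, R: float, theta: float) -> float:
    """N * e^{-R(1 - theta)}, the expected number of islands from the proof."""
    return N * math.exp(-R * (1.0 - theta))

if __name__ == "__main__":
    N, G, theta = 4000, 2000.0, 0.25  # R = N / G = 2
    print("simulated:", simulate_mean_islands(N, G, theta, trials=40))
    print("predicted:", predicted_islands(N, N / G, theta))
```

Raising $\theta$ shrinks the exponent's factor $1 - \theta$, so more gaps count as breaks and the predicted number of islands grows toward $N$.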


Peer Itsik
2001-01-09