Prediction

Definition   A parse $\Phi$ of a sequence S of length L is an ordered sequence of states ( $q_1,\dots ,q_t$) with a duration $d_i$ associated with each state, such that $L=\sum_{i=1}^{t} d_i$. A parse is in effect a possible annotation of a base sequence, matching each subsequence with the appropriate functional unit of a gene.

Suppose we are given a parse $\Phi$ and a sequence S. Let $S_i$ be the segment of S produced by $q_i$, and let $P(S_i\vert d_i)$ be the probability that the sequence generation model of state $q_i$ generates $S_i$, given that the segment has length $d_i$. The joint probability that the model passes through the states of $\Phi$ with these durations and generates S is:

\begin{displaymath}P(\Phi,S)=\pi_{q_1}f_{q_1}(d_1)P(S_1\vert d_1)\prod_{k=2}^{t}T_{q_{k-1}q_k}f_{q_k}(d_k)P(S_k\vert d_k)
\end{displaymath}
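
As a minimal sketch of how this product might be evaluated, the Python fragment below multiplies the factors segment by segment. Everything in it is an assumption made for illustration: the two-state toy model, its numeric parameters, the i.i.d. per-base emission model used for $P(S_i\vert d_i)$, and the names PI, T, F, EMIT, segment_probability and joint_probability. None of it is taken from GENSCAN.

```python
from math import prod

# Hypothetical two-state toy model (illustrative numbers, not GENSCAN's).
PI = {"exon": 0.5, "intron": 0.5}                     # initial probabilities pi_q
T = {"exon": {"exon": 0.0, "intron": 1.0},            # transition probabilities T_{q q'}
     "intron": {"exon": 1.0, "intron": 0.0}}
F = {"exon": {1: 0.2, 2: 0.3, 3: 0.5},                # duration distributions f_q(d)
     "intron": {1: 0.6, 2: 0.4}}
EMIT = {"exon": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},    # per-base emission probabilities
        "intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

def segment_probability(state, segment):
    """P(S_i | d_i) under an assumed i.i.d. base model; d_i is len(segment)."""
    return prod(EMIT[state][base] for base in segment)

def joint_probability(parse, seq):
    """P(Phi, S) for parse = [(state, duration), ...] covering seq exactly."""
    if sum(d for _, d in parse) != len(seq):
        raise ValueError("durations must sum to the sequence length")
    prob, pos, prev = 1.0, 0, None
    for state, d in parse:
        # pi_{q_1} for the first state, T_{q_{k-1} q_k} afterwards
        start = PI[state] if prev is None else T[prev][state]
        prob *= start * F[state].get(d, 0.0) * segment_probability(state, seq[pos:pos + d])
        pos, prev = pos + d, state
    return prob

# Parse "CGA" as an exon of length 2 followed by an intron of length 1.
p = joint_probability([("exon", 2), ("intron", 1)], "CGA")
```

For real sequences the product underflows quickly, so an implementation would more likely sum logarithms of the factors than multiply the probabilities directly.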

Suppose we are given a DNA sequence S and a specific parse $\Phi$, both of length L. The conditional probability of the parse $\Phi$, given that the generated sequence is S, can be computed as:

\begin{displaymath}P(\Phi\vert S)=\frac{P(\Phi,S)}{P(S)}=\frac{P(\Phi,S)}{\sum\limits_{\Phi_i \mbox{ is a parse of length } L}P(\Phi_i,S)}
\end{displaymath}
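
The sum in the denominator ranges over a very large set, so evaluating it term by term is impractical. As a rough count, assume a model with N states in which any state may follow any other (N and this assumption are introduced here only for counting). A parse with t segments is then a choice of one of the $\binom{L-1}{t-1}$ ways to cut S into t non-empty segments, together with one of $N^t$ state assignments, so the number of parses of length L is

\begin{displaymath}\sum_{t=1}^{L}\binom{L-1}{t-1}N^{t}=N(N+1)^{L-1}
\end{displaymath}

which grows exponentially in L. Transitions or durations of probability zero remove some of these parses, but typically not the exponential growth.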

Because P(S) does not depend on $\Phi$, the most probable parse, $\Phi_{opt}$, is the one that maximizes $P(\Phi,S)$, and it can be computed by a Viterbi-like algorithm. P(S) itself can be computed by a forward-like algorithm.
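
One way such algorithms might look is sketched below: both recurse over the end position of the last segment and over the state that produced it. This is a sketch under stated assumptions, not GENSCAN's implementation. The two-state toy model, its parameters, the i.i.d. emission model, and the names STATES, PI, T, F, EMIT, seg, forward and viterbi are all hypothetical.

```python
from math import prod

# Hypothetical toy parameters for illustration only (not GENSCAN's).
STATES = ("exon", "intron")
PI = {"exon": 0.5, "intron": 0.5}
T = {"exon": {"exon": 0.0, "intron": 1.0}, "intron": {"exon": 1.0, "intron": 0.0}}
F = {"exon": {1: 0.2, 2: 0.3, 3: 0.5}, "intron": {1: 0.6, 2: 0.4}}
EMIT = {"exon": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

def seg(q, s):
    """f_q(d) * P(s | d) under an assumed i.i.d. base model, with d = len(s)."""
    return F[q].get(len(s), 0.0) * prod(EMIT[q][b] for b in s)

def forward(seq):
    """P(S): the sum of P(Phi, S) over all parses, by dynamic programming."""
    L = len(seq)
    # fwd[j][q] = total probability of parses of seq[:j] whose last state is q
    fwd = [{q: 0.0 for q in STATES} for _ in range(L + 1)]
    for j in range(1, L + 1):
        for q in STATES:
            total = PI[q] * seg(q, seq[:j])           # q produced the whole prefix
            for i in range(1, j):                     # previous segment ends at i
                inflow = sum(fwd[i][p] * T[p][q] for p in STATES)
                total += inflow * seg(q, seq[i:j])
            fwd[j][q] = total
    return sum(fwd[L].values())

def viterbi(seq):
    """Most probable parse Phi_opt and its joint probability P(Phi_opt, S)."""
    L = len(seq)
    # best[j][q] = (probability, (start of last segment, previous state))
    best = [{q: (0.0, None) for q in STATES} for _ in range(L + 1)]
    for j in range(1, L + 1):
        for q in STATES:
            cand = [(PI[q] * seg(q, seq[:j]), (0, None))]
            for i in range(1, j):
                for p in STATES:
                    cand.append((best[i][p][0] * T[p][q] * seg(q, seq[i:j]), (i, p)))
            best[j][q] = max(cand, key=lambda c: c[0])
    q = max(STATES, key=lambda s: best[L][s][0])
    prob, parse, j = best[L][q][0], [], L
    while q is not None:                              # follow backpointers to position 0
        i, p = best[j][q][1]
        parse.append((q, j - i))
        j, q = i, p
    return parse[::-1], prob

parse, p_opt = viterbi("CGA")
posterior = p_opt / forward("CGA")                    # P(Phi_opt | S)
```

With these toy parameters the best parse of "CGA" is an exon of length 2 followed by an intron of length 1, with posterior probability of about 0.504. As written, both functions take time quadratic in L because every earlier position is tried as a segment boundary; bounding the maximum duration would reduce this.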

Peer Itsik
2000-12-25