
Scoring Function

Let $f_b$ denote the expected frequency of the base $b$ in the genome (the background frequency), and let $f_{b,i}$ denote the frequency of base $b$ at position $i$ of the TATA-box, as given by the positional weight matrix. For a sequence $S=B_1B_2\ldots B_6$, the likelihood of observing it, given that it is a TATA-box, is:

\begin{displaymath}
P(S \mid S\mbox{ is a TATA-box}) = \prod_{i=1}^{6} f_{B_i,i}
\end{displaymath}

Similarly, the likelihood of observing $S$, given that it is not a TATA-box (a "non-promoter"), is approximated by its background probability:

\begin{displaymath}
P(S \mid S\mbox{ is not a TATA-box}) \approx P(S) = \prod_{i=1}^{6} f_{B_i}
\end{displaymath}

The log-likelihood ratio is therefore:

\begin{displaymath}
\log\left(\frac{P(S \mid \mbox{promoter})}{P(S \mid \mbox{non-promoter})}\right) =
\sum_{i=1}^{6}\log\left(\frac{f_{B_i,i}}{f_{B_i}}\right)
\end{displaymath}
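
The log-likelihood ratio above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the position-specific frequencies in `PWM` are made-up placeholder values (not the ones from the figure), the background is assumed uniform, and the names `PWM`, `BACKGROUND` and `log_likelihood_ratio` are hypothetical.

```python
import math

# Hypothetical position-specific frequencies f_{b,i} for a 6-base model.
# Illustrative placeholders only; NOT the values from the figure.
PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.60, "C": 0.05, "G": 0.05, "T": 0.30},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]

# Assumed uniform background frequencies f_b (an assumption, not from the text).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_likelihood_ratio(seq, pwm=PWM, background=BACKGROUND):
    """Return sum_i log(f_{B_i,i} / f_{B_i}) for a sequence of len(pwm) bases."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the number of matrix columns")
    # Each position contributes independently, exactly as in the formula.
    return sum(math.log(pwm[i][b] / background[b]) for i, b in enumerate(seq))


if __name__ == "__main__":
    for s in ("TATAAA", "GCGCGC"):
        print(s, round(log_likelihood_ratio(s), 3))
```

A positive score means the sequence is more likely under the TATA-box model than under the background, and a negative score means the opposite. With these placeholder numbers, `TATAAA` scores above zero and `GCGCGC` below zero.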

This model has the disadvantage that it does not exploit all of the known information: it treats each position independently and so ignores, for example, dependencies between bases occurring in the promoter regions. The $f_{B,i}$ are given in Figure [*].
  
Figure: Positional weight matrix for TATA box [].




Peer Itsik
2000-12-25