
   
Initial state probabilities


  
Figure 7.10: Gene density and structure as a function of C+G composition.

The initial probabilities of the states in the model should be proportional to the frequencies with which the corresponding functional units occur in actual human genomic data. For example, if non-coding intergenic regions are estimated to make up 80% of the genome, then the initial probability of the state N (see figure 7.9) should be about 0.8. In practice, however, the relative proportions of the functional units vary considerably with the C+G content (isochore) of the genomic sequence (see figure 7.10). For training GENSCAN, the training set is therefore divided into four categories according to the C+G content of the sequence. The categories are:
1. (< 43% C+G)
2. (43 - 51% C+G)
3. (51 - 57% C+G)
4. (> 57% C+G)
For each of these categories, a separate set of initial state probabilities is computed by estimating the relative frequencies of the functional units among the training sequences in that category.
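
The procedure described above can be sketched in a few lines of Python. This is only an illustration, not GENSCAN's actual training code. It assumes each training sequence comes with a hypothetical per-base annotation string using one character per state (for example "N" for intergenic, as in the text, and "E" as a stand-in label for any other unit). The function names and the treatment of the exact boundary values (43, 51 and 57%) are likewise assumptions, since the text does not say which category a boundary value belongs to.

```python
from collections import Counter


def cg_percent(seq):
    """Return the percentage of C and G among the A, C, G, T bases of seq."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("C") + seq.count("G")) / total


def cg_category(pct):
    """Map a C+G percentage to one of the four categories (1 to 4).

    Boundary handling is an assumption: exactly 43 goes to category 2,
    exactly 51 to category 3, and exactly 57 to category 3.
    """
    if pct < 43:
        return 1
    if pct < 51:
        return 2
    if pct <= 57:
        return 3
    return 4


def initial_probabilities(annotated):
    """Estimate initial state probabilities separately for each category.

    annotated: iterable of (sequence, labels) pairs, where labels is a
    string of the same length as sequence, one state label per base
    (a hypothetical annotation format chosen for this sketch).
    Returns {category: {state: relative frequency}}.
    """
    counts = {cat: Counter() for cat in (1, 2, 3, 4)}
    for seq, labels in annotated:
        if len(seq) != len(labels):
            raise ValueError("sequence and labels differ in length")
        counts[cg_category(cg_percent(seq))].update(labels)

    probs = {}
    for cat, counter in counts.items():
        total = sum(counter.values())
        # A category with no training data gets an empty table.
        probs[cat] = {s: n / total for s, n in counter.items()} if total else {}
    return probs


# Toy data, invented for illustration only.
toy_training_set = [
    ("ATATATATAT", "NNNNNNNNEE"),  # 0% C+G, falls in category 1
    ("GCGCGCGCGC", "NNNNNEEEEE"),  # 100% C+G, falls in category 4
]
toy_probs = initial_probabilities(toy_training_set)
```

At prediction time, one would compute the C+G percentage of the query sequence, look up its category with `cg_category`, and use that category's table as the initial state distribution.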

Itshack Pe`er
1999-02-03