
The Gene Finding Problem


\begin{problem}
Given a DNA sequence, predict the locations of its genes (open reading frames), exons, and introns.
\end{problem}
A simple approach is to look for stop codons along the sequence. If several stop codons appear close together in a given reading frame, that region cannot be coding in that frame, since translation would have terminated at the first of them. Conversely, the longer a stretch without stop codons, the more probable it is that the stretch contains a coding region.

The problem is more complex in eukaryotic DNA, where exons are interleaved with introns. There, a stop codon does not show that the position lies outside a gene, only that it lies outside an exon. A further complication is that any DNA sequence can be read in 6 different ways: 3 possible offsets for the first codon (the reading frame), times 2 for the reading direction, i.e. the strand being read. It is generally safe to assume that, apart from prokaryotic species, a DNA region encodes only one gene.
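
The stop-codon heuristic and the six reading frames can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method given in these notes: it assumes the standard genetic code (stop codons TAA, TAG, TGA on the DNA level), ignores start codons and introns, and the function names and the `min_codons` threshold are hypothetical choices made for the example.

```python
# Illustrative sketch; names and the min_codons threshold are assumptions.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def stop_free_stretches(seq, min_codons=3):
    """Yield (strand, offset, start, end) for every run of at least
    min_codons consecutive non-stop codons, in each of the 6 frames.

    start and end are 0-based, end-exclusive indices into the strand
    being read (for "-", that is the reverse complement of seq).
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            run_start = offset
            # Walk the complete codons of this frame.
            for i in range(offset, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    if (i - run_start) // 3 >= min_codons:
                        yield strand, offset, run_start, i
                    run_start = i + 3
            # Close the last run at the end of the final complete codon.
            end = offset + max(0, (len(s) - offset) // 3) * 3
            if (end - run_start) // 3 >= min_codons:
                yield strand, offset, run_start, end


if __name__ == "__main__":
    for hit in stop_free_stretches("ATGAAACCCGGGTAAATGTAG"):
        print(hit)
```

Long stop-free stretches found this way are only candidates. In eukaryotic DNA a stop codon may simply fall in an intron, so a scan of this kind is more informative for prokaryotic sequences.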



Peer Itsik
2000-11-13