Next: Prokaryotes Up: Gene Finding Previous: Motivation

   
Biological Background

Gene expression is the biological process by which a DNA sequence gives rise to a protein. It involves two steps: transcription and translation. Transcription produces an mRNA (messenger RNA) sequence using the DNA sequence as a template (Figure 7.1). The mRNA sequence produced is complementary to the DNA strand that was used as the template. The subsequent process, called translation, synthesizes the protein from the mRNA. Translation is performed by subcellular elements called ribosomes (Figure 7.2).
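The complementarity between the template strand and the mRNA can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function name transcribe and the convention of writing the template strand 3' to 5' (so that the result reads 5' to 3') are assumptions made for the example.

```python
# Base pairing from a DNA template base to the RNA base it specifies.
# RNA uses uracil (U) where DNA would use thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the mRNA (read 5'->3') complementary to a DNA template strand.

    Assumption: the template is written 3'->5', so pairing base by base
    from left to right already yields the mRNA in 5'->3' order.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


mrna = transcribe("TACGGCATT")
print(mrna)  # AUGCCGUAA
```

Here the template TACGGCATT yields an mRNA that begins with AUG and ends with UAA.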
Transcription proceeds from the 5' end to the 3' end of the coding strand, the DNA strand whose sequence matches the mRNA. This direction along the strand is called downstream, while the opposite direction is called upstream. The enzyme performing the transcription, RNA polymerase, starts transcription a few bases upstream of the start codon and terminates a few bases downstream of the stop codon. The regions at both ends of the DNA coding region that are transcribed into mRNA but do not code for the protein are called untranslated regions (UTRs) (see Figures 7.4 and 7.5). RNA polymerase molecules start transcription by recognizing and binding to promoter regions upstream of the desired transcription start sites. These promoter regions control the rate of gene expression.
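To make the notion of untranslated regions concrete, the following sketch splits a transcript into its 5' UTR, coding region, and 3' UTR. It rests on simplifying assumptions that are not in the notes: translation is taken to start at the first AUG and to end at the first stop codon in the same frame, and the function name split_transcript is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_transcript(mrna):
    """Split an mRNA into (5' UTR, coding region, 3' UTR).

    Simplifying assumptions: the first AUG is the start codon, and the
    coding region ends at the first in-frame stop codon, which is kept
    as part of the coding region. Returns None if either is missing.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Walk codon by codon in the frame set by the start codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    return None


print(split_transcript("GGAUGCCGUAACU"))  # ('GG', 'AUGCCGUAA', 'CU')
```

Real transcripts are more subtle (the first AUG is not always the one used), so this should be read as a toy model of the layout rather than a gene-finding method.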
  
Figure 7.1: Steps in gene expression


  
Figure 7.2: mRNA translation: The polypeptide chains are elongated as the ribosomes move along the mRNA molecules, with the 5' end of the mRNA being translated first.


  
Figure 7.3: The genetic code. AUG is the start codon, while UAA, UAG and UGA are the stop codons.

Since there are 4^3 = 64 possible codons and only 20 amino acids, multiple codons represent the same amino acid. One codon, called the start codon, indicates the beginning of translation in addition to coding for the amino acid methionine, and three codons, called stop codons, indicate the end of translation. The genetic code is shown in Figure 7.3. Because codons are triplets of bases, any given DNA sequence can be interpreted in three possible ways, depending on where the coding starts. These three ways are called reading frames. An open reading frame (ORF) is a sequence of consecutive codons, read in a single reading frame, that contains no stop codon.
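The three reading frames and the stop-free runs within each can be illustrated with a short Python sketch. It works on an mRNA string and uses the stop codons listed in Figure 7.3; the helper names codons_in_frame and open_reading_frames are made up for this example, and trailing bases that do not fill a complete codon are simply ignored.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons_in_frame(mrna, frame):
    """Split mrna into complete codons starting at offset frame (0, 1 or 2)."""
    return [mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3)]


def open_reading_frames(mrna, frame):
    """Return the maximal runs of codons in one frame that contain no stop codon."""
    runs, current = [], []
    for codon in codons_in_frame(mrna, frame):
        if codon in STOP_CODONS:
            # A stop codon closes the current run (if any).
            if current:
                runs.append(current)
            current = []
        else:
            current.append(codon)
    if current:
        runs.append(current)
    return runs


mrna = "AUGCCGUAAGGC"
for frame in range(3):
    print(frame, open_reading_frames(mrna, frame))
# 0 [['AUG', 'CCG'], ['GGC']]
# 1 [['UGC', 'CGU', 'AAG']]
# 2 [['GCC', 'GUA', 'AGG']]
```

Only frame 0 contains a stop codon (UAA), which splits it into two stop-free runs; the other two frames read the same bases as entirely different codons.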
  
Figure 7.4: Typical prokaryotic gene structure at the DNA level (not to scale)


  
Figure 7.5: Typical eukaryotic gene structure at the DNA level (not to scale)



 
Itshack Pe`er
1999-02-03