
Sequence Based Searching

DNA is made of 4 nucleotides (A, C, G, T), while proteins are built out of 20 amino acids. This means that when two unrelated DNA sequences are aligned, about 25% of the positions will match purely by chance (assuming equal nucleotide frequencies). Using proteins results in weaker random similarity and thus fewer false positives.

A major issue of concern is DNA vs. protein searches. A coding nucleotide sequence can be translated into a protein sequence. (The other direction is, of course, ambiguous, because the genetic code is degenerate.) So suppose we have a nucleotide sequence. Should we search the DNA databases only, or should we translate it into a protein and search protein databases? On the one hand, translating causes loss of information; on the other hand, protein sequences are more evolutionarily conserved than DNA sequences.
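The chance match rates mentioned above can be checked with a small sketch. It assumes letters are drawn independently and uniformly and that the two sequences are compared position by position without gaps; real sequences have skewed compositions, so the true rates differ somewhat. The function names are illustrative only.

```python
import random

DNA = "ACGT"
PROTEIN = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def expected_identity(alphabet):
    """Probability that two independent, uniformly drawn letters are equal."""
    return 1.0 / len(alphabet)


def simulated_identity(alphabet, length=100_000, seed=0):
    """Fraction of matching positions between two random, ungapped sequences."""
    rng = random.Random(seed)
    a = rng.choices(alphabet, k=length)
    b = rng.choices(alphabet, k=length)
    matches = sum(x == y for x, y in zip(a, b))
    return matches / length


if __name__ == "__main__":
    for name, alphabet in (("DNA", DNA), ("protein", PROTEIN)):
        print(name, expected_identity(alphabet), simulated_identity(alphabet))
```

Under these assumptions the expected chance identity is 1/4 for DNA and 1/20 for proteins, which is why a given level of identity is stronger evidence of relatedness in a protein comparison.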

What about very different DNA sequences that code for similar protein sequences? We would like to find those too, so in this case it is also better to search with the protein. In general, we should use proteins for database similarity searches whenever possible. The reasons for this conclusion are:

As stated, a primary goal of sequence searching is to find a sequence that seems homologous to the query sequence; such a homologous sequence shares sequence similarity with the query sequence. The similarity is derived from common ancestry and conservation throughout evolution. Homologous proteins are also similar in their structure. This is the basis for homology modeling, i.e. determining a structure through the structures of similar proteins.
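To illustrate the earlier point that quite different DNA sequences can code for the same protein, here is a minimal sketch that translates two made-up coding sequences with the standard genetic code. The two sequences and the helper names are hypothetical, chosen only for illustration; the translation reads a single forward frame and ignores any trailing partial codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence in frame 1, dropping a partial last codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical sequences that use different synonymous codons.
seq1 = "ATGGCTTCTCGTCTG"
seq2 = "ATGGCAAGCAGATTA"

if __name__ == "__main__":
    print(identity(seq1, seq2))                         # DNA-level identity
    print(translate(seq1), translate(seq2))             # protein sequences
    print(identity(translate(seq1), translate(seq2)))   # protein-level identity
```

In this toy case fewer than half of the nucleotide positions agree, yet the translated proteins are identical, so a protein-level search would recover a relationship that a DNA-level search could easily miss.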

The main goal in searching is to find relevant information while avoiding irrelevant information. We therefore define:


Peer Itsik
2000-12-11