
BLAST Score

BLAST scores rely on extensive theory. We start by making the following assumptions:
  
Figure 5.10: Random walk. The score for a match is +2 and the penalty for a mismatch is -1. As shown, the expected score of the whole walk is negative. The probability that the top score exceeds x decreases exponentially with x.
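The random walk in the figure can be imitated with a small simulation. The sketch below is only illustrative and rests on assumptions that are not in these notes: a match probability of 0.25 (uniform letters over a four-letter alphabet), a walk that restarts at 0 whenever it would become negative, and the hypothetical helper names `expected_step`, `top_score` and `simulate_top_scores`.

```python
import random

MATCH, MISMATCH = 2, -1  # step sizes taken from the figure caption


def expected_step(p_match):
    """Expected score of a single step, given the match probability."""
    return p_match * MATCH + (1 - p_match) * MISMATCH


def top_score(steps):
    """Highest score reached by a walk that restarts at 0 when it would go negative.

    The restart rule is an assumption made for this sketch.
    """
    best = current = 0
    for step in steps:
        current = max(0, current + step)
        best = max(best, current)
    return best


def simulate_top_scores(length, trials, p_match=0.25, seed=0):
    """Top scores of `trials` independent random walks of `length` steps each."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    scores = []
    for _ in range(trials):
        steps = [MATCH if rng.random() < p_match else MISMATCH
                 for _ in range(length)]
        scores.append(top_score(steps))
    return scores


# Empirical tail: fraction of walks whose top score is at least x.
scores = simulate_top_scores(length=200, trials=2000)
for x in (2, 4, 6, 8, 10):
    tail = sum(s >= x for s in scores) / len(scores)
    print(x, tail)
```

With a match probability of 0.25 the expected step is 0.25*2 + 0.75*(-1) = -0.25, so the walk drifts downward, and the printed tail fractions should fall off quickly as x grows.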

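When searching for a query of length m in a database of total length n, one performs m*n random walk experiments (one for each pair of starting positions), each with an exponentially decreasing probability of achieving a score S. Thus, the E-value for score S, i.e., the expected number of chance alignments scoring at least S, is $E = Kmne^{-\lambda S}$. Here $\lambda$ and $K$ are constants that depend on the scoring scheme and on the background letter frequencies.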
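As a rough numerical illustration of this formula, the following Python sketch evaluates the E-value for a few raw scores. The values chosen for `K`, `lam`, `m` and `n` are assumptions made up for the example and are not taken from these notes.

```python
import math


def e_value(raw_score, m, n, K, lam):
    """Expected number of chance alignments with score >= raw_score.

    Implements E = K * m * n * exp(-lam * S).
    """
    return K * m * n * math.exp(-lam * raw_score)


# Hypothetical parameters, chosen only for illustration.
K, lam = 0.1, 0.3
m, n = 250, 1_000_000

for s in (40, 60, 80):
    print(s, e_value(s, m, n, K, lam))
```

Doubling the database length n doubles the E-value of the same alignment, while adding a fixed amount to the raw score divides it by a fixed factor.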

Note that the E-value scales with the lengths of the query and the database: the same alignment would receive a different E-value if these lengths were different. The E-value is also exponential in the score, so it is instructive to transform it to a logarithmic scale, called the bit score.

The bit score B is defined from the E-value E by $E = mn2^{-B}$. Equating this with $E = Kmne^{-\lambda S}$ and taking logarithms shows that the bit score is linear in the raw score S: $B=\frac{\lambda S - \ln(K)}{\ln(2)}$.
In contrast to raw scores, which have little meaning without $K$ and $\lambda$, the bit score is measured in standard units (see e.g. [17]). Naturally, the significance of a given bit score still depends on the sizes of the query and the database.
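The relation between the raw score, the bit score and the E-value can be checked numerically. The sketch below is a minimal illustration; the function names and all parameter values are assumptions made for the example.

```python
import math


def bit_score(raw_score, K, lam):
    """Bit score B = (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def e_value_from_bits(bits, m, n):
    """E-value from a bit score: E = m * n * 2^(-B)."""
    return m * n * 2.0 ** (-bits)


def e_value_from_raw(raw_score, m, n, K, lam):
    """E-value from a raw score: E = K * m * n * exp(-lam * S)."""
    return K * m * n * math.exp(-lam * raw_score)


# Hypothetical parameters, chosen only for illustration.
K, lam = 0.1, 0.3
m, n = 250, 1_000_000
S = 60

B = bit_score(S, K, lam)
print(B, e_value_from_bits(B, m, n), e_value_from_raw(S, m, n, K, lam))
```

Both routes should give the same E-value up to floating-point rounding, which shows that the bit score already absorbs the constants K and lambda.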

As mentioned before, one can also ask for the P-value: the probability of observing, by chance, at least one record with a given E-value or lower.
Define the random variable Y to be the observed number of pairs achieving E-value E or better (smaller).

Y has a Poisson distribution with mean E. The probability that Y equals r is $\frac{e^{-E}E^{r}}{r!}$, and the probability that Y equals 0 is the probability that no pair achieves E-value E or better, i.e., that the best score found is below S; this probability is $e^{-E}$. Hence the chance of finding zero alignments with score >= S is $e^{-E}$, and the probability of finding at least one such alignment is $1-e^{-E}$. This is the P-value associated with the score S (see e.g. [17]). Note that this model assumes an i.i.d. trial for each database position.
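The Poisson probabilities and the resulting P-value can be computed directly from E. The following is a minimal sketch; the function names are hypothetical, and the use of `math.expm1` is a numerical convenience chosen for this example and is not part of the model.

```python
import math


def poisson_pmf(r, E):
    """Probability of exactly r chance hits when the expected number is E."""
    return math.exp(-E) * E ** r / math.factorial(r)


def p_value(E):
    """Probability of at least one chance hit: 1 - exp(-E).

    -expm1(-E) equals 1 - exp(-E) but keeps precision when E is tiny.
    """
    return -math.expm1(-E)


for E in (10.0, 1.0, 0.01, 1e-6):
    print(E, poisson_pmf(0, E), p_value(E))
```

For small E the P-value is close to E itself, since 1 - exp(-E) is approximately E when E is much smaller than 1. For large E it approaches 1.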


Peer Itsik
2000-12-11