next up previous
Next: Improved BLAST Up: No Title Previous: FASTA

   
BLAST - Basic Local Alignment Search Tool

The BLAST algorithm was developed by Altschul, Gish, Miller, Myers and Lipman in 1990 [1] The motivation to the development of BLAST was the need to increase the speed of FASTA by finding fewer and better hot spots during the algorithm. The idea was to integrate the substitution matrix in the first stage of finding the hot spots.

BLAST concentrates on finding regions of high local similarity in alignments without gaps, evaluated by an alphabet-weight scoring matrix. We next define some fundamental objects concerning BLAST:

Given two strings S1 and S2, a segment pair is a pair of equal length respective substrings of S1 and S2, aligned without spaces.

A locally maximal segment pair is a segment pair whose alignment score (without spaces) can not be improved by extending it or shortening it.

A maximal segment pair (MSP) in S1 and S2 is a segment pair with the maximum score over all segment pairs in S1, S2.

When comparing all the sequences in the database against the query, BLAST attempts to find all the database sequences that when paired with the query contain an MSP above some cutoff score S. We choose S such that it is unlikely to find a random sequence in the database that achieves a score higher than S when compared with the query sequence.

The stages in the BLAST algorithm are as follows:

1.
Given a length parameter w and a threshold parameter t, BLAST finds all the w-length substrings (called words) of database sequences that align with words from the query with an alignment score higher than t. Each such hot spot is called a hit in BLAST.

2.
We extend each hit to find out if it is contained in a segment pair with score above S.

We may implement the first stage by constructing, for each w-length word $\alpha$ in the query sequence, all the w-length words whose similarity to $\alpha$ is at least t. We store these words in a data structure which is later accessed while checking the database sequences.

It is usually recommended to set the parameter w to values of 3 to 5 for amino acids, and $\sim{12}$ for nucleotides.

Although BLAST does not allow alignments with indels, it has been shown that with the correct selection of values to the parameters used by the algorithm, it is possible to obtain all the correct alignments while saving much of the computation time compared to the standard dynamic programming method.



 
next up previous
Next: Improved BLAST Up: No Title Previous: FASTA
Itshack Pe`er
1999-01-10