
   
Pileup - Multiple Alignment in GCG

GCG (Genetics Computer Group) is a package of sequence analysis programs which can be run through [22]. Pileup creates a multiple sequence alignment from a group of related sequences using the progressive pairwise alignment method of Feng and Doolittle [2]. It can also plot a tree showing the clustering relationships used to create the alignment.
The input file for Pileup is a list of sequence file names or sequence accession numbers in the database.
Pileup follows the general scheme outlined in section 4.3.2. Its clustering strategy is called UPGMA, which stands for Unweighted Pair-Group Method using Arithmetic averages [13].
The clustering algorithm initializes the clusters with one sequence each, and iteratively constructs larger clusters. In each iteration, it merges the two clusters whose pairwise alignment distance is the smallest. Cluster pairwise alignment is a simple extension of sequence alignment: when two clusters of sequences are aligned, the comparison score between a position in one cluster and a position in the other is the arithmetic average of the scores for all possible symbol comparisons at those positions. When gaps are inserted into a cluster to produce an alignment, they are inserted at the same position in all of the sequences of the cluster [8]. The full multiple alignment is obtained once all the sequences have been merged into one cluster. This hierarchical clustering is naturally described by a dendrogram, which Pileup can plot (see figure 4.2).
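The two ideas in this paragraph, averaging symbol scores across cluster positions and repeatedly merging the closest pair of clusters, can be illustrated with a small Python sketch. It is not Pileup's implementation. The function names column_score and upgma, the match/mismatch scores, and the toy distance table are all assumptions made for illustration. In particular, the sketch works from a fixed table of pairwise sequence distances, whereas Pileup aligns the clusters themselves.

```python
from itertools import combinations, product


def column_score(col_a, col_b, match=1.0, mismatch=0.0):
    """Average score over all symbol pairs taken from two cluster columns.

    col_a and col_b hold the symbols found at one aligned position in each
    cluster. The match/mismatch values are toy scores (an assumption); a
    real program would look the scores up in a scoring matrix.
    """
    pairs = list(product(col_a, col_b))
    return sum(match if x == y else mismatch for x, y in pairs) / len(pairs)


def upgma(names, dist):
    """Greedy UPGMA-style clustering; returns the merge tree as nested tuples.

    dist maps a pair (a, b) of sequence names to their distance. The distance
    between two clusters is the arithmetic average of the distances between
    their members.
    """
    # Each cluster is (tree, members); start with one sequence per cluster.
    clusters = [(name, (name,)) for name in names]

    def d(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    def avg_dist(c1, c2):
        pairs = list(product(c1[1], c2[1]))
        return sum(d(a, b) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the two clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Toy data (hypothetical): A and B are closest, C joins them next, D last.
toy_names = ["A", "B", "C", "D"]
toy_dist = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
            ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}

print(column_score("AAG", "AG"))   # 3 matching pairs out of 6 -> 0.5
print(upgma(toy_names, toy_dist))  # ('D', ('C', ('A', 'B')))
```

The nested tuples record the order of the merges, which is the same information a dendrogram displays: the innermost pair was joined first, and the outermost pair last.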
As a general rule, Pileup can align up to 500 sequences, with any single sequence in the final alignment restricted to a maximum length of 7000 characters (including gap characters inserted into the sequence by Pileup to create the alignment). However, the longer the sequences in the alignment, the fewer sequences Pileup can handle.
  
Figure 4.2: Dendrogram from Pileup. Distance along the vertical axis is proportional to the difference between sequences. Distance along the horizontal axis has no significance.







Itshack Pe`er
1999-01-17