
   
PAM units

We use PAM units to measure the evolutionary distance between two amino acid sequences. Two strings S1 and S2 are said to be one PAM unit diverged if a series of accepted point mutations (and no insertions or deletions) has converted S1 to S2 with an average of one accepted point-mutation event per 100 amino acids. The term "accepted" here means a mutation that was incorporated into the protein and passed on to the progeny of the organism. Therefore, either the mutation did not change the function of the protein, or the change in the protein was beneficial to the organism. Note that two strings that are one PAM unit diverged do not necessarily differ in one percent of their positions, as is often mistakenly thought, because a single position may undergo more than one mutation. The gap between the two notions grows as the number of PAM units increases.
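The gap between PAM distance and observed percent difference can be illustrated with a minimal sketch. It assumes a deliberately simplified model that is not part of the definition above: mutation events hit positions independently, and each event replaces the residue with one of the other 19 amino acids chosen uniformly. Real substitution frequencies are far from uniform, so the numbers are only indicative. The function name `expected_fraction_different` is hypothetical.

```python
import math

N_AMINO_ACIDS = 20  # size of the amino acid alphabet


def expected_fraction_different(pam_units: float) -> float:
    """Expected fraction of positions that differ after `pam_units` PAM units.

    Assumption (not from the text): every mutation event replaces the
    residue with one of the other 19 amino acids, chosen uniformly.
    """
    d = pam_units / 100.0  # expected mutation events per position
    k = N_AMINO_ACIDS
    # A position may be hit several times, and a later hit may even restore
    # the original residue, so the result is always below d.
    return (k - 1) / k * (1.0 - math.exp(-k * d / (k - 1)))


if __name__ == "__main__":
    for pam in (1, 50, 100, 250):
        observed = 100.0 * expected_fraction_different(pam)
        print(f"{pam:4d} PAM -> about {observed:.2f}% of positions differ")
```

Under this assumed model, one PAM unit gives slightly less than one percent observed difference, and the shortfall widens as the number of units grows.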

There are two main problems with the notion of PAM units:

1.
First, practically all the sequences we can obtain today are extracted from extant organisms. We know of almost no pairs of protein sequences in which one is actually derived from the other. The lack of ancestral protein sequences is handled by assuming that amino acid mutations are reversible and equally likely in either direction. This assumption, together with the additivity of PAM units that follows from their definition, implies that for two amino acid sequences Si and Sj whose common ancestor is Sij we have:

d(Si,Sj) = d(Si,Sij) + d(Sij,Sj)

where d(Si,Sj) denotes the PAM distance between the amino acid sequences Si and Sj.

2.
The second problem, which is more difficult to overcome, is that the definition disregards insertions and deletions that may occur during evolution, so we cannot be sure of the correct correspondence between sequence positions. To know the exact correspondence, one has to identify the true historical gaps, or at least identify large intervals along the two sequences where the correspondence is correct. This cannot always be done with certainty, especially when the two sequences are distantly diverged.
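The additivity relation in the first problem can be checked numerically with a small sketch. It assumes a simplified symmetric model (each mutation event replaces the residue with one of the other 19 amino acids, uniformly), which is an illustration only and not the empirical substitution data behind PAM matrices. The function names `p_same` and `p_same_via_ancestor` are hypothetical.

```python
import math

K = 20  # amino acid alphabet size


def p_same(pam_units: float) -> float:
    """Probability that a position holds the same residue at both ends of a
    lineage of length `pam_units`, under the assumed uniform model."""
    d = pam_units / 100.0  # expected mutation events per position
    return 1.0 / K + (K - 1) / K * math.exp(-K * d / (K - 1))


def p_same_via_ancestor(pam_i: float, pam_j: float) -> float:
    """Probability that Si and Sj agree at a position when both descend from
    a common ancestor Sij, at distances pam_i and pam_j respectively."""
    s_i, s_j = p_same(pam_i), p_same(pam_j)
    # Probability of ending at one specific residue other than the ancestral one.
    q_i, q_j = (1.0 - s_i) / (K - 1), (1.0 - s_j) / (K - 1)
    # Si and Sj agree if both kept the ancestral residue, or if both
    # changed to the same one of the K - 1 other residues.
    return s_i * s_j + (K - 1) * q_i * q_j


if __name__ == "__main__":
    via_ancestor = p_same_via_ancestor(30, 50)
    direct = p_same(30 + 50)
    print(via_ancestor, direct)  # the two values agree up to rounding
```

Because the assumed model is symmetric, the path from Si back to Sij and forward to Sj behaves like a single lineage of length d(Si,Sij) + d(Sij,Sj), which is what lets distances between extant sequences stand in for distances to an unknown ancestor.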


Itshack Pe'er
1999-01-10