
Profile HMMs

HMMs can be used to align a string against a given profile, which helps us solve the multiple alignment problem.

We define a profile $\mathcal{P}$ of length $L$ as a set of probabilities consisting of, for each $b \in \Sigma$ and $1 \leq i \leq L$, the probability $e_{i}(b)$ of observing the symbol $b$ at the $i^{\mbox{th}}$ position. The probability of a string $X=(x_{1},\ldots,x_{L})$ given the profile $\mathcal{P}$ is then:

\begin{displaymath}P(X \vert {\mathcal{P}}) = \prod_{i=1}^{L}{e_{i}(x_{i})}
\end{displaymath} (42)

We can calculate a log-likelihood ratio score for the ungapped alignment of $X$ against the profile $\mathcal{P}$:

\begin{displaymath}Score(X \vert {\mathcal{P}}) = \sum_{i=1}^{L}{ \log\frac{
e_{i}(x_{i})}{p(x_{i})}}
\end{displaymath} (43)

where $p(b)$ is the background frequency of the symbol $b$.
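The two formulas above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the toy profile of length 2 and the uniform background are assumptions made for the example, and the natural logarithm is used because the text does not fix a base.

```python
import math

def profile_probability(x, emissions):
    """P(X | P): the product of e_i(x_i) over all profile positions."""
    if len(x) != len(emissions):
        raise ValueError("string and profile must have the same length L")
    prob = 1.0
    for symbol, e_i in zip(x, emissions):
        prob *= e_i[symbol]
    return prob

def log_odds_score(x, emissions, background):
    """Score(X | P): the sum of log(e_i(x_i) / p(x_i)) over all positions."""
    if len(x) != len(emissions):
        raise ValueError("string and profile must have the same length L")
    return sum(math.log(e_i[s] / background[s]) for s, e_i in zip(x, emissions))

# Hypothetical toy profile of length L = 2 over the DNA alphabet.
emissions = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},  # e_1(b)
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},  # e_2(b)
]
# Assumed uniform background frequencies p(b).
background = {b: 0.25 for b in "ACGT"}

prob_ag = profile_probability("AG", emissions)
score_ag = log_odds_score("AG", emissions, background)
score_tt = log_odds_score("TT", emissions, background)
```

A string that agrees with the profile ("AG" here) gets a positive score, while one that matches only low-probability symbols ("TT") gets a negative score.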

This leads to the definition of the following HMM: all the states are match states $M_{1},\ldots,M_{L}$, which correspond to matches of the string's symbols with the profile positions. These states are sequentially linked (i.e., each match state $M_{j}$ is linked to its successor $M_{j+1}$), as shown in figure 6.2. The emission probability of the symbol $b$ from the state $M_{j}$ is $e_{j}(b)$.


  
Figure 6.2: Match states in a profile HMM
\includegraphics[width=16cm]{lec06_fig/lec06_MatchStates.eps}

To allow insertions, we also add insertion states $I_{0},\ldots,I_{L}$ to the model. We shall assume that:

\begin{displaymath}\forall_{b \in \Sigma} \quad e_{I_{j}}(b) = p(b)
\end{displaymath}

Each insertion state $I_{j}$ has a link entering from the corresponding match state $M_{j}$, a link leaving towards the next match state $M_{j+1}$, and a self-loop (see figure 6.3). Assigning the appropriate probabilities to these transitions corresponds to applying affine gap penalties, since the overall contribution of a gap of length $h$ to the logarithmic likelihood score is:

\begin{displaymath}\underset{\mbox{gap creation}}{\underbrace{\log(a_{M_{j}I_{j}}) + \log(a_{I_{j}M_{j+1}})}} + \underset{\mbox{gap extension}}{\underbrace{(h-1)\cdot\log(a_{I_{j}I_{j}})}}\end{displaymath}
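To see why this is an affine penalty, note that a gap of length $h$ enters $I_{j}$ once, takes the self-loop $h-1$ times and leaves once, while the insert emissions contribute nothing to the log-odds score because $e_{I_{j}}(b) = p(b)$. A minimal sketch follows; the function name and the transition probabilities are hypothetical values chosen only for illustration.

```python
import math

def insertion_gap_log_score(h, a_mi, a_ii, a_im):
    """Log-score contribution of h inserted symbols at one insert state.

    a_mi: transition probability M_j -> I_j (entering the gap)
    a_ii: transition probability I_j -> I_j (self-loop)
    a_im: transition probability I_j -> M_{j+1} (leaving the gap)
    """
    if h < 1:
        raise ValueError("gap length h must be at least 1")
    creation = math.log(a_mi) + math.log(a_im)   # paid once per gap
    extension = (h - 1) * math.log(a_ii)         # paid per extra symbol
    return creation + extension

# Hypothetical transition probabilities, for illustration only.
a_mi, a_ii, a_im = 0.05, 0.4, 0.5

gap_scores = [insertion_gap_log_score(h, a_mi, a_ii, a_im) for h in range(1, 5)]
# Consecutive differences are constant: each extra symbol costs log(a_ii).
increments = [b - a for a, b in zip(gap_scores, gap_scores[1:])]
```

The first entry of `gap_scores` is the gap creation term alone, and every later entry differs from its predecessor by the same amount, which is the linear-in-$h$ behaviour that defines an affine gap penalty.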


  
Figure 6.3: A profile HMM with an insertion state (and some match states)
\includegraphics{lec06_fig/lec06_InsertionState.eps}

To allow deletions as well, we add the deletion states $D_{1},\ldots,D_{L}$. These states cannot emit any symbol and are therefore called silent (note that the begin and end states are silent as well). The deletion states are sequentially linked, in the same manner as the match states, and they are also interleaved with the match states (see figure 6.4).

  
Figure 6.4: Profile HMM with deletion and match states
\includegraphics{lec06_fig/lec06_DeletionStates.eps}

To model both insertions and deletions, we have to add a link from $D_{j}$ to $I_{j}$ and a link from $I_{j}$ to $D_{j+1}$.

The full HMM for modeling the profile $\mathcal{P}$ of length $L$ comprises $L$ layers, each of which has three states: $M_{j}$, $I_{j}$ and $D_{j}$. To complete the model, we add begin and end states, connected to the layers as shown in figure 6.5. This model is due to Haussler et al. [5].


  
Figure 6.5: Profile HMM for global alignment
\includegraphics{lec06_fig/lec06_HMM_GlobalAlign.eps}
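The topology described above can be enumerated programmatically. The sketch below is illustrative only: the state labels are hypothetical, and it assumes the usual convention that the begin state plays the role of $M_{0}$ and the end state the role of $M_{L+1}$, which the text itself leaves to figure 6.5.

```python
def profile_hmm_transitions(L):
    """Return the set of allowed (source, target) transitions.

    Assumption: "Begin" acts as M_0 and "End" acts as M_{L+1}.
    """
    if L < 1:
        raise ValueError("profile length L must be at least 1")

    def match(j):
        # Map the boundary indices onto the silent begin/end states.
        if j == 0:
            return "Begin"
        if j == L + 1:
            return "End"
        return f"M{j}"

    edges = set()
    for j in range(L + 1):
        # States of layer j; layer 0 has no deletion state.
        sources = [match(j), f"I{j}"]
        if j >= 1:
            sources.append(f"D{j}")
        for src in sources:
            edges.add((src, match(j + 1)))      # move on to the next match state
            edges.add((src, f"I{j}"))           # insert (self-loop when src is I_j)
            if j + 1 <= L:
                edges.add((src, f"D{j + 1}"))   # skip the next profile position
    return edges

edges = profile_hmm_transitions(3)
```

Under these assumptions every state has at most three outgoing transitions, so the number of transitions grows linearly with $L$ ($9L + 3$ in this sketch).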


Peer Itsik
2000-12-19