
Gibbs Sampling

Problem: Locating a common pattern.
INPUT: A set of sequence... ...at the similarity between the $n$ sub-strings is maximized.

Let $a^{(1)},\ldots,a^{(n)}$ be the starting indices of the chosen sub-strings, each of length $w$, in $S^{(1)},\ldots,S^{(n)}$, respectively. We introduce the following notation:

Let $c_{ij}$ be the number of occurrences of the symbol $j \in \Sigma$ among the $i^{\mbox{th}}$ positions of the $n$ sub-strings: $\{s^{(1)}_{a^{(1)}+i-1},\ldots,s^{(n)}_{a^{(n)}+i-1}\}$.
Let $q_{ij}$ denote the probability that the symbol $j$ occurs at the $i^{\mbox{th}}$ position of the pattern.
Let $p_j$ denote the frequency of the symbol $j$ in all sequences of $\mathcal{S}$.
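The three quantities above can be tabulated directly from the sequences and the chosen starting indices. The following Python sketch is only an illustration: the function names, the toy sequences and the pseudocount used when estimating $q_{ij}$ from $c_{ij}$ are assumptions that do not appear in these notes, and the indices are 0-based, whereas the text counts from 1.

```python
from collections import Counter

def profile_counts(seqs, starts, w, alphabet):
    """c[i][j]: occurrences of symbol j at position i of the chosen sub-strings."""
    counts = [{sym: 0 for sym in alphabet} for _ in range(w)]
    for seq, a in zip(seqs, starts):
        for i in range(w):
            counts[i][seq[a + i]] += 1
    return counts

def pattern_probs(counts, pseudo=1.0):
    """q[i][j]: estimated probability of symbol j at pattern position i.

    The pseudocount is an assumption of this sketch; it keeps every q[i][j] > 0.
    """
    probs = []
    for col in counts:
        total = sum(col.values()) + pseudo * len(col)
        probs.append({sym: (c + pseudo) / total for sym, c in col.items()})
    return probs

def background_freqs(seqs, alphabet):
    """p[j]: frequency of symbol j over all sequences."""
    tally = Counter("".join(seqs))
    total = sum(tally[sym] for sym in alphabet)
    return {sym: tally[sym] / total for sym in alphabet}

# Hypothetical toy input: the sub-string "ACG" occurs in every sequence.
seqs = ["ACGTAC", "TTACGA", "GACGTT"]
starts = [0, 2, 1]  # 0-based starting indices
c = profile_counts(seqs, starts, w=3, alphabet="ACGT")
q = pattern_probs(c)
p = background_freqs(seqs, "ACGT")
```

With these starting indices every chosen sub-string is "ACG", so each column of `c` puts all three counts on a single symbol.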

We therefore wish to maximize the log-likelihood score:

$\begin{displaymath}\mathrm{Score} = \sum_{i=1}^{w} \sum_{j \in \Sigma} c_{ij} \cdot \log\frac{q_{ij}}{p_{j}} \qquad (70) \end{displaymath}$
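As a small illustration of how the score is evaluated, the sketch below computes the double sum for a hand-made two-position pattern over a two-symbol alphabet. The function name and the numbers are made up for the example; the natural logarithm is used, which is an assumption, since the notes do not fix the base.

```python
import math

def log_likelihood_score(c, q, p):
    """Sum of c[i][j] * log(q[i][j] / p[j]) over positions i and symbols j."""
    return sum(
        c[i][j] * math.log(q[i][j] / p[j])
        for i in range(len(c))
        for j in c[i]
    )

# Hypothetical values: position 0 favours "A", position 1 matches the background.
c = [{"A": 2, "C": 0}, {"A": 1, "C": 1}]
q = [{"A": 0.75, "C": 0.25}, {"A": 0.5, "C": 0.5}]
p = {"A": 0.5, "C": 0.5}
score = log_likelihood_score(c, q, p)
```

Only the first position contributes here: the second position has $q_{ij} = p_j$, so its logarithms vanish, and the score reduces to $2 \log 1.5$.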

To accomplish this task, we perform the following iterative procedure:

1. Initialization: randomly choose $a^{(1)},\ldots,a^{(n)}$.
2. Randomly choose $1 \leq z \leq n$ and calculate the $c_{ij}$, $q_{ij}$ and $p_j$ values for the strings in $\mathcal{S} \setminus \{S^{(z)}\}$.
3. Find the best sub-string of $S^{(z)}$ according to the model, and set $a^{(z)}$ to its starting index. This is done by applying the local alignment algorithm to $S^{(z)}$ against the profile of the current pattern.
4. Repeat steps 2 and 3 until the improvement in the score is less than $\epsilon$.
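The four steps above can be sketched in Python as follows. This is a minimal sketch under several assumptions that are not part of the notes: the alphabet is DNA, pseudocounts are added to avoid zero probabilities, step 3 is simplified to a gap-free scan over all windows of length $w$ instead of a full local alignment against the profile, and the stopping test of step 4 is evaluated once per round of $n$ updates, with a cap on the number of rounds. All function and parameter names are hypothetical.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"  # assumed DNA alphabet

def build_model(seqs, starts, w, skip=None, pseudo=1.0):
    """Return (c, q) from the chosen sub-strings, leaving out sequence `skip`."""
    c = [{s: 0 for s in ALPHABET} for _ in range(w)]
    for k, (seq, a) in enumerate(zip(seqs, starts)):
        if k == skip:
            continue
        for i in range(w):
            c[i][seq[a + i]] += 1
    q = []
    for col in c:
        total = sum(col.values()) + pseudo * len(ALPHABET)
        q.append({s: (col[s] + pseudo) / total for s in ALPHABET})
    return c, q

def background(seqs, skip=None, pseudo=1.0):
    """Return p, the symbol frequencies over all sequences except `skip`."""
    tally = Counter()
    for k, seq in enumerate(seqs):
        if k != skip:
            tally.update(seq)
    total = sum(tally[s] for s in ALPHABET) + pseudo * len(ALPHABET)
    return {s: (tally[s] + pseudo) / total for s in ALPHABET}

def total_score(seqs, starts, w):
    """Log-likelihood score of the current choice of starting indices."""
    c, q = build_model(seqs, starts, w)
    p = background(seqs)
    return sum(c[i][s] * math.log(q[i][s] / p[s])
               for i in range(w) for s in ALPHABET)

def best_start(seq, q, p, w):
    """Gap-free stand-in for step 3: the best-scoring window of length w."""
    def window_score(a):
        return sum(math.log(q[i][seq[a + i]] / p[seq[a + i]]) for i in range(w))
    return max(range(len(seq) - w + 1), key=window_score)

def find_pattern(seqs, w, eps=1e-6, max_rounds=100, seed=0):
    rng = random.Random(seed)
    n = len(seqs)
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]  # step 1
    score = total_score(seqs, starts, w)
    for _ in range(max_rounds):
        for _ in range(n):
            z = rng.randrange(n)                             # step 2
            _, q = build_model(seqs, starts, w, skip=z)
            p = background(seqs, skip=z)
            starts[z] = best_start(seqs[z], q, p, w)         # step 3
        new_score = total_score(seqs, starts, w)
        improvement = new_score - score
        score = new_score
        if improvement < eps:                                # step 4
            break
    return starts, score

# Hypothetical input: each sequence contains the sub-string "TATAAT".
seqs = ["CGGTATAATGCA", "TTATAATCGGAC", "GACCTATAATGT", "ATATAATGCCGT"]
starts, score = find_pattern(seqs, w=6)
```

Because the starting indices and the choice of $z$ are random, different seeds may end at different sub-strings, so the sketch makes no claim about which pattern a given run returns.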

Unlike the profile HMM technique, the Gibbs sampling algorithm (due to Lawrence et al. [8]) does not rely on any substantial theoretical basis. However, the method is known to work well in specific cases.

Known problems:

Phase shift: the algorithm may converge to a shifted copy of the best pattern, offset by a few positions.
The value of $w$ is usually unknown. Choosing different values for $w$ may significantly change the results.
The strings may contain more than a single common pattern.
As is the case with the Baum-Welch algorithm, the process may converge to a local maximum.
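To make the phase shift problem concrete, the sketch below compares the score of the current starting indices with the scores obtained by shifting all of them by the same small offset. This check is not described in these notes; it is one simple way to detect an offset solution, and the function names, the pseudocount and the toy sequences are assumptions of the example.

```python
import math
from collections import Counter

def score(seqs, starts, w, alphabet="ACGT", pseudo=1.0):
    """Log-likelihood score of the sub-strings seqs[k][starts[k]:starts[k]+w]."""
    tally = Counter("".join(seqs))
    total = sum(tally[s] for s in alphabet)
    p = {s: tally[s] / total for s in alphabet}
    denom = len(seqs) + pseudo * len(alphabet)
    result = 0.0
    for i in range(w):
        col = Counter(seq[a + i] for seq, a in zip(seqs, starts))
        for s in alphabet:
            if col[s]:  # symbols with c_ij = 0 contribute nothing
                result += col[s] * math.log(((col[s] + pseudo) / denom) / p[s])
    return result

def best_shift(seqs, starts, w, max_shift=2):
    """Return the common offset d in [-max_shift, max_shift] with the best score."""
    best = (score(seqs, starts, w), 0)
    for d in range(-max_shift, max_shift + 1):
        shifted = [a + d for a in starts]
        # Skip offsets that would push a window outside its sequence.
        if all(0 <= a <= len(s) - w for s, a in zip(seqs, shifted)):
            best = max(best, (score(seqs, shifted, w), d))
    return best[1]

# Hypothetical input: "ACGT" starts at index 2 in every sequence,
# but the current solution is off by one position.
seqs = ["TGACGTCA", "CAACGTGT", "GTACGTAC"]
d = best_shift(seqs, [3, 3, 3], w=4)
```

In this toy input the four symbols are equally frequent overall, so the window covering all four conserved columns scores highest and the check suggests moving every starting index one position to the left.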


Peer Itsik
2000-12-19