
Gibbs Sampling

Description
Gibbs sampling is a method that can be used to find weak signals or motifs in sequences by means of Monte Carlo sampling.
Monte Carlo sampling is used to sample the alignment space effectively by implementing two different moves:
- single sequence move
- phase shift move
In the single sequence move, a random place in the alignment is chosen and the sequence is randomly shifted.
In the phase shift move, the "alignment window" (see Figure 5) is randomly shifted in order to escape possible local minima.
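
The two moves can be sketched as follows. This is only an illustration under stated assumptions, written in Python for brevity rather than in the Perl the project asks for: the alignment is represented as a list of window start positions (one per sequence), and the names single_sequence_move, phase_shift_move and max_shift are hypothetical, not taken from the article.

```python
import random

MOTIF_LEN = 9  # motif length used in this project


def single_sequence_move(seqs, starts, rng):
    """Pick one sequence at random and give its window a new random start."""
    new_starts = list(starts)
    i = rng.randrange(len(seqs))
    new_starts[i] = rng.randrange(len(seqs[i]) - MOTIF_LEN + 1)
    return new_starts


def phase_shift_move(seqs, starts, rng, max_shift=2):
    """Shift the window of every sequence by the same random, non-zero offset.

    max_shift is an assumed parameter. If the shift would push any window
    off its sequence, the original starts are returned unchanged.
    """
    shift = rng.choice([s for s in range(-max_shift, max_shift + 1) if s != 0])
    new_starts = [s + shift for s in starts]
    for seq, s in zip(seqs, new_starts):
        if s < 0 or s + MOTIF_LEN > len(seq):
            return list(starts)
    return new_starts
```

Both functions return a proposed alignment and leave the current one untouched, so the caller can compare the energies of the two before deciding whether to keep the proposal.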
For a move to be accepted in the Monte Carlo sampling, the energy of the alignment needs to be low. The alignment energy is calculated from a log-odds weight matrix,
where each element is the log of the pseudocount- and sequence-weight-corrected frequency of amino acid a at position p divided by the background frequency of amino acid a.
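
A minimal sketch of the energy calculation and the acceptance step is given below, in Python rather than Perl. It rests on assumptions that are not in the text: the energy is taken as the negative summed log-odds score of the aligned 9-mers (so that a low energy means a strong motif), moves are accepted with the standard Metropolis criterion at a temperature T, and the function names are hypothetical.

```python
import math
import random


def log_odds_matrix(freqs, background):
    """freqs[p][a] is the corrected frequency of amino acid a at position p.

    Frequencies must be non-zero, which the pseudo-count correction ensures.
    """
    return [{a: math.log(f / background[a]) for a, f in col.items()}
            for col in freqs]


def alignment_energy(cores, weight_matrix):
    """Negative summed log-odds score of the aligned cores (lower is better)."""
    score = sum(weight_matrix[p][a]
                for core in cores
                for p, a in enumerate(core))
    return -score


def accept_move(e_old, e_new, temperature, rng):
    """Metropolis criterion: always accept a lower energy, otherwise
    accept with probability exp(-dE / T)."""
    d_e = e_new - e_old
    if d_e <= 0:
        return True
    return rng.random() < math.exp(-d_e / temperature)
```

Lowering the temperature over the course of the run makes uphill moves progressively rarer; the cooling schedule itself is a design choice left open here.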

The project is about reconstructing part of the work done in this article from CBS, namely constructing a Gibbs sampler in Perl for finding motifs of length 9 in protein sequences. The paragraphs "Optimization of parameters for deriving amino acid sequences" and "Gibbs sampling" are especially important.

1) Monte Carlo sampling
2) Sequence weighting method: Henikoff and Henikoff scheme.
3) Null model: flat distribution.
4) Pseudo-count correction method: BLOSUM62
5) Weight on pseudo-count correction: follows from the choice of 1)
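
Items 2) and 4) can be sketched as below, again in Python rather than Perl and only as an illustration. The sequence weights use the position-based scheme of Henikoff and Henikoff, w_k = sum over positions p of 1 / (r_p * s_pk), where r_p is the number of different amino acids in column p and s_pk is the number of sequences sharing the residue of sequence k in that column. The pseudo-count blend p = (alpha*f + beta*g) / (alpha + beta) is an assumed form; alpha, beta and the conditional matrix cond (which in practice would be derived from BLOSUM62) are parameters the caller has to supply, and all function names are hypothetical.

```python
from collections import Counter


def henikoff_weights(cores):
    """Position-based sequence weights, normalised to sum to 1."""
    weights = [0.0] * len(cores)
    for p in range(len(cores[0])):
        counts = Counter(core[p] for core in cores)
        r = len(counts)  # number of different amino acids in column p
        for k, core in enumerate(cores):
            weights[k] += 1.0 / (r * counts[core[p]])
    total = sum(weights)
    return [w / total for w in weights]


def weighted_frequencies(cores, weights, alphabet):
    """Sequence-weighted amino acid frequencies for every motif position."""
    freqs = []
    for p in range(len(cores[0])):
        col = dict.fromkeys(alphabet, 0.0)
        for core, w in zip(cores, weights):
            col[core[p]] += w
        freqs.append(col)
    return freqs


def pseudo_count_correct(freqs, cond, alpha, beta):
    """Blend observed frequencies f with pseudo frequencies g.

    g[a] = sum_b f[b] * cond[b][a], where cond[b][a] is the probability of
    seeing a given b (assumed to come from a substitution matrix).
    """
    corrected = []
    for col in freqs:
        g = {a: sum(col[b] * cond[b][a] for b in col) for a in col}
        corrected.append({a: (alpha * col[a] + beta * g[a]) / (alpha + beta)
                          for a in col})
    return corrected
```

With a flat null model, the background frequency of every amino acid is simply 1/20, so the corrected frequencies above are all that is needed to build the log-odds matrix.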

If the student feels like it, then different models for 3 and 4 can be implemented.

Input and output
Input for training/test can be found at http://www.cbs.dtu.dk/biotools/EasyGibbs/, which is also a great site for checking the performance of your implementation.

The output should be the aligned sequences. The motif should be apparent.
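
One possible way to make the motif apparent in the output is sketched below (Python, not Perl). The layout is an assumption, not a required format: each sequence is padded so that all motif windows start in the same column, the motif is printed in upper case and the flanking residues in lower case. The name format_alignment is hypothetical.

```python
def format_alignment(seqs, starts, motif_len=9):
    """Return one line per sequence with the motif windows column-aligned."""
    offset = max(starts)
    lines = []
    for seq, s in zip(seqs, starts):
        marked = (seq[:s].lower()
                  + seq[s:s + motif_len].upper()
                  + seq[s + motif_len:].lower())
        # Left-pad so that every motif begins at column `offset`.
        lines.append(" " * (offset - s) + marked)
    return lines


if __name__ == "__main__":
    for line in format_alignment(["XXACDEFGHIKYY", "ACDEFGHIKZ"], [2, 0]):
        print(line)
```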

Examples of program execution:
gibbs-sampler protein.lst blossum62.mat

Details
Wikipedia has a page about Gibbs sampling.
PDF intro to Gibbs sampling.