Events News Research CBS CBS Publications Bioinformatics
Staff Contact About Internal CBS CBS Other

Hidden Markov Models

Morten Nielsen (mniel@cbs.dtu.dk)

I todays exercise you shall implement two algorithms for alignment of a sequence to a hidden Markov Model, The Viterbi, and Posterior decoding algorithms.


Implementing the algorithms

First, you must make a new directory for todays exercise and copy some data files.

cd $ALGOHOME
mkdir HMM
cd HMM
cp /usr/opt/www/pub/CBS/courses/27625.algo/exercises/data/HMM/casino.* .

You should now have the files casino.hmm, casino.seq, and casino.seqlong in the HMM directory. Check this by typing

ls -ltr

The casino.hmm file is a parameter file describing the hidden Model, and the casino.seq and casino.seqlong files are files with sequences to be used for decoding.

The casino.hmm parameter files has the following format

M= 6 
N= 2
A:
0.95 0.05
0.1 0.90
B:
0.166 0.166 0.166 0.166 0.166 0.167
0.1 0.1 0.1 0.1 0.1 0.5
pi:
0.5 0.5

where M gives the number of symbols in each state, N gives the number of states, A: gives the transition probability matrix, B: gives the symbol emission probabilities in each state, and pi: gives the initial transition probabilities. Make sure to understand this format, and can make sense of all the parameters.

Next, you must copy the program templates to you src directory. (Remember the "." in the end of the command)

cd $ALGOHOME/src
cp /usr/opt/www/pub/CBS/courses/27625.algo/exercises/code/HMM/viterbi.c .
cp /usr/opt/www/pub/CBS/courses/27625.algo/exercises/code/HMM/posterior.c .

The first program viterbi.c implements the (guess what!) Viterbi decoding algorithm, and the second program posterior.c implements the forward/backward posterior decoding algorithm.


Viterbi

Open the file viterbi.c in your favorite editor. Most the code is (as usual) pre-written. Go to the main procedure. Make sure you understand the structure of the program.

You shall fill in the missing code (XXXX) in the viterbi subroutine. Again make sure you understand the structure of the routine, and then fill out the missing code.

Next, compile the program

make viterbi

When the compilation is successful copy the code to the bin directory

cp viterbi ../bin/
rehash

The rehash command updates a table used by the operating system to keep track of all executable programs to include your new program viterbi

Now go to the HMM directory. First check what is the content of the files casino.seq, and casino.seqlong.

head casino.seq casino.seqlong

Next, run the viterbi program on the file casino.seq

viterbi casino.hmm casino.seq

Look at the output. How does the output compare to the casino.seq_vit?

Do the same for the casino.seq_long file. How does the output compare to the casino.seqlong_vit?


Posterior decoding

Now go back to the src directory and open the file posterior.c. Spend some time to make sure you understand the structure of the program. Fill in the missing code (XXXXXX). Note, that both the subroutine posterior and the main routine has un-written code. Next compile the program

                        
make posterior

When the compilation is successful, copy the code to the bin directory

cp posterior ../bin
rehash

Now go to the HMM directory. You can now test your posterior decoding algorithm on the two data set from before casino.seq, and casino.seq_long.

How does the output compare to casino.seq_post and casino.seqlong_post?

Make sure you understand how the output from the posterior decoding for the same observation is modified depending on flanking observations.


Now you done.