Hidden Markov Models
Morten Nielsen (firstname.lastname@example.org)
I todays exercise you shall implement two algorithms for alignment of a sequence to a hidden Markov Model,
The Viterbi, and Posterior decoding algorithms.
Implementing the algorithms
First, you must make a new directory for todays exercise and copy some data files.
cp /usr/opt/www/pub/CBS/courses/27625.algo/exercises/data/HMM/casino.* .
You should now have the files casino.hmm, casino.seq, and casino.seqlong in the HMM directory.
Check this by typing
The casino.hmm file is a parameter file describing the hidden Model, and the casino.seq and casino.seqlong files are
files with sequences to be used for decoding.
The casino.hmm parameter files has the following format
0.166 0.166 0.166 0.166 0.166 0.167
0.1 0.1 0.1 0.1 0.1 0.5
where M gives the number of symbols in each state, N gives the number of states, A: gives the transition probability
matrix, B: gives the symbol emission probabilities in each state, and pi: gives the initial transition probabilities.
Make sure to understand this format, and can make sense of all the parameters.
Next, you must copy the program templates to you src directory. (Remember the "." in the end of the command)
cp /usr/opt/www/pub/CBS/courses/27625.algo/exercises/code/HMM/viterbi.c .
cp /usr/opt/www/pub/CBS/courses/27625.algo/exercises/code/HMM/posterior.c .
The first program viterbi.c implements the (guess what!) Viterbi decoding algorithm, and
the second program posterior.c implements the forward/backward posterior decoding
Open the file viterbi.c in your favorite editor. Most the code is (as usual) pre-written.
Go to the main procedure. Make sure you understand the
structure of the program.
You shall fill in the missing code (XXXX) in the viterbi subroutine.
Again make sure you understand the structure of the routine,
and then fill out the missing code.
Next, compile the program
When the compilation is successful copy the code to the bin directory
cp viterbi ../bin/
The rehash command updates a table used by the operating system to keep track of all executable programs to
include your new program viterbi
Now go to the HMM directory. First check what is the content of the files casino.seq, and casino.seqlong.
head casino.seq casino.seqlong
Next, run the viterbi program on the file casino.seq
viterbi casino.hmm casino.seq
Look at the output. How does the output compare to the casino.seq_vit?
Do the same for the casino.seq_long file. How does the output compare to the casino.seqlong_vit?
Now go back to the src directory and open the file posterior.c. Spend some time to make sure you understand the
structure of the program. Fill in the missing code (XXXXXX). Note, that both the subroutine posterior
and the main routine has un-written code. Next compile the program
When the compilation is successful, copy the code to the bin directory
cp posterior ../bin
Now go to the HMM directory. You can now test your posterior decoding algorithm on the two data
set from before casino.seq, and casino.seq_long.
How does the output compare to casino.seq_post and
Make sure you understand how the output from the posterior decoding for the same observation is modified depending on flanking observations.
Now you done.