
Introduction to C programming. Part 3

In this part we continue the introduction to C with the following exercise:

  • Writing a C program that uses a BLOSUM substitution matrix to calculate the scoring matrix between two fasta sequences, and visualizing this matrix using R.

Now you must try it yourself. Complete the program fasta2scoremat.c so that it reads two fasta files and a BLOSUM substitution scoring matrix, and calculates the amino acid scoring matrix between the two sequences. The file fasta2scoremat.c already contains most of the code; you just need to complete the program and compile it. When completed, the program should be run as follows:
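Reading a fasta file mostly means skipping the header line (the one starting with >) and joining the sequence lines into one string. Since fasta2scoremat.c is not reproduced here, the following is only a minimal sketch of such a reader; the name read_fasta and its signature are hypothetical, and it assumes each file holds one sequence (only the first record is read).

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper (not taken from fasta2scoremat.c): read the first
 * record of a fasta stream into seq. Lines starting with '>' are headers
 * and are skipped; letters are upper-cased, everything else (newlines,
 * spaces, digits) is ignored. seq must have room for maxlen + 1 chars.
 * Returns the number of residues stored. */
size_t read_fasta(FILE *fp, char *seq, size_t maxlen)
{
    size_t len = 0;
    int c;
    int in_header = 0;      /* currently inside a '>' header line? */
    int at_line_start = 1;  /* is c the first character of a line? */

    while ((c = fgetc(fp)) != EOF) {
        if (at_line_start && c == '>') {
            if (len > 0)
                break;          /* a second record starts: stop here */
            in_header = 1;
        } else if (c == '\n') {
            in_header = 0;
        } else if (!in_header && isalpha(c) && len < maxlen) {
            seq[len++] = (char)toupper(c);
        }
        at_line_start = (c == '\n');
    }
    seq[len] = '\0';
    return len;
}
```

Upper-casing every residue means that later lookups in the substitution matrix only have to handle capital letters.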

fasta2scoremat file1.fsa file2.fsa
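The command line takes only the two fasta files, so fasta2scoremat.c presumably reads the BLOSUM matrix from a fixed location inside the program (this page does not say). As a sketch, assuming the matrix file uses the whitespace-separated layout of the NCBI BLOSUM files (comment lines starting with #, a header line of residue letters, then one row per residue), a parser could look like this; read_blosum, NCHAR and DELIMS are hypothetical names, not taken from fasta2scoremat.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NCHAR 128               /* table indexed by ASCII residue letter */
#define DELIMS " \t\r\n"

/* Hypothetical parser, ASSUMING the NCBI-style layout:
 *   # comment lines
 *      A  R  N ...            <- header: column residues
 *   A  4 -1 -2 ...            <- row residue, then one score per column
 * Scores go into blosum[row][col] indexed by the residue characters,
 * so blosum['W']['Y'] is the W/Y score. Entries not in the file are left
 * untouched, so initialise the table (e.g. declare it static) first.
 * Returns the number of header columns, or -1 on a malformed file. */
int read_blosum(FILE *fp, int blosum[NCHAR][NCHAR])
{
    char line[1024];
    char cols[NCHAR];
    int ncols = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        char *tok = strtok(line, DELIMS);
        if (tok == NULL || tok[0] == '#')
            continue;                         /* blank or comment line */

        if (ncols == 0) {                     /* first data line: header */
            for (; tok != NULL; tok = strtok(NULL, DELIMS)) {
                if ((unsigned char)tok[0] >= NCHAR || ncols >= NCHAR)
                    return -1;
                cols[ncols++] = tok[0];
            }
            continue;
        }

        unsigned char row = (unsigned char)tok[0];   /* row residue */
        if (row >= NCHAR)
            return -1;
        for (int j = 0; j < ncols; j++) {
            tok = strtok(NULL, DELIMS);
            if (tok == NULL)
                return -1;                    /* row has too few scores */
            blosum[row][(unsigned char)cols[j]] = atoi(tok);
        }
    }
    return ncols > 0 ? ncols : -1;
}
```

Declaring the table as static int blosum[NCHAR][NCHAR]; zero-initialises it, so any residue pair missing from the file simply scores 0.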

When you have compiled the program successfully, place a copy of the executable in the bin directory by typing

cp fasta2scoremat ../bin/

and type

rehash

to update the shell's table of executables (rehash is a tcsh/csh command; in bash the equivalent is hash -r).

You can now run the program from any directory simply by typing

fasta2scoremat

The output of the program is a scoring matrix with one entry for each pair of residues (one from each sequence), giving the BLOSUM score for that pair. Visualizing this matrix lets you judge whether the two sequences can be aligned.
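Each cell of the output matrix is simply the BLOSUM score for one residue of the first sequence paired with one residue of the second. Here is a sketch of that step, assuming the BLOSUM scores are held in a table indexed directly by residue letters; fill_scoremat, print_scoremat, pair_score and the row/column labels are hypothetical and may not match what fasta2scoremat.c actually prints.

```c
#include <stdio.h>
#include <string.h>

#define NCHAR 128   /* BLOSUM table indexed by ASCII residue letter */

/* Score for residues a and b; characters outside the table score 0. */
static int pair_score(int blosum[NCHAR][NCHAR], char a, char b)
{
    unsigned char ua = (unsigned char)a, ub = (unsigned char)b;
    return (ua < NCHAR && ub < NCHAR) ? blosum[ua][ub] : 0;
}

/* Fill score[i * len2 + j] with the score of residue i of seq1 against
 * residue j of seq2 (row-major, len1 x len2). */
void fill_scoremat(const char *seq1, const char *seq2,
                   int blosum[NCHAR][NCHAR], int *score)
{
    size_t len1 = strlen(seq1), len2 = strlen(seq2);
    for (size_t i = 0; i < len1; i++)
        for (size_t j = 0; j < len2; j++)
            score[i * len2 + j] = pair_score(blosum, seq1[i], seq2[j]);
}

/* Print the matrix as tab-separated text: a header line (empty corner
 * cell, then one label per residue of seq2) followed by one labelled
 * line per residue of seq1. Labels are residue + 1-based position. */
void print_scoremat(FILE *out, const char *seq1, const char *seq2,
                    const int *score)
{
    size_t len1 = strlen(seq1), len2 = strlen(seq2);
    for (size_t j = 0; j < len2; j++)
        fprintf(out, "\t%c%zu", seq2[j], j + 1);
    fputc('\n', out);
    for (size_t i = 0; i < len1; i++) {
        fprintf(out, "%c%zu", seq1[i], i + 1);
        for (size_t j = 0; j < len2; j++)
            fprintf(out, "\t%d", score[i * len2 + j]);
        fputc('\n', out);
    }
}
```

Labelling each row and column with residue and position (M1, K2, ...) is a deliberate choice in this sketch: read.table(..., row.names=1) in R refuses duplicated row names, and a protein sequence almost always repeats residues.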

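In the data directory I have placed two fasta files, 1PLC._.fsa and 1PLB._.fsa. Go to your home directory, make a new directory (called, for instance, scoremat), change to this directory, and run the program to construct the scoring matrix between the two sequences, saving the output in a file called score.mat. The grep -v "#" step removes every line containing a # character (presumably the comment lines in the program output), so that only the table itself ends up in the file: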

fasta2scoremat ../data/1PLC._.fsa ../data/1PLB._.fsa | grep -v "#" > score.mat

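You can now visualize this scoring matrix using the heatmap function in R. Start R (version 2.9) by typing

R

and use the following commands (mat is just a variable name; any name will do):

mat <- as.matrix(read.table("score.mat", sep="\t", header=TRUE, row.names=1))
heatmap(mat, scale="none", Rowv=NA, Colv=NA)

Here scale="none" plots the raw scores, and Rowv=NA, Colv=NA switch off the clustering, so rows and columns stay in sequence order.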

It may take a little while for R to draw the heatmap. Can you see whether these two proteins can be aligned using the BLOSUM substitution scoring matrix?
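In the heatmap, two alignable sequences typically show up as a diagonal stripe of high scores. As a rough numerical complement, one can sum the scores along each diagonal of the matrix and look for a diagonal that stands out. This is a crude, ungapped check, not a real alignment algorithm such as Needleman-Wunsch or Smith-Waterman, and best_diagonal is a hypothetical helper, not part of fasta2scoremat.c. It assumes the scores are stored row-major as score[i * len2 + j] and that both sequences are non-empty.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: sum the scores along every diagonal of a
 * len1 x len2 score matrix stored row-major (score[i * len2 + j]).
 * The diagonal "offset" is j - i: offset 0 pairs residue 1 with
 * residue 1, offset 3 pairs residue 1 of sequence 1 with residue 4 of
 * sequence 2, and so on. Returns the highest diagonal sum and writes its
 * offset to *best_offset. Assumes len1 > 0 and len2 > 0. */
long best_diagonal(const int *score, size_t len1, size_t len2,
                   long *best_offset)
{
    long best = LONG_MIN;
    *best_offset = 0;
    for (long d = 1 - (long)len1; d < (long)len2; d++) {
        long sum = 0;
        for (long i = 0; i < (long)len1; i++) {
            long j = i + d;             /* column on this diagonal */
            if (j >= 0 && j < (long)len2)
                sum += score[i * (long)len2 + j];
        }
        if (sum > best) {
            best = sum;
            *best_offset = d;
        }
    }
    return best;
}
```

Diagonals near the middle of the matrix contain more residue pairs than those near the corners, so compare the best sum with the typical diagonal sums rather than reading it as an absolute score.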

This is all for now!