Introduction to C programming. Part 3
We now continue the introduction to C by working on the following task:
- Writing a C program that calculates a BLOSUM scoring matrix for two fasta sequences, and visualizing this matrix using R.
Now you must try it yourself. Complete the program fasta2scoremat.c so that it reads
two fasta files and a BLOSUM substitution scoring matrix, and calculates the amino acid
scoring matrix between the two sequences. The fasta2scoremat.c file already contains most of the
code. You just need to complete the program and compile it. When the program is completed, it
should function as follows
fasta2scoremat file1.fsa file2.fsa
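The core calculation can be sketched as below. This is only a minimal sketch and not the contents of fasta2scoremat.c: the function names fill_score_matrix and toy_score are hypothetical, and toy_score (+1 for a match, -1 for a mismatch) is a placeholder for the lookup in the BLOSUM matrix that the real program performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical placeholder for a BLOSUM lookup: +1 for identical
   residues, -1 otherwise. The real program would look the residue
   pair up in the BLOSUM substitution matrix instead. */
int toy_score(char a, char b)
{
    return (a == b) ? 1 : -1;
}

/* Fill mat, which must hold strlen(seq1) * strlen(seq2) ints stored
   row by row, so that mat[i * len2 + j] is the score of residue i of
   seq1 against residue j of seq2. */
void fill_score_matrix(const char *seq1, const char *seq2, int *mat)
{
    size_t len1, len2, i, j;

    assert(seq1 != NULL && seq2 != NULL && mat != NULL);
    len1 = strlen(seq1);
    len2 = strlen(seq2);

    for (i = 0; i < len1; i++)
        for (j = 0; j < len2; j++)
            mat[i * len2 + j] = toy_score(seq1[i], seq2[j]);
}
```

Every residue of the first sequence is scored against every residue of the second, so the matrix has one row per residue in file1.fsa and one column per residue in file2.fsa.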
When you have compiled the program successfully, place a copy of the executable in the bin directory by
cp fasta2scoremat ../bin/
Then update the system table of executables (in csh or tcsh this is done with the rehash command).
You can now access the program from any directory by simply typing its name, fasta2scoremat.
The output from the program is a BLOSUM scoring matrix matching the two input sequences. This
matrix can be visualized to see whether the two sequences can be aligned.
In the data directory I have placed two fasta files, 1PLC._.fsa and 1PLB._.fsa.
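Reading a sequence from a fasta file can be sketched as below. This is a minimal sketch under stated assumptions, not the reader used in fasta2scoremat.c: the function name read_fasta_sequence is hypothetical, only the first record in the file is read, and every line is assumed to be shorter than 1023 characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Read the first sequence of a fasta stream into seq, which has room
   for cap characters including the terminating '\0'. Returns the
   number of residues stored; residues that do not fit are dropped.
   Assumes every line is shorter than 1023 characters. */
size_t read_fasta_sequence(FILE *in, char *seq, size_t cap)
{
    char line[1024];
    size_t n = 0, k;
    int seen_header = 0;

    assert(in != NULL && seq != NULL && cap > 0);

    while (fgets(line, sizeof line, in) != NULL) {
        if (line[0] == '>') {
            if (seen_header)
                break;           /* a second record starts: stop here */
            seen_header = 1;     /* skip the description line */
            continue;
        }
        /* Keep letters only; this drops newlines and other whitespace. */
        for (k = 0; line[k] != '\0'; k++)
            if (isalpha((unsigned char)line[k]) && n + 1 < cap)
                seq[n++] = line[k];
    }
    seq[n] = '\0';
    return n;
}
```

The description line starting with ">" is skipped, and the sequence lines that follow are joined into one string.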
Go to your home directory, make a new directory called, for instance, scoremat, go to this
directory, and run the program to
construct the scoring matrix between the two sequences, saving the output in a file called score.mat
fasta2scoremat ../data/1PLC._.fsa ../data/1PLB._.fsa | grep -v "#" > score.mat
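The command above removes the lines containing "#" and leaves a tab-separated table. Writing output of that kind can be sketched as below. This is a minimal sketch and the exact layout printed by fasta2scoremat.c may differ: the function name write_score_matrix, the text of the comment line, and the labels (residue letter followed by its position, which keeps row and column names unique) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write mat (strlen(seq1) rows by strlen(seq2) columns, stored row by
   row) as tab-separated text. Labels are the residue letter plus its
   1-based position, an assumed convention that keeps names unique. */
void write_score_matrix(FILE *out, const char *seq1, const char *seq2,
                        const int *mat)
{
    size_t len1, len2, i, j;

    assert(out != NULL && seq1 != NULL && seq2 != NULL && mat != NULL);
    len1 = strlen(seq1);
    len2 = strlen(seq2);

    /* Comment line: filtered out by grep -v "#" in the pipeline. */
    fprintf(out, "# rows: sequence 1, columns: sequence 2\n");

    /* Header row: empty corner field, then one label per column. */
    for (j = 0; j < len2; j++)
        fprintf(out, "\t%c%lu", seq2[j], (unsigned long)(j + 1));
    fputc('\n', out);

    /* One line per residue of seq1: row label, then the scores. */
    for (i = 0; i < len1; i++) {
        fprintf(out, "%c%lu", seq1[i], (unsigned long)(i + 1));
        for (j = 0; j < len2; j++)
            fprintf(out, "\t%d", mat[i * len2 + j]);
        fputc('\n', out);
    }
}
```

Calling write_score_matrix(stdout, seq1, seq2, mat) prints the table to standard output, from where it can be piped through grep and redirected to a file.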
You can now visualize this scoring matrix using the heatmap function in R. Start R (version 2.9) by typing R at the shell prompt,
and use the following commands
peptid.data <- as.matrix(read.table("score.mat", sep="\t", as.is=TRUE, header=TRUE, row.names=1))
heatmap(peptid.data,scale="none", Rowv=NA, Colv=NA)
It might take a little while for the program to make the visualization.
Can you see whether these two proteins can be aligned using the BLOSUM substitution scoring matrix?
This is all for now!