
Smith-Waterman alignment

Description
Aligning sequences is of great importance in bioinformatics. Many discoveries are based on finding sequences that align to each other. Evolutionary theory and phylogenetics rely on sequence alignments.
This project is about implementing a well-known algorithm for aligning two sequences, i.e. finding where they match in an optimal fashion.
In this project the goal is to find the best local alignment of the two sequences given as input, i.e. the highest-scoring alignment between a region of one sequence and a region of the other, ideally covering most of both sequences.
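
As an illustration of what "best local alignment" means in practice, here is a minimal Python sketch of the Smith-Waterman recurrence with a traceback. The scoring values (match +1, mismatch -1, linear gap penalty -1) and the function name smith_waterman are assumptions made for this example; the project does not prescribe them, and the language is chosen only for illustration.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b, span_a, span_b), where each span
    is a 0-based (start, end) pair with an exclusive end.
    The default scores are illustrative assumptions, not project requirements.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    aligned_a = "".join(reversed(out_a))
    aligned_b = "".join(reversed(out_b))
    return best, aligned_a, aligned_b, (i, best_pos[0]), (j, best_pos[1])
```

The full matrix costs time and memory proportional to the product of the two sequence lengths, which is acceptable for short sequences such as the insulin examples. The returned spans are what a "where it is in both sequence inputs" report could be built from.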

Input and output
The input is a FASTA file containing the two DNA sequences to be aligned.
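
A minimal sketch of reading such a file, in Python for illustration. The function name read_fasta is hypothetical, and the sketch assumes the common FASTA layout: a header line starting with ">" followed by one or more sequence lines.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a list of (header, sequence) tuples.

    Assumes each record starts with a '>' header line; sequence lines
    belonging to one record are concatenated and upper-cased.
    """
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Because the function accepts any iterable of lines, an open file handle can be passed to it directly. A program for this project would then check that exactly two records were found before aligning them.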
The output should be the best local alignment, with clear notation of where it lies in each of the two input sequences. If the alignment does not cover at least the given percentage of both sequences, then a message explaining that there is not sufficient coverage should be printed instead.
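
The coverage test can be sketched as follows, in Python for illustration. This assumes that coverage means the length of the aligned region of a sequence (gaps excluded) divided by that sequence's full length, and that both sequences must reach the threshold. The function names and the (start, end) span convention, 0-based with an exclusive end, are assumptions made for this example.

```python
def coverage_percent(start, end, seq_length):
    """Percentage of a sequence covered by the region [start, end)."""
    if seq_length == 0:
        return 0.0
    return 100.0 * (end - start) / seq_length


def has_sufficient_coverage(span_a, len_a, span_b, len_b, min_percent):
    """True only if the alignment covers at least min_percent of BOTH sequences."""
    return (coverage_percent(span_a[0], span_a[1], len_a) >= min_percent
            and coverage_percent(span_b[0], span_b[1], len_b) >= min_percent)
```

For example, an alignment spanning positions 3 to 7 of a 9-base sequence covers about 44% of it, so with a threshold of 50 the program would print the insufficient-coverage message even if the other sequence is covered well.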

Examples of program execution:
align.pl <percentage> <fastafile>
align.pl 50 fastafile.fsa
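
A small sketch of validating these two arguments, in Python for illustration. The function name parse_args and the choice to accept any number from 0 to 100 are assumptions; the argument order follows the usage line above.

```python
def parse_args(argv):
    """Validate [percentage, fastafile] and return (float, str).

    argv is the argument list without the program name.
    Raises ValueError with a readable message on bad input.
    """
    if len(argv) != 2:
        raise ValueError("usage: align.pl <percentage> <fastafile>")
    try:
        percentage = float(argv[0])
    except ValueError:
        raise ValueError(f"percentage must be a number, got {argv[0]!r}")
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return percentage, argv[1]
```

In a Python script this would typically be called as parse_args(sys.argv[1:]), with the ValueError caught at the top level to print the usage message.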

Details
Here is a FASTA file, dna7.fsa, of similar DNA sequences coding for insulin.

Wikipedia has a page about Smith-Waterman alignment.
Some slides on the algorithm. The relevant part starts at slide 9, but read all of them.
Google has part of a book on the subject.