Pairwise Alignment and Database Searching
My presentation is available in PowerPoint format.
Pairwise alignment exercise
Below, you see two protein sequences. They are both globins from a midge, Chironomus thummi
thummi, a very small and annoying insect.
>GLB7_CHITH 145 P02226 GLOBIN CTT-VIIA.
>GLBP_CHITH 152 P11582 GLOBIN CTT-E/E' PRECURSOR.
These sequences are given in FASTA format, a format widely used as input to
bioinformatics programs: a line beginning with ">" contains the name of a sequence plus
optional comments, while the following lines, up to the next ">", contain the sequence itself.
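To make the format description concrete, here is a minimal sketch of a FASTA reader in Python. The function name `parse_fasta` and the toy input are assumptions made for illustration; the residues shown are placeholders, not the globin sequences from this exercise.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None   # header of the record currently being read
    chunks = []     # sequence lines collected for that record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical toy input (placeholder residues, not real database entries).
example = """>seq1 first toy sequence
MKTAYIAK
QRQISFVK
>seq2 second toy sequence
MKLV
"""

for header, sequence in parse_fasta(example):
    name = header.split()[0]  # first word of the header is the sequence name
    print(name, len(sequence), sequence)
```

The sequence lines of each record are simply concatenated, which is why line breaks inside a sequence do not matter to programs that read this format.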
Do a global alignment of these two protein sequences, using the ALIGN service at the GENESTREAM network
server, IGH, Montpellier, France.
Hint: You can copy the sequences and sequence names from this page and paste them into the
input windows at the French site. Note that only the sequences (not the header lines) should be
Take a look at the result. Note that there is a gap in GLBP_CHITH - what is the
corresponding sequence of GLB7_CHITH? (This is an authentic example - you are welcome to
retrieve the original database entries for GLBP_CHITH and GLB7_CHITH
from the SWISS-PROT database.)
Now try a local alignment of the same two sequences, using the LALIGN service instead. Compare the output
with that of ALIGN. You will get the ten best-scoring local alignments, sorted by decreasing
similarity score. Note that by using LALIGN the alignment is truncated compared to the
Question: Does global or local alignment yield the higher alignment score? Why?
Question: The alignment program used BLOSUM50 to align the sequences. Does that make sense
given the alignments obtained?
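If you want to explore the difference between global and local scoring away from the web server, the following is a minimal sketch of both dynamic-programming recurrences (Needleman-Wunsch style for global, Smith-Waterman style for local). It is not the ALIGN or LALIGN implementation: the function name `alignment_scores`, the flat match/mismatch scores, and the linear gap penalty are simplifying assumptions standing in for BLOSUM50 and the servers' gap penalties.

```python
def alignment_scores(a, b, match=2, mismatch=-1, gap=-2):
    """Return (global_score, local_score) for sequences a and b.

    Assumes a flat match/mismatch score and a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    glob = [[0] * cols for _ in range(rows)]  # global (end-to-end) scores
    loc = [[0] * cols for _ in range(rows)]   # local (best sub-region) scores

    # Global alignment must pay for gaps at the sequence ends.
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap

    best_local = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            # Local alignment may restart from zero at any cell.
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])

    # Global score is read from the corner; local score is the matrix maximum.
    return glob[-1][-1], best_local


# Toy example: a short motif embedded in a longer sequence.
print(alignment_scores("TTTACGTTT", "ACG"))
```

Running it on sequences of your own choice lets you compare the two scores directly under identical scoring parameters.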
Database search exercise
Below are two protein sequences in FASTA format.
Perform a BLAST search against the SWISS-PROT database.
Use the BLAST service at the GENESTREAM server for this. NOTE: Make sure to select
the correct database (swissprot) and alignment method (blastp)
in the drop-down menus!
Question 1: Which functions would you assign to these two proteins based on your
Question 2: Try using different substitution matrices when performing the
BLAST searches. How does this affect the expectation scores? (For instance, note the
E-values for the database hit "ADH3_ECOLI" using BLOSUM45, BLOSUM62, and BLOSUM80 and
- E-values are not absolute measures of how good a database hit is. In theory,
E-values depend on the sequence, the database, and the substitution matrix/scoring system.
(In practice, BLAST uses pre-computed score distributions, so the statistical parameters
behind its E-values depend only on the substitution matrix/scoring system - this means the
significance of a hit is sometimes overestimated!).
- It is generally safe to assign function X to an unknown protein if it has many
strong hits to proteins with function X in the database. HOWEVER, be cautious when
a sequence only has hits to proteins with putative functions!
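The dependence of an E-value on the sequence, the database, and the scoring system can be illustrated with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where S is the raw alignment score, m the query length, n the database length, and K and lambda are parameters tied to the scoring system. The sketch below is a simplification: the function name `evalue` is hypothetical, the numeric values of `lam` and `k` are illustrative assumptions rather than parameters read from any particular server, and real programs also apply length corrections that are omitted here.

```python
import math


def evalue(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least raw_score.

    Karlin-Altschul form: E = k * m * n * exp(-lam * S).
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Illustrative, assumed parameter values (not taken from a real search).
LAM, K = 0.267, 0.041

# Same hit score and query, two hypothetical database sizes:
small_db = evalue(80, query_len=150, db_len=1_000_000, lam=LAM, k=K)
large_db = evalue(80, query_len=150, db_len=10_000_000, lam=LAM, k=K)
print(small_db, large_db)  # the larger database gives the larger E-value
```

Because lambda and K change with the substitution matrix and gap penalties, the same pair of sequences can receive quite different E-values under BLOSUM45, BLOSUM62, and BLOSUM80, which is one reason E-values should not be compared across scoring systems as if they were absolute.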
Redo the analysis of LAST_ECOLI, this time using FASTA3_T with the BLOSUM62 matrix to
search the SWISS-PROT database. Use the FASTA3 service at the GENESTREAM server for
this. NOTE: again, make sure to select the correct database (swissprot) and substitution matrix
Question: How do the E-values compare to those obtained using BLAST
for a given substitution matrix? (For instance, note the E-value for YFHQ_ECOLI
for BLAST vs. FASTA, using BLOSUM62).
Take-home message: FASTA gives a better estimate of the real E-value of a database hit
than BLAST does, since it takes into account the actual score distribution of the current
database search. For some reason, E-values computed by FASTA are usually worse (i.e., larger) than
E-values computed by BLAST.
Links to web-based tools
- DOTLET - an applet for making dotplots
- LALIGN - a tool for performing local (Smith-Waterman) alignment
- SIM - alternative local alignment tool
- FASTA - fast database search tool
- BLAST - faster database search tool
- CD-BLAST - Fast search of sequence against profile database
Links to online tutorials
You are not required to read any of the material below. But if you are looking for more information on sequence alignment, these are definitely good places to start: