Exercise written by: Morten Nielsen
Earlier in the course you used the BLAST program to perform fast alignments of DNA and protein sequences. As
shown in today's lecture, BLAST will often fail to recognize relationships between proteins with low sequence
similarity. In today's exercise, you shall use the iterative BLAST program (PSI-BLAST) to calculate sequence
profiles and see how such profiles can be used to
- Identify relationships between proteins with low sequence similarity
- Identify conserved residues in protein sequences (residues important for the structural
stability or function of the protein)
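To make the term "sequence profile" concrete, here is a minimal Python sketch of how position-specific scores can be derived from a set of aligned sequences. It is an illustration only and not the actual PSI-BLAST algorithm: the toy alignment, the uniform background frequency and the simple pseudocount are all assumptions made for this example, and PSI-BLAST itself uses sequence weighting and a more refined pseudocount scheme.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores (in bits) per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for column in zip(*alignment):
        # Count only standard amino acids; gaps and other symbols are ignored.
        counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            frequency = (counts[aa] + pseudocount) / total
            scores[aa] = math.log2(frequency / background)
        profile.append(scores)
    return profile


def score_sequence(profile, sequence):
    """Sum the position-specific scores of an ungapped sequence."""
    return sum(column[aa] for column, aa in zip(profile, sequence))


# Hypothetical toy alignment: position 1 is fully conserved (H).
toy_alignment = ["HKRE", "HRRD", "HKKE", "HRKE"]
profile = build_profile(toy_alignment)
print(round(profile[0]["H"], 2), round(profile[0]["A"], 2))
```

In this sketch a residue that dominates a column gets a positive score at that position and residues never observed there get a negative one, so the same amino acid can score differently at different positions.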
First part. When BLAST fails
Say you have a sequence Query
and you want to make predictions about its function and structure. As seen earlier in the course, you will
most often use BLAST to do this. However, what happens when BLAST fails?
Go to the BLAST web-site.
Paste in the query sequence
Set the database to pdb, and press
Now go back to the BLAST web-site.
Paste in the query sequence
Set the database to nr, select PSI-BLAST (Position-Specific Iterated BLAST) and press
- Q1 How many significant hits does BLAST find in the pdb search (E-value < 0.005)?
- Q2 How many significant hits does BLAST find in the nr search (E-value < 0.005)?
- Q3 How large a fraction of the query sequence do the significant
hits match (excluding the identical matches)?
- Q4 Do you find any PDB hits among the significant hits (search for pdb in the
hit list or look for the colored S to the right of the E-value)?
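Questions Q1 to Q3 amount to filtering a hit list by E-value and measuring how much of the query the remaining hits cover. The Python sketch below shows that bookkeeping. The hit records and the query length are made-up toy values (they are not output from the actual search), and the dictionary keys `evalue`, `q_start` and `q_end` are names chosen for this example only.

```python
E_VALUE_CUTOFF = 0.005  # significance threshold used in the questions


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep only the hits with an E-value below the cutoff."""
    return [hit for hit in hits if hit["evalue"] < cutoff]


def query_coverage(hits, query_length):
    """Fraction of query positions covered by at least one hit."""
    covered = set()
    for hit in hits:
        # Alignment coordinates are 1-based and inclusive.
        covered.update(range(hit["q_start"], hit["q_end"] + 1))
    return len(covered) / query_length


# Hypothetical toy data, for illustration only.
toy_hits = [
    {"id": "hit_1", "evalue": 1e-30, "q_start": 150, "q_end": 400},
    {"id": "hit_2", "evalue": 2e-4, "q_start": 120, "q_end": 300},
    {"id": "hit_3", "evalue": 0.8, "q_start": 1, "q_end": 90},
]
TOY_QUERY_LENGTH = 450

kept = significant_hits(toy_hits)
coverage = query_coverage(kept, TOY_QUERY_LENGTH)
print(len(kept), round(coverage, 2))
```

Overlapping hits are counted once because covered positions are collected in a set, so the coverage reflects the union of the aligned query regions.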
Now run a second BLAST iteration. Press Run PSI-Blast iteration 2.
- Q5 How many significant hits does BLAST find (E-value < 0.005)?
- Q6 How large a fraction of the query sequence do the significant
hits match (do not include the first hit since this is identical to the query)?
- Q7 Why does BLAST come up with more significant hits in the second iteration?
Make sure you answer this question and understand what is going on!
- Q8 Do you find any PDB hits among the significant hits (search for pdb in the
hit list or look for the red colored S to the right of the E-value)?
If you did not find a PDB hit among the significant hits, run a third BLAST iteration.
- Q9 What is the PDB identifier for the best PDB hit?
- Q10 What is the sequence similarity between the query and this PDB hit?
- Q11 What is the function of this protein?
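For Q10 you can read the similarity directly from the BLAST alignment, but it may help to see how such a number is computed. The Python sketch below calculates the percent identity of two aligned sequences. The two aligned strings are invented for the example, and the sketch divides by the number of gap-free columns, which is an assumption of this example; BLAST reports identities relative to the full alignment length, so its number can differ slightly.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment, for illustration only.
toy_query = "HKR-EYF"
toy_hit = "HRRAE-F"
identity = percent_identity(toy_query, toy_hit)
print(identity)
```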
Identifying conserved residues
You have now (hopefully) identified a structural relationship between the Query sequence and
a protein sequence in the PDB database of protein structures. Say you would like to validate this
relationship. One could do this by mutating (substituting) essential residues in the query sequence
and testing whether the protein function (or structure) is affected by these mutations.
The protein sequence of the query is large (more than 400 amino acids) and a complete mutation study
including all residues would be extremely costly. Instead, one can use PSI-BLAST and sequence profiles
to identify conserved residues that are likely to be essential for the protein structure and/or function.
Below you find a set of 8 residues from the Query protein sequence. You shall use the PSI-BLAST and Blast2logo
programs to select four of the eight residues for a mutagenesis study (you shall select the four
residues based on sequence conservation only).
- (a): H271
- (b): R287
- (c): E290
- (d): Y334
- (e): F371
- (f): R379
- (g): R400
- (h): Y436
You shall use the Blast2logo server to identify
which residues are conserved in the Query protein sequence. Go to the Blast2logo server and upload
the Query sequence. Set the
Blast database to NR70, and press submit (note that it might take 5-10 minutes before your job is completed).
If the job does not complete, you can find the output by following this link
When the job is completed you should see the logo-plot on the website. If the logo does not display, you can download
the image file (click on the Download logo file) and open it from your desktop.
- Q12.1 Spend a little time looking at the logo plot. Can you understand why the logo is so flat for
the first 100 residues (how large a fraction of the query sequence did the Blast search cover)?
- Q12.2 Which of the eight residues listed above are most conserved and hence most likely
to be essential for the protein stability and/or function?
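The height of a column in a sequence logo reflects how conserved that position is. The Python sketch below computes the classical Shannon information content of an alignment column, which is one common way to quantify this. It is a simplified illustration: the toy columns are invented, and the assumption of 20 equally likely amino acids is made for this example only. Blast2logo may use a different measure (for instance one based on background frequencies and pseudocounts), so the absolute values need not match the server output.

```python
import math
from collections import Counter

MAX_BITS = math.log2(20)  # maximum information for 20 amino acids


def information_content(column, gap="-"):
    """Shannon information content (bits) of one alignment column."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return 0.0  # a column with only gaps carries no information
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in counts.values())
    return MAX_BITS - entropy


# Hypothetical toy columns, for illustration only.
conserved = information_content("HHHH")  # one residue type only
variable = information_content("HKRE")   # four different residue types
print(round(conserved, 2), round(variable, 2))
```

A fully conserved column reaches the maximum of log2(20) bits, while a column with many different residues scores lower, which corresponds to a flat region in the logo.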
You shall use the Phyre program to check whether the four most conserved residues from question Q12
are placed in the structure so that they could indeed form an active site.
Go to the Phyre web-site and upload
the Query sequence.
Note that it might take 10-20 minutes before your job is completed. To save you time, I have
run the calculation for you. You can find the output here: Phyre output.
Find the PDB hit identified by PSI-BLAST (you can click on the SCOP code to get to the PDB template for each model).
- Q13 Does Phyre agree that this hit is significant?
Download the highest scoring (lowest E-value) Phyre model, and open the model file in Pymol.
If you do not have Pymol installed on your computer,
you can find a free download here: Pymol 099 downloads.
Show the location of the four essential residues from question Q12 on the structure.
- Q14 Could the residues form an active site?
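Besides inspecting the structure visually in Pymol, one way to judge Q14 is to check whether the selected residues lie close to each other in space. The Python sketch below reads C-alpha coordinates from PDB-format ATOM records and computes the pairwise distances between chosen residues. It relies on the standard fixed-column PDB layout; the helper `toy_ca_line` and all coordinates are made up for this example and do not come from the Phyre model. The sketch also ignores chain identifiers, which assumes a single-chain model, and it assumes the model keeps the residue numbering of the query.

```python
import math


def ca_coordinates(pdb_lines):
    """Map residue number -> (x, y, z) of its C-alpha atom."""
    coords = {}
    for line in pdb_lines:
        # PDB columns: atom name 13-16, residue number 23-26, x/y/z 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_number = int(line[22:26])
            coords[res_number] = (float(line[30:38]),
                                  float(line[38:46]),
                                  float(line[46:54]))
    return coords


def pairwise_distances(coords, residues):
    """Distance in Angstrom between every pair of the given residues."""
    distances = {}
    for i, first in enumerate(residues):
        for second in residues[i + 1:]:
            distances[(first, second)] = math.dist(coords[first],
                                                   coords[second])
    return distances


def toy_ca_line(serial, res_name, res_number, x, y, z):
    """Format a made-up C-alpha ATOM record (toy data only)."""
    return (f"ATOM  {serial:5d}  CA  {res_name} A{res_number:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Hypothetical coordinates, for illustration only.
toy_lines = [
    toy_ca_line(1, "HIS", 271, 0.0, 0.0, 0.0),
    toy_ca_line(2, "ARG", 287, 3.0, 4.0, 0.0),
    toy_ca_line(3, "ARG", 400, 30.0, 40.0, 0.0),
]
coords = ca_coordinates(toy_lines)
distances = pairwise_distances(coords, [271, 287, 400])
for pair, value in distances.items():
    print(pair, round(value, 1))
```

With the real model you would pass the lines of the downloaded PDB file instead of `toy_lines`. Residues that are far apart in sequence but close in space are consistent with a shared site, although C-alpha distances alone do not prove it.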
Now you have seen the power of sequence profiles in general and the PSI-BLAST program in
particular. Using sequence profiles, you have been able to identify a relationship
between protein sequences with far below 30% sequence similarity. Further, you have made qualified
predictions on the protein function and selected a set of essential amino acids suitable for
experimental validation of the structural and functional predictions.