Homology Modeling

Important!


There are 5 questions (Q1-Q5) in this exercise that you need to answer. Students who are physically present can hand in the form that is given out at the start of the exercise.
The "online-students" must email their answers to tnp@cbs.dtu.dk and write the PhD number in the subject field. Question Q2 is optional for the "online-students".



The purpose of this exercise is to build 3D-models from protein sequences by use of homology modeling. Normally you want to see the 3D-structure of a protein that is of interest to you. A predicted model of a protein sequence can guide you, or simply help you, when interpreting or planning experiments in the lab. Typical questions are:

1) Are the terminal ends buried or exposed? Can a His-tag be attached?
2) Which residues define the active site?
3) Can I use the model to design a small-molecule drug?
4) Can mutagenesis inactivate or optimize the function of the protein?

Several programs are available, but here three different modeling programs will be used. Next, the quality of the 3D-models is analyzed, and finally the models can be compared to the true protein structures. The two sequences below represent an easy and a difficult homology modeling task.
The first protein is dihydroorotate dehydrogenase from the common rat. This protein is involved in pyrimidine biosynthesis, which is ubiquitous in all organisms.
The second enzyme, pectate lyase, is employed by microorganisms during plant degradation (or "soft-rot").
There was no experimental 3D structure at the time when the sequences were submitted to the three modeling servers below. By now, however, the rat 3D structure is known as 1UUM, whereas the pectate lyase has not yet been submitted to the PDB and therefore still remains a difficult target.

>RAT Dihydroorotate dehydrogenase [Rattus norvegicus]
MAWRQLRKRALDAVIILGGGGLLFTSYLTATGDDHFYAEYLMPGLQRLLDPESAHRLAVR
VTSLGLLPRATFQDSDMLEVKVLGHKFRNPVGIAAGFDKNGEAVDGLYKLGFGFVEVGSV
TPQPQEGNPRPRVFRLPEDQAVINRYGFNSHGLSVVEHRLRARQQKQAQLTADGLPLGIN
LGKNKTSEDAAADYAEGVRTLGPLADYLVVNVSSPNTAGLRSLQGKTELRHLLSKVLQER
DALKGTRKPAVLVKIAPDLTAQDKEDIASVARELGIDGLIVTNTTVSRPVGLQGALRSET
GGLSGKPLRDLSTQTIREMYALTQGRIPIIGVGGVSSGQDALEKIQAGASLVQLYTALIF
LGPPVVVRVKRELEALLKERGFTTVTDAIGADHRR

>PE Pectate lyase [Thermotoga maritima]
SLNDKPVGFASVPTADLPEGTVGGLGGEIVFVRTAEELEKYTTAEGKYVIVVDGTIVFEP
KREIKVLSDKTIVGINDAKIVGGGLVIKDAQNVIIRNIHFEGFYMEDDPRGKKYDFDYIN
VENSHHIWIDHCTFVNGNDGAVDIKKYSNYITVSWCKFVDHDKVSLVGSSDKEDPEQAGQ
AYKVTYHHNYFKNCIQRMPRIRFGMAHVFNNFYSMGLRTGVSGNVFPIYGVASAMGAKVH
VEGNYFMGYGAVMAEAGIAFLPTRIMGPVEGYLTLGEGDAKNEFYYCKEPEVRPVEEGKP
ALDPREYYDYTLDPVQDVPKIVVDGAGAGKLVFEELNTAQ


The concept of modeling is:

1) Find one or more suitable template(s) with known structures
2) Align the query sequence of interest with the templates
3) Thread the query sequence onto the template structure
4) Energy minimisation


Initial Questions

 

Q1) What are the names of the four backbone atoms in a protein?
Q2) Draw a di-peptide and indicate the sidechain with "R".


Validating your alignment and submitting a modeling request

The two query sequences have been submitted to three homology modeling servers, and the resulting pdb models are linked below. Following the link to a .pdb file lets you view the 3D-structure, whereas the corresponding .txt file contains the text output from the modeling server. Optionally, you may submit the PE-sequence to the local modeling server CPHmodels. This will take approx. 3 minutes.
Fill in the empty fields below.

Alignment technology used by CPHmodels, in short:
The query sequence is searched with BLAST against the PDB database. If a PDB hit is found, continue with a); otherwise continue with b).
a) Align the query sequence and the PDB sequence by use of a Blosum62 matrix.
b) The query sequence is searched with BLAST against the SwissProt/nr database and a sequence profile is generated. The PDB database is then searched with the query sequence profile. If still no PDB hit is found, search one more iteration against SwissProt (continue for 3-4 iterations). If a PDB hit is found, make a sequence profile and align the query and target sequences by use of a profile-profile alignment.
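Step a) amounts to a global pairwise alignment scored residue by residue. The sketch below illustrates the idea in Python with the Needleman-Wunsch algorithm. It is not the CPHmodels implementation: the function name global_align, the simple match/mismatch score (used here in place of the full Blosum62 matrix) and the linear gap penalty are all assumptions made for illustration.

```python
def global_align(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a toy scoring scheme.

    Assumption: match/mismatch scores stand in for a real substitution
    matrix such as Blosum62, and gaps cost a fixed penalty each.
    Returns (aligned_a, aligned_b, score).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,    # gap in b
                              score[i][j - 1] + gap)    # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

For example, global_align("MAWRQ", "MARQ") aligns the W of the first sequence to a gap in the second. A real server would use a substitution matrix and affine gap penalties, so that conservative substitutions are penalised less than unrelated ones.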
 
NB! A residue is an amino acid.
 

Modeling servers   rat models             Residues in model   pe models            Residues in model
SwissModel         rat_1.pdb, rat_1.txt   ____                pe_1.pdb, pe_1.txt   ____
3Djigsaw           rat_2.pdb, rat_2.txt   ____                pe_2.pdb, pe_2.txt   ____
CPHmodels          rat_3.pdb, rat_3.txt   ____                pe_3.pdb, pe_3.txt   ____

The output from the homology modeling servers varies considerably, e.g. in the information given about the target sequence used, the sequence alignment and the number of residues in the model. In fact, four residues (60 and 104-106) are missing in the pe_2 model produced by 3Djigsaw.
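Missing residues like these can be found by reading the residue numbers from the ATOM records of a model file. The following is a minimal Python sketch, assuming a standard fixed-column PDB file with a single chain and no insertion codes; the function names residue_numbers and missing_residues are made up for this example.

```python
def residue_numbers(pdb_lines):
    """Return the sorted residue numbers found in the ATOM records.

    Assumption: standard PDB fixed columns, where the residue sequence
    number occupies columns 23-26 (Python slice 22:26). Chain IDs and
    insertion codes are ignored, so this suits single-chain models only.
    """
    numbers = set()
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) >= 26:
            numbers.add(int(line[22:26]))
    return sorted(numbers)


def missing_residues(numbers):
    """List residue numbers absent between the first and last residue."""
    if not numbers:
        return []
    present = set(numbers)
    return [n for n in range(numbers[0], numbers[-1] + 1) if n not in present]
```

With a downloaded model this could be used as follows: open the file, pass the file object to residue_numbers, and pass the result to missing_residues. The length of the list returned by residue_numbers is the number of residues in the model. Note that gaps before the first or after the last modeled residue are not reported, because the sketch does not know the length of the query sequence.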

The homology modeling servers have chosen different templates to make the 'pe-models'. SwissModel used 1QCX:A and CPHmodels used 1BN8:A. This is due to the different search algorithms used by the two programs. To explore how similar the templates are, the CE-alignment program can be used to verify sequence/structure similarities, based on a structural superposition.
Compare the structures of the two templates by use of the structural alignment program CE.
After having aligned the two sequences within the CE-program, do this: "Press to start Compare3D".

CE server
  Template SwissModel: 1QCX:A
  Template CPHmodels:  1BN8:A
  Rmsd: ____
  % id: ____

Checking Model quality:
A handy tool for locating problem areas is the Ramachandran plot. A Ramachandran plot visualises the torsion angles of the peptide backbone, and almost all residues of natural proteins are found within the areas denoted on the plot. Glycine and proline residues are exceptions. Problems with the peptide backbone are quickly spotted with this tool, as seen in "A bad Ramachandran plot".
Click here for more information on accuracy determination. A good model has > 90% of its residues in the core region of a Ramachandran plot and > 98% in the core+allowed region.
Q3) Which atoms define the phi and psi dihedral angles? (Search the internet.)
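Once the four atoms of a dihedral angle are known, the angle itself follows from their coordinates. The Python sketch below computes the signed dihedral angle for any four points using the common atan2 formulation. The function name dihedral and the helper functions are assumptions for illustration; which backbone atoms to pass in for phi and psi is left as the answer to Q3.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points.

    The angle is measured between the plane through p0, p1, p2 and the
    plane through p1, p2, p3, i.e. the rotation about the p1-p2 bond.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the first plane
    n2 = _cross(b2, b3)  # normal of the second plane
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

A cis arrangement of the four points gives an angle of 0 degrees and a trans arrangement gives 180 degrees. Computing phi and psi this way for every residue of a model yields the points of its Ramachandran plot.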

Ramachandran plot:

Ramachandran summary file   Ramachandran plot   Core+allowed (%)   Disallowed (%)
rat_1                       rat_1_rama.pdf      ____               ____
rat_2                       rat_2_rama.pdf      ____               ____
rat_3                       rat_3_rama.pdf      ____               ____
pe_1                        pe_1_rama.pdf       ____               ____
pe_2                        pe_2_rama.pdf       ____               ____
pe_3                        pe_3_rama.pdf       ____               ____


To get a more quantitative measure of model accuracy, you can calculate the root mean square deviation (rmsd) between the true structure (chain 1) and the model (chain 2) by use of the CE server. The output from the CE server is shown below. The term 'alignment length' measures how many residues have been aligned to another residue, i.e. alignments to gaps are excluded. Therefore, be aware that % id and % gaps do not need to add up to 100%.
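For N aligned residue pairs, the rmsd is the square root of the mean squared distance between paired atoms: rmsd = sqrt((1/N) * sum of d_i^2). The Python sketch below shows only this final formula. It assumes the two coordinate lists are already paired residue by residue and already superimposed; the CE server finds the pairing and the optimal superposition itself, which this example does not attempt. The function name rmsd is chosen for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired coordinate lists.

    Assumption: coords_a[i] and coords_b[i] are (x, y, z) tuples for the
    same aligned residue, and the structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Because the sum runs only over aligned pairs, a low rmsd over a short alignment length (few residue pairs) says less about a model than the same rmsd over a long one, so the two numbers should be read together.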

Correctness of the models:

CE alignment   CE align length   CE rmsd (Å)   % id   # of correct residues in model
rat_1          171               0.84          ____   ____
rat_2          169               1.25          ____   ____
rat_3          171               0.81          ____   ____
pe_1           48                1.17          ____   ____
pe_2           186               2.86          ____   ____
pe_3           168               1.85          ____   ____


Predicting model quality by use of ProQ:  



The paper describing the ProQ method can be seen here: "Can correct models be identified"
Use the ProQ predictor to verify the correctness of the six models.

Modeling server   rat models   LGscore   pe models   LGscore
SwissModel        rat_1.txt    ____      pe_1.txt    ____
3Djigsaw          rat_2.txt    ____      pe_2.txt    ____
CPHmodels         rat_3.txt    ____      pe_3.txt    ____

Ranking the modeling servers 1 - 3, where 1 is best:  

Q4) Which of the modeling server(s) made the most reliable rat/pe models based on the Ramachandran analysis and the ProQ predictions?
Q5) In the "Correctness of the models" analysis, the true 3D-structures of "rat" and "pe" were known. What is the ranking of the modeling servers based on that analysis?

Key points:  

It is often possible to build a model by use of a modeling server. The fact that a model can be built is no guarantee that it is correct. Try out several modeling servers and choose the model that seems best with respect to Ramachandran quality and ProQ scores. The sequence alignment is the most critical step in the modeling process.