
There are five questions (Q1-Q5) in this exercise that you need to answer.
Students who are physically present can hand in the form that is given out at
the start of the exercise.
The "online-students" must email the answers to tnp@cbs.dtu.dk and
write the PhD number in the subject field. Question Q2 is optional for the
"online-students".
The purpose of this exercise is to build 3D-models from protein sequences using
homology modeling. Normally you want to see the 3D-structure of a protein that
is of interest to you. A predicted model of a protein sequence can guide or help
you to interpret or plan experiments in the lab, for example:
1) Are the terminal ends buried or exposed (can a His-tag be attached)?
2) Which residues define the active site?
3) Can I use the model to design a small molecule drug?
4) Can mutagenesis inactivate or optimize the function of the protein?
Several programs are available, but three different modeling programs will be
used here. Next, the quality of the 3D-models is analyzed, and finally the
models can be compared to the true protein structures. The two sequences below
represent an easy and a difficult homology modeling task.
The first protein is dihydroorotate dehydrogenase from the common rat. This
protein is involved in pyrimidine biosynthesis, which is ubiquitous in all
organisms.
The second enzyme is employed by microorganisms during plant degradation (or
"soft-rot").
There was no experimental 3D structure at the time when the sequences were
submitted to the three modeling servers below. By now, however, the rat 3D
structure is known as 1UUM, whereas the pectate lyase has not yet been
submitted to the PDB and therefore still remains a difficult target.
>RAT Dihydroorotate dehydrogenase [Rattus norvegicus]
MAWRQLRKRALDAVIILGGGGLLFTSYLTATGDDHFYAEYLMPGLQRLLDPESAHRLAVR
VTSLGLLPRATFQDSDMLEVKVLGHKFRNPVGIAAGFDKNGEAVDGLYKLGFGFVEVGSV
TPQPQEGNPRPRVFRLPEDQAVINRYGFNSHGLSVVEHRLRARQQKQAQLTADGLPLGIN
LGKNKTSEDAAADYAEGVRTLGPLADYLVVNVSSPNTAGLRSLQGKTELRHLLSKVLQER
DALKGTRKPAVLVKIAPDLTAQDKEDIASVARELGIDGLIVTNTTVSRPVGLQGALRSET
GGLSGKPLRDLSTQTIREMYALTQGRIPIIGVGGVSSGQDALEKIQAGASLVQLYTALIF
LGPPVVVRVKRELEALLKERGFTTVTDAIGADHRR
>PE Pectate lyase [Thermotoga maritima]
SLNDKPVGFASVPTADLPEGTVGGLGGEIVFVRTAEELEKYTTAEGKYVIVVDGTIVFEP
KREIKVLSDKTIVGINDAKIVGGGLVIKDAQNVIIRNIHFEGFYMEDDPRGKKYDFDYIN
VENSHHIWIDHCTFVNGNDGAVDIKKYSNYITVSWCKFVDHDKVSLVGSSDKEDPEQAGQ
AYKVTYHHNYFKNCIQRMPRIRFGMAHVFNNFYSMGLRTGVSGNVFPIYGVASAMGAKVH
VEGNYFMGYGAVMAEAGIAFLPTRIMGPVEGYLTLGEGDAKNEFYYCKEPEVRPVEEGKP
ALDPREYYDYTLDPVQDVPKIVVDGAGAGKLVFEELNTAQ
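Before pasting the sequences into a modeling server, it can be useful to check that they were copied completely. The following is a minimal Python sketch of a FASTA reader. The function name `parse_fasta` is an illustrative choice, and the example input contains only truncated fragments of the two sequences above, not the full entries.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}.

    The identifier is the first word after '>' on each header line.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: start a new record keyed by the first word.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            # Sequence line: belongs to the most recent header.
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Truncated fragments of the two sequences, for illustration only.
example = """>RAT Dihydroorotate dehydrogenase [Rattus norvegicus]
MAWRQLRKRA
LDAVIILGGG
>PE Pectate lyase [Thermotoga maritima]
SLNDKPVGFA
"""

sequences = parse_fasta(example)
for name, seq in sequences.items():
    print(name, len(seq))
```

Running the same function on the full entries gives the number of residues per sequence, which can then be compared with the number of residues in each returned model.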
1) Find one or more suitable template(s) with known structures
2) Align the query sequence of interest with the templates
3) Thread the query sequence onto the template structure
4) Energy minimisation
Initial Questions
Q1) What are the names of the four backbone atoms in a protein?
Q2) Draw a di-peptide and indicate the sidechain with "R"
Validating your alignment and submitting a modeling request
Fill in the empty fields below.
Alignment technology used by CPHmodels, in short:
The query sequence is searched with BLAST against the PDB database. If a PDB
hit is found, go to a), otherwise go to b).
a) Align the query sequence and the PDB sequence using a Blosum62 matrix.
b) The query sequence is searched with BLAST against the SwissProt/nr database
and a sequence profile is generated. The PDB database is then searched with the
query sequence profile. If still no PDB hit is found, search one more iteration
against SwissProt (continue for 3-4 iterations). If a PDB hit is found, make a
sequence profile and align the query and target sequences using a
profile-profile alignment.
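The a)/b) decision flow above can be sketched in Python. This only illustrates the control flow and is not the CPHmodels implementation: the function `choose_template` and the three callables passed to it (`search_pdb`, `build_profile`, `search_pdb_with_profile`) are hypothetical placeholders for the BLAST and profile searches, and the cap of four iterations is taken from the "3-4 iterations" mentioned in the text.

```python
from typing import Callable, Optional, Tuple


def choose_template(
    query: str,
    search_pdb: Callable[[str], Optional[str]],
    build_profile: Callable[[str, int], str],
    search_pdb_with_profile: Callable[[str], Optional[str]],
    max_iterations: int = 4,
) -> Tuple[Optional[str], str]:
    """Return (template_id, alignment_method) following branches a) and b).

    All three callables are hypothetical stand-ins for real searches.
    """
    # Branch a): a direct hit in the PDB is aligned with a Blosum62 matrix.
    hit = search_pdb(query)
    if hit is not None:
        return hit, "blosum62"

    # Branch b): build a profile from SwissProt/nr and search the PDB with it,
    # adding one more iteration each time no template is found.
    for iteration in range(1, max_iterations + 1):
        profile = build_profile(query, iteration)
        hit = search_pdb_with_profile(profile)
        if hit is not None:
            return hit, "profile-profile"

    # No template found within the allowed number of iterations.
    return None, "none"


# Toy stand-ins: no direct hit, and the profile search succeeds on the
# second iteration. The returned identifier is just a stub value.
result = choose_template(
    "MAWRQ",
    search_pdb=lambda seq: None,
    build_profile=lambda seq, n: f"profile-{n}",
    search_pdb_with_profile=lambda p: "1BN8:A" if p == "profile-2" else None,
)
print(result)
```

Passing the searches in as functions keeps the sketch runnable without any external tools. In practice each one would wrap a call to a search program.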
NB ! A residue is an amino acid.
The output from homology modeling servers varies considerably, e.g. in the
information about the target sequence used, the sequence alignment and the
number of residues in the model. In fact, four residues (60, 104-106) are
missing in the pe_2 model produced by 3Djigsaw.
The homology modeling servers have chosen different templates to make the
'pe-models'. SwissModel used 1QCX:A and CPHmodels used 1BN8:A. This is due to
the different search algorithms used by the two programs. To explore how
similar the templates are, the CE alignment program can be used to verify
sequence/structure similarities, based on a structural superposition.
Compare the structures of the two templates using the structural alignment
program CE.
After having aligned the two sequences within the CE-program, do this: "Press
to start Compare3D".
| Template SwissModel 1QCX:A | Template CPHmodels 1BN8:A | Rmsd | % id |
Checking model quality:
Click here for more information on accuracy determination. A good model has
> 90% of the residues in the core region of a Ramachandran plot and > 98% in
the core+allowed region.
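The two thresholds can be checked mechanically once the residue counts per region are read off the Ramachandran summary. The sketch below is a minimal illustration. The function name `ramachandran_check` and the three-way split of counts (core, allowed, everything else) are assumptions, since summary files may report more categories.

```python
def ramachandran_check(core, allowed, other):
    """Return (core %, core+allowed %, passes both thresholds).

    core, allowed and other are residue counts per region. 'other' is
    assumed to collect every residue outside the core and allowed regions.
    """
    total = core + allowed + other
    if total == 0:
        raise ValueError("no residues counted")
    core_pct = 100.0 * core / total
    core_allowed_pct = 100.0 * (core + allowed) / total
    # Thresholds from the text: > 90% core and > 98% core+allowed.
    is_good = core_pct > 90.0 and core_allowed_pct > 98.0
    return core_pct, core_allowed_pct, is_good


# Made-up counts for illustration only, not taken from any of the models.
print(ramachandran_check(92, 7, 1))
print(ramachandran_check(85, 10, 5))
```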
Q3) Which atoms define the phi and psi dihedral angles ? (Search the internet).
| Ramachandran summary file | Ramachandran plot | Core+allowed (%) | Disallowed (%) |
To get a more quantitative measure for the alignments, you can calculate the
root mean square deviation (rmsd) between the true structure (chain 1) and the
model (chain 2) using the CE server.
The output from the CE server is shown below. The term 'alignment length' is a
measure of how many residues have been aligned to another residue, i.e.
alignments to gaps are excluded. Therefore, be aware that %id and %gaps do not
need to add up to 100%.
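The quantities in the CE output can be illustrated with a small Python sketch. It is not how the CE server computes them: CE finds its own structural alignment and optimal superposition, whereas `rmsd` below assumes the two coordinate lists are already paired and superposed. In `alignment_stats`, %id is taken relative to the alignment length and %gaps relative to all alignment columns. Both definitions are assumptions chosen for illustration, and they show why the two percentages need not add up to 100%.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired (x, y, z) coordinates.

    Assumes both structures are already superposed and paired one-to-one.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def alignment_stats(aligned_a, aligned_b, gap="-"):
    """Return (alignment length, % id, % gaps) for two gapped sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_a)
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # residue aligned to a gap: not part of alignment length
        aligned += 1
        if a == b:
            identical += 1
    gaps = columns - aligned
    pct_id = 100.0 * identical / aligned if aligned else 0.0
    pct_gaps = 100.0 * gaps / columns if columns else 0.0
    return aligned, pct_id, pct_gaps


# Toy data for illustration only.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
print(alignment_stats("ACDEF-", "ACE-FG"))
```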
| CE alignment | CE align length | CE rmsd (Å) | % id | # of correct residues in model |
| | 171 | | | |
| | 169 | | | |
| | 171 | | | |
| | 48 | | | |
| | 186 | | | |
| | 168 | | | |
The paper describing the ProQ method can be seen here: "Can correct models be identified"
Use the ProQ predictor to verify the correctness of the six models.
Q4) Which of the modeling server(s) made the most reliable rat/pe models based
on the Ramachandran analysis and the ProQ predictions?
Q5) In the "correctness of models" the true 3D-structures of "rat" and "pe"
were known. What is the ranking of the modeling servers based on that analysis?
It is often possible to build a model using a modeling server, but the fact that a model can be built is no guarantee that it is correct. Try out several modeling servers and choose the model that seems best with respect to Ramachandran quality and ProQ scores. The sequence alignment is the most critical step in the modeling process.