Vaccine design, Epitope Atlases
Claus Lundegaard (lunde@cbs.dtu.dk) and Morten Nielsen (mniel@cbs.dtu.dk)
Overview
During this exercise you will:
- Make atlases showing epitopes of two genomes
- Zoom in on interesting areas
- Make proteasome predictions for a short peptide construct
- Try to optimise the construct manually and evaluate it using the predictions
Purpose of exercise, description of data
The purpose of this exercise is to visualize the output of the prediction tools
you have seen during the course and to let you use them in a small optimization problem.
We will use two organisms to visualize the predictions: the bacterium Rickettsia prowazekii
(typhus) and Hepatitis C virus.
Change to this afternoon's working directory:
cd exercise8
Rickettsia prowazekii
You will use the program genewiz and a number of configuration files to make the atlases.
Start with an atlas of the complete genome of Rickettsia prowazekii, 1111523 bp. The file rpxx.epitopeatlas.cf
is the configuration file that GeneWiz will use to generate a nice-looking epitope atlas. We will not go into the details of this file; just make sure you have it.
ls -l rpxx.epitopeatlas.cf
Now run GeneWiz:
genewiz -p rpxx_whole.ps rpxx.epitopeatlas.cf
This will generate an atlas in PostScript format, rpxx_whole.ps.
To convert the file to PDF, use this command:
pdfatlas rpxx_whole
Find the file with your ssh browser, drag it to your desktop and open it.
Remember that the file has two pages, and the actual atlas is on page two.
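The genewiz and pdfatlas commands above are repeated for every atlas in this exercise. As a convenience, the pair can be wrapped in a small shell function. This is only a sketch: make_atlas is a hypothetical name of our own (it is not part of GeneWiz), and the function syntax assumes a Bourne-style shell such as bash or sh.

```shell
# Hypothetical helper: run genewiz and then pdfatlas for one atlas.
# $1 = base name of the output (without .ps), $2 = configuration file
make_atlas() {
    # pdfatlas is only run if genewiz finished without error
    genewiz -p "$1.ps" "$2" && pdfatlas "$1"
}

# Example call, equivalent to the two commands above:
# make_atlas rpxx_whole rpxx.epitopeatlas.cf
```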
You will see a circular representation of the bacterial genome where the center indicates the position in kbases.
In the middle of the circles you see a thin grey line. This represents the DNA.
On the outer side of the thin line you see some blue blocks. These represent open reading frames (ORFs) on the + strand.
Everything outside these blocks is a prediction for those ORFs.
Likewise, on the inner side you see red blocks representing ORFs on the - strand, and everything inside these is a prediction for the - ORFs.
Going from the outermost circle inwards, we thus have:
- Predicted strong binding subpeptides (affinity < 50 nM).
- Predicted moderate binding subpeptides (affinity < 500 nM).
- Predicted proteasomal cleavage.
The same tracks are repeated for the - ORFs further inside.
Using a modified cf file you can zoom in on an interesting area.
genewiz -p rpxx_zoom.ps rpxx.lin.epitopeatlas.cf
pdfatlas rpxx_zoom
Transfer and open.
Now we have zoomed in on an interesting area, and the resolution allows you to see the individual
epitopes in the proteins encoded on the two DNA strands.
The ORFs now have names assigned, and in some ORFs you will see more than one name. This means that ORFs overlap.
Q1: How many named ORFs are transcribed from the + strand?
Q2: How many strong epitopes are predicted in the ORF RP779, encoded by the - strand?
Now we want to zoom in further on the ORF RP779.
Look carefully at your linear atlas to work out where this ORF is positioned, counting in kbases.
You can select another area by editing
the configuration file rpxx.lin.epitopeatlas.cf.
The area is set in the line
linearsection = 971000 999000;
This numbering is in actual bp, not in kbases. Compare the numbers with the scale on your atlas.
nedit rpxx.lin.epitopeatlas.cf &
Change the linearsection to your selected range and save the file as rpxx.lin.epitopeatlas_new.cf
genewiz -p rpxx_zoom_new.ps rpxx.lin.epitopeatlas_new.cf
pdfatlas rpxx_zoom_new
Transfer and open your new atlas.
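If you want to try several ranges, the edit-and-save step above can also be done without opening an editor. The sketch below assumes that the setting sits on a single line of the form "linearsection = START END;" as shown earlier; set_linearsection is a hypothetical helper name, and the coordinates in the example call are placeholders, not the answer for RP779.

```shell
# Hypothetical helper: copy a GeneWiz configuration file, replacing the
# linearsection range. Assumes one line of the form
#   linearsection = START END;
# $1 = input .cf, $2 = output .cf, $3 = start (bp), $4 = end (bp)
set_linearsection() {
    # Keep everything up to and including "=", then write the new range
    sed "s/^\([[:space:]]*linearsection[[:space:]]*=\).*/\1 $3 $4;/" "$1" > "$2"
}

# Example call (placeholder coordinates, in bp):
# set_linearsection rpxx.lin.epitopeatlas.cf rpxx.lin.epitopeatlas_new.cf 975000 980000
```

Check the result with grep linearsection on the new file before running genewiz.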
Q3: How many intermediate binders are predicted in the ORF RP779? a) 5-9, b) 10-14 or c) 15-20
Hepatitis C Virus (HCV)
Make an HCV atlas and take a look. This atlas is much faster to make, as the genome is only 9609 bp.
genewiz -p HCV_lin.ps HCV.lin.epitopeatlas.cf
pdfatlas HCV_lin
Transfer and open
Q4: How many strong binders are predicted in the first 2kb region?
Now we have only one coding strand, but we have included the variability and a
combined epitope score.
The combined score is an attempt to integrate the predictions and the conservation of the
sequence into one score. It should give a better idea of which epitopes are the best
candidates.
Now zoom in on the first 2kb part of the genome:
nedit HCV.lin.zoom.epitopeatlas.cf
Change the linearsection to 1 2000
Save the file
genewiz -p HCV.zoom.ps HCV.lin.zoom.epitopeatlas.cf
pdfatlas HCV.zoom
Transfer and open
Q5: How many good (strongest color) combined predictions are in this region? (Just give an estimate)
Polytope Construction
We have created a file, predictions.tab, containing several outputs from different prediction servers.
Take a look at the file:
less predictions.tab
The format is a little unfriendly but the columns are:
1 Position in (poly)protein
2 Amino acid at this position
3 Epitope (9-mer)
4 Name of protein in original GenBank file
5 Predicted HLA-A2 affinity (log50k)
6 Predicted Proteasome cleavage (raw NetChop output)
7 Predicted signal peptide (raw SignalP output)
8 Geometric average of NetChop outputs within the 9-mer
9 Same as column 1
10 Number of sequences in variability alignment
11 Same as column 2
12 Variability expressed as Shannon entropy (low = conserved, high = variable)
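Since the raw file is hard to read, it can help to print each row with the column meanings attached. The following is a sketch based on the column list above; show_columns and the short labels are our own hypothetical names, and it assumes the columns are separated by whitespace with no spaces inside a field. The repeated columns 9 and 11 are left out.

```shell
# Hypothetical helper: print predictions.tab with a label on each value.
# $1 = predictions file (whitespace-separated, 12 columns as listed above)
show_columns() {
    awk '{
        printf "pos=%s aa=%s 9mer=%s protein=%s affinity=%s cleavage=%s ",
               $1, $2, $3, $4, $5, $6
        printf "signalp=%s chop_avg=%s nseq=%s entropy=%s\n",
               $7, $8, $10, $12
    }' "$1"
}

# Example call:
# show_columns predictions.tab | less
```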
We now want to select potential epitopes from these predictions.
We are using the following thresholds:
MHC affinity < 50 nM (log50k score > 0.638)
Cleavage output > 0.8
Variability < 1
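The affinity threshold is given both in nM and as a score because column 5 holds the log50k-transformed value, 1 - log(affinity)/log(50000), so a stronger binder (lower nM) gets a higher score. The small sketch below reproduces the conversion; nm_to_log50k is a hypothetical helper name, and plain awk is used here (gawk behaves the same).

```shell
# Convert an affinity in nM to the log50k score: 1 - log(aff)/log(50000)
# $1 = affinity in nM
nm_to_log50k() {
    awk -v aff="$1" 'BEGIN { printf "%.3f\n", 1 - log(aff) / log(50000) }'
}

nm_to_log50k 50     # strong binder threshold, prints 0.638
nm_to_log50k 500    # moderate binder threshold, prints 0.426
```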
The following command selects epitopes according to these criteria:
gawk '{ if($5>0.638 && $6>0.8 && $12<1) print($3)}' predictions.tab
and this command will create a file, epitopes.out, with the three 9-mers separated by tabs:
gawk '{ if($5>0.638 && $6>0.8 && $12<1) printf("%s\t",$3)}END{printf("\n")}' predictions.tab > epitopes.out
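The construct built from this file expects exactly three 9-mers, so it is worth checking how many the filter actually wrote. A minimal sketch follows; count_epitopes is a hypothetical helper name. It relies on awk's default whitespace splitting, which ignores the trailing tab that the printf above leaves at the end of the line.

```shell
# Hypothetical helper: count the epitopes on each line of a file.
# $1 = file with whitespace- or tab-separated epitopes
count_epitopes() {
    awk '{ print NF }' "$1"
}

# Example call (should print 3 for this exercise):
# count_epitopes epitopes.out
```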
Now we want to create a FASTA file with the three epitopes, with linker
regions consisting of three alanines between the epitopes and at both ends.
gawk '{print(">polytope\nAAA"$1"AAA"$2"AAA"$3"AAA\n")}' epitopes.out > polytope.fasta
Look at the file:
cat polytope.fasta
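The command above is written for exactly three epitopes and a fixed AAA linker. If you want to experiment with other linkers or a different number of epitopes, the same construction can be written as a loop. This is a sketch under our own assumptions: make_polytope is a hypothetical helper name, and the input is taken to be one line of whitespace-separated epitopes, as in epitopes.out.

```shell
# Hypothetical helper: join epitopes with a linker between them and at both ends.
# $1 = file with the epitopes on one line, $2 = linker (default AAA)
make_polytope() {
    awk -v linker="${2:-AAA}" '{
        seq = linker
        # Append each epitope followed by the linker
        for (i = 1; i <= NF; i++) seq = seq $i linker
        print ">polytope"
        print seq
    }' "$1"
}

# Example calls:
# make_polytope epitopes.out > polytope.fasta
# make_polytope epitopes.out AAY > polytope_alt.fasta   # AAY is just an illustrative linker
```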
NetCTL
A relatively new server is the NetCTL server.
This tool evaluates affinity, proteasomal cleavage and TAP binding, and combines them into a single score that the output can be sorted by.
Go to NetCTL.
Paste your construct into the sequence window. Select the A2 supertype, and sort by combined score.
Evaluate the results:
Q6: Will there be cleavage at the C-termini of the epitopes?
Q7: Are there any new fusion epitopes?
Push the back button and edit the linker regions in the window to optimize
the polytope.
Do a new prediction and evaluate.
Continue until a satisfactory result is obtained (this might not be
possible).
Another tool is a polytope optimizer, which will generate the optimal linkers between epitopes. However, polytope is an older
tool that does not consider TAP binding, and to minimize the calculation time only one neural network (NN) is used for each of the cleavage and the binding predictions.
Furthermore, these NNs are trained using sparse encoding only.
We have made two files in the format used by the polytope program: poly1.list and poly2.list.
cat poly1.list
cat poly2.list
As you can see the epitopes are the same, but the order and linkers are different. Write down the sequences of the three epitopes for later use.
First we want to see which of these polytopes has the best construction. Right click on NetCTL to open a fresh server window. To get the polytope in copy-paste ready FASTA format, use the following command:
grep -v '#' poly1.list | gawk 'BEGIN {print ">polytope_1"} {printf("%s",$2)} END {printf("\n")}'
Paste the polytope_1 header and sequence into the server window. Set the supertype to A2. Set the sorting to combined score. Set the weight on TAP to 0 (as the polytope program we use later does not take TAP into account). Run the prediction.
Identify your initial epitopes in the output.
Q8: What are the rankings and combined scores for your epitopes?
Do the same with the poly2.list.
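To avoid retyping the conversion for the second file, the same pipeline can be wrapped in a function that takes the file and the FASTA header as arguments. This is a sketch that mirrors the command used for poly1.list; list_to_fasta is a hypothetical helper name, and like the original command it assumes that comment lines contain a # and that the sequence pieces are in column 2.

```shell
# Hypothetical helper: turn a polytope .list file into FASTA.
# $1 = .list file, $2 = name to use in the FASTA header
list_to_fasta() {
    # Drop comment lines, then concatenate column 2 of the remaining lines
    grep -v '#' "$1" |
        awk -v name="$2" 'BEGIN { print ">" name } { printf "%s", $2 } END { printf "\n" }'
}

# Example call:
# list_to_fasta poly2.list polytope_2
```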
Q9: What are the rankings and combined scores for your epitopes?
Q10: Why are the rankings different?
Now we would like to try to optimize one of the polytopes.
polytope_cont3 poly1.list > poly1.out
cat poly1.out
As you see, a lot of intermediate calculations are printed, but the output ends with a suggested new optimized polytope. If you scroll up a little you will see the polytope in a column format where the first column is the number of the residue and the second column is the actual residue. Linker residues are in lowercase. The predicted strength of cleavage is indicated at the C-terminus of each epitope, and any predicted cleavage sites inside an epitope are indicated with a CWITHIN marker. Note that the actual prediction values for MHC binding and cleavage
differ from those of NetCTL, since the prediction methods used in the polytope program are reduced versions
of the NetCTL methods.
The sequence of the final polytope is also given on a line starting with "Best Energy".
We can now make atlases of our initial polytope and our final polytope to compare the two states.
For the initial polytope do:
plotatlas -b -el poly1.list poly1.out
pdfatlas poly
mv poly.pdf poly.init.pdf
And for the final:
plotatlas -el poly1.list poly1.out
pdfatlas poly
mv poly.pdf poly.final.pdf
Now transfer the two pdf atlases to your PC and open them.
Look at the initial polytope atlas.
The upper green panel shows binding predictions. A green shade means that this is the first position in a binding 9-mer. No shade means no binding, a weak shade means moderate binding, and dark green means strong binding.
Q11: Are there any predicted binders other than the 3 input epitopes?
The middle panel indicates a predicted proteasomal cleavage at that position.
Q12: Do any of the three epitopes have a poor C-terminal cleavage?
Look at the final atlas.
Q13: Are there any predicted binders other than the 3 input epitopes?
Q14: Do any of the three epitopes have a poor C-terminal cleavage?
Done!