Vaccine design, Epitope Atlases

Claus Lundegaard (lunde@cbs.dtu.dk) and Morten Nielsen (mniel@cbs.dtu.dk)


Overview

During this exercise you will:

  1. Make atlases showing epitopes of two genomes
  2. Zoom in on interesting areas
  3. Make proteasome predictions for a short peptide construct
  4. Try to optimise the construct manually and evaluate it using the predictions


Purpose of exercise, description of data

The purpose of this exercise is to visualize the output of the prediction tools you have seen during the course and to let you use them in a small optimization problem. We will use two organisms to visualize the predictions: the bacterium Rickettsia prowazekii (typhus) and Hepatitis C virus.

Change to this afternoon's working directory:

cd exercise8

Rickettsia prowazekii

You will use the program GeneWiz (the command is genewiz) and a number of configuration files to make the atlases. Start with an atlas of the complete genome of Rickettsia prowazekii, 1111523 bp. The file rpxx.epitopeatlas.cf is the configuration file that GeneWiz will use to generate a nice-looking epitope atlas. We will not go into the details of this file; just make sure you have it:

ls -l rpxx.epitopeatlas.cf

Now run GeneWiz:

genewiz -p rpxx_whole.ps rpxx.epitopeatlas.cf


This will generate an atlas in postscript format, rpxx_whole.ps.
To convert the file to pdf use this command:
pdfatlas rpxx_whole

Find the file with your ssh browser, drag it to your desktop, and open it.
Remember that the file has two pages, and the actual atlas is on page two.

You will see a circular representation of the bacterial genome where the center indicates the position in kbases.
In the middle of the circles you see a thin grey line. This represents the DNA.
On the outer side of the thin line you see some blue blocks. These represent open reading frames (ORFs) on the + strand.
Everything outside these blocks is a prediction regarding those ORFs.
Likewise, on the inner side you see red blocks representing ORFs on the - strand, and everything inside these is a prediction regarding the - ORFs.

Going from the outermost circle inwards we thus have:
  1. Predicted strong binding subpeptides (Affinity<50 nM).
  2. Predicted moderate binding subpeptides (Affinity<500 nM).
  3. Predicted Proteasomal Cleavage
And likewise for the - ORFs further inside.

Using a modified cf file you can zoom in on an interesting area.

genewiz -p rpxx_zoom.ps rpxx.lin.epitopeatlas.cf

pdfatlas rpxx_zoom

Transfer, and open.

Now we have zoomed in on an interesting area and the resolution allows you to see the single epitopes in the proteins encoded on the two DNA strands.
The ORFs now have names assigned, and in some ORFs you will see more than one name. This means you have overlapping ORFs.

Q1: How many named ORFs do you have transcribed from the + strand?
Q2: How many strong epitopes are predicted in the ORF RP779 encoded by the - strand?

Now we want to zoom in further on the ORF RP779.
Look carefully at your linear atlas to work out where this ORF is positioned, counting in kbases.
You can select another area by editing the configuration file rpxx.lin.epitopeatlas.cf. The area is set in the line linearsection = 971000 999000;

This numbering is in actual bp, not in kbases. Compare the numbers with the scale on your atlas.

nedit rpxx.lin.epitopeatlas.cf &

Change the linearsection to your selected range and save the file as rpxx.lin.epitopeatlas_new.cf
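
If you prefer the command line to nedit, the same edit can be sketched with sed. This is only a sketch: it assumes the configuration file contains a single line of the form linearsection = START END; as shown above, and the helper name set_linearsection and the coordinates in the usage comment are placeholders, not the answer to the exercise.

```shell
# set_linearsection START END IN.cf OUT.cf
# Hypothetical helper: copy IN.cf to OUT.cf, replacing the range on the
# "linearsection = ...;" line with START END. All other lines are kept.
set_linearsection() {
    sed "s/^\([[:space:]]*linearsection[[:space:]]*=\).*/\1 $1 $2;/" "$3" > "$4"
}

# Usage in the exercise directory (placeholder coordinates, in bp):
#   set_linearsection 980000 985000 rpxx.lin.epitopeatlas.cf rpxx.lin.epitopeatlas_new.cf
```

The original file is left untouched, so you can always go back to the 971000 999000 view.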

genewiz -p rpxx_zoom_new.ps rpxx.lin.epitopeatlas_new.cf

pdfatlas rpxx_zoom_new

Transfer and open your new atlas.

Q3: How many intermediate binders are predicted in the ORF RP779? a) 5-9, b) 10-14 or c) 15-20


Hepatitis C Virus (HCV)

Make an HCV atlas and take a look. This atlas is much faster to make, since the genome is only 9609 bp.

genewiz -p HCV_lin.ps HCV.lin.epitopeatlas.cf

pdfatlas HCV_lin 

Transfer and open

Q4: How many strong binders are predicted in the first 2kb region?

Now we have only one coding strand, but we have included the variability and a combined epitope score. The combined score is an attempt to integrate the predictions and the conservation of the sequence into one score. It should give a better idea of which epitopes are the best candidates.
Now zoom in on the first 2kb part of the genome:

nedit HCV.lin.zoom.epitopeatlas.cf

Change the linearsection to 1 2000
Save the file

genewiz -p HCV.zoom.ps HCV.lin.zoom.epitopeatlas.cf

pdfatlas HCV.zoom

Transfer and open

Q5: How many good (strongest color) combined predictions are in this region? (Just give an estimate)


Polytope Construction

We have created a file, predictions.tab, containing several outputs from different prediction servers.

Take a look at the file

less predictions.tab

The format is a little unfriendly but the columns are:

1 Position in (poly)protein 
2 Amino acid at this position
3 Epitope (9-mer)
4 Name of protein in original genbank file
5 Predicted HLA-A2 affinity (log50k)
6 Predicted Proteasome cleavage (raw NetChop output)
7 Predicted signal peptide (raw SignalP output)
8 Geometric average of NetChop outputs within the 9mer
9 Same as column 1
10 Number of sequences in variability alignment
11 Same as column 2
12 Variability expressed as Shannon entropy (low = conserved, high = variable)
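
To get a feeling for the entropy values in column 12, the following sketch computes the Shannon entropy of a single alignment column. It assumes the entropy is measured in bits (log base 2), which the file itself does not state, and column_entropy is a hypothetical helper name. Plain awk is used; gawk behaves the same way here.

```shell
# column_entropy RESIDUES: Shannon entropy (in bits, an assumption) of one
# alignment column, given as a string with one residue per sequence.
column_entropy() {
    printf '%s\n' "$1" | awk '{
        n = length($0)
        # count how often each residue occurs in the column
        for (i = 1; i <= n; i++) count[substr($0, i, 1)]++
        h = 0
        # H = - sum over residues of p * log2(p)
        for (aa in count) { p = count[aa] / n; h -= p * log(p) / log(2) }
        printf "%.3f\n", h
    }'
}

column_entropy AAAAAAAA   # fully conserved column, prints 0.000
column_entropy AAAACCCC   # two equally frequent residues, prints 1.000
```

Under this assumption an entropy of 1 corresponds to a position that is split evenly between two residues.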

We now want to select potential epitopes from these predictions.

We are using the following thresholds:

MHC-affinity:    <50 nM  (>0.638)
Cleavage output  >0.8
Variability  < 1
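
The number in parentheses after the affinity threshold is the same threshold on the log50k scale used in column 5. A minimal sketch of the conversion, assuming the usual rescaling score = 1 - log(affinity)/log(50000) with the affinity in nM (log50k here is a hypothetical helper name):

```shell
# log50k AFFINITY_NM: rescale an affinity in nM to the log50k score
# (assumed formula: 1 - log(affinity) / log(50000))
log50k() {
    awk -v a="$1" 'BEGIN { printf "%.3f\n", 1 - log(a) / log(50000) }'
}

log50k 50     # strong-binder threshold, prints 0.638
log50k 500    # moderate-binder threshold, prints 0.426
```

Note that a lower affinity value in nM means stronger binding and therefore a higher log50k score, which is why <50 nM becomes >0.638.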

The following command selects epitopes according to these criteria

gawk '{ if($5>0.638 && $6>0.8 && $12<1) print($3)}'  predictions.tab 

and this command will create a file with the three 9-mers separated by tabs:

gawk '{ if($5>0.638 && $6>0.8 && $12<1) printf("%s\t",$3)}END{printf("\n")}'  predictions.tab > epitopes.out
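
Since the next step expects exactly three epitopes, it can be worth counting how many rows pass the thresholds. A small sketch using the same three conditions (count_selected is a hypothetical helper name; plain awk is used, gawk works the same way):

```shell
# count_selected FILE: number of rows with affinity score (column 5) > 0.638,
# cleavage output (column 6) > 0.8 and entropy (column 12) < 1.
count_selected() {
    awk '$5 > 0.638 && $6 > 0.8 && $12 < 1 { n++ } END { print n + 0 }' "$1"
}

# Usage in the exercise directory:
#   count_selected predictions.tab
```

If the count is not 3, check the thresholds before building the construct.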

Now we want to create a FASTA file with the three epitopes, with linker regions consisting of three alanines between the epitopes and at the ends.

gawk '{print(">polytope\nAAA"$1"AAA"$2"AAA"$3"AAA\n")}'  epitopes.out > polytope.fasta

Look at the file:

cat polytope.fasta
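
With three 9-mers and four AAA linkers the construct should be 3 x 9 + 4 x 3 = 39 residues long. A quick way to check this is sketched below (polytope_length is a hypothetical helper name; it assumes the file holds a single sequence record):

```shell
# polytope_length FASTA: number of residues in the sequence,
# ignoring the ">" header line and all line breaks.
polytope_length() {
    grep -v '^>' "$1" | tr -d '\n' | wc -c | tr -d ' '
}

# Usage in the exercise directory:
#   polytope_length polytope.fasta
```

A different length usually means that more or fewer than three epitopes ended up in epitopes.out.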



NetCTL

A relatively new server is the NetCTL server. This tool evaluates affinity, proteasomal cleavage, and TAP binding, and combines them into a single score that the output can be sorted by.
Go to NetCTL.

Paste in your construct in the sequence window. Select the A2 supertype, and sort by combined score.

Evaluate the results:

Q6: Will there be cleavage at the C-termini of the epitopes?
Q7: Are there any new fusion epitopes?

Push the back button and edit the linker regions in the sequence window to optimize the polytope.

Do a new prediction and evaluate.

Continue until a satisfactory result is obtained (this might not be possible).

Another tool is a polytope optimizer that generates the optimal linkers between epitopes. However, polytope is an older tool that does not consider TAP binding, and to minimize the calculation time only one neural network (NN) is used for each of the cleavage and binding predictions. Furthermore, these NNs are trained using sparse encoding only.

We have made two files in the format used by the polytope program: poly1.list and poly2.list.

cat poly1.list

cat poly2.list

As you can see the epitopes are the same, but the order and linkers are different. Write down the sequences of the three epitopes for later use.

First we want to see which of these polytopes has the best construction. Right click on NetCTL to open a fresh server window. To get the polytope in copy-paste ready FASTA format use the following command:

cat poly1.list | grep -v '#' | gawk 'BEGIN {print ">polytope_1"} {printf("%s",$2)} END {printf("\n")}'

Paste the polytope_1 header and sequence into the server window. Set the supertype to A2. Set the sorting to combined score. Set the weight on TAP to 0 (as the polytope program we use later does not take TAP into account). Run the prediction.
Identify your initial epitopes in the output.

Q8: What are the rankings and combined scores for your epitopes?
Do the same with the poly2.list.

Q9: What are the rankings and combined scores for your epitopes?
Q10: Why are the rankings different?

Now we would like to try to optimize one of the polytopes.

polytope_cont3 poly1.list > poly1.out


cat poly1.out

As you see, a lot of intermediate calculations are printed, but the output ends with a suggested new optimized polytope. If you scroll up a little you will see the polytope in a column format where the first column is the number of the residue and the second column is the actual residue. Linker residues are in lowercase. The predicted strength of cleavage is indicated at the C-terminal of each epitope, and any predicted cleavage site inside an epitope is indicated with a CWITHIN marker. Note that the actual prediction values for MHC binding and cleavage differ from those of NetCTL, since the prediction methods used in the polytope program are reduced versions of the NetCTL methods.

The sequence of the final polytope is also given on a line starting with:

"Best Energy"

We can now make atlases of our initial polytope and our final polytope to compare the two states.
For the initial polytope do:

plotatlas -b -el poly1.list poly1.out

pdfatlas poly

mv poly.pdf poly.init.pdf


And for the final:

plotatlas -el poly1.list poly1.out

pdfatlas poly

mv poly.pdf poly.final.pdf


Now transfer the two pdf atlases to your PC and open them.

Look at the initial polytope atlas
The upper green panel shows binding predictions. A green shade means that this is the first position in a binding 9-mer. No shade is no binding. Weak shade is moderate binding. Dark green is strong binding.

Q11: Are there any predicted binders other than the 3 input epitopes?

The middle panel indicates a predicted proteasomal cleavage at that position.

Q12: Do any of the three epitopes have a poor C-terminal cleavage?

Look at the final atlas.

Q13: Are there any predicted binders other than the 3 input epitopes?
Q14: Do any of the three epitopes have a poor C-terminal cleavage?

Done!