
HCV vaccine development

Claus Lundegaard (lunde@cbs.dtu.dk)


Rational Vaccine design

You shall use bioinformatics tools to do a rational selection of peptides with a high potential as vaccine candidates against HCV in Guinea-Bissau. You shall tailor the selection towards the prevalent HLA-A, HLA-B, and HLA-DR molecules present in the population in Guinea-Bissau.

In detail you shall

  1. Identify the prevalent HLA-A, HLA-B and HLA-DR alleles in Guinea-Bissau.
  2. Download the prevalent HCV genotype in Africa.
  3. Select a minimal set of peptides tailored to bind to the prevalent HLA types.
  4. Check if the selected peptides are cross-reactive towards other HCV subtypes.

Background

HCV is a major cause of acute hepatitis and chronic liver disease, including cirrhosis and liver cancer. Globally, an estimated 170 million persons are chronically infected with HCV and 3 to 4 million persons are newly infected each year. HCV is spread primarily by direct contact with human blood. The major causes of HCV infection worldwide are use of unscreened blood transfusions, and re-use of needles and syringes that have not been adequately sterilized.

No vaccine is currently available to prevent hepatitis C and treatment for chronic hepatitis C is too costly for most persons in developing countries to afford. Thus, from a global perspective, the greatest impact on hepatitis C disease burden will likely be achieved by focusing efforts on reducing the risk of HCV transmission from nosocomial exposures (e.g. blood transfusions, unsafe injection practices) and high-risk behaviours (e.g. injection drug use).

Hepatitis C virus (HCV) is one of the five hepatitis viruses (A, B, C, D, and E) that together account for the vast majority of cases of viral hepatitis. It is an enveloped RNA virus in the Flaviviridae family and appears to have a narrow host range. Humans and chimpanzees are the only known species susceptible to infection, with both species developing similar disease.

An important feature of the virus is the relative mutability of its genome, which in turn is probably related to its high propensity (80%) for inducing chronic infection. HCV is clustered into several distinct genotypes, which may be important in determining the severity of the disease and the response to treatment.

For more details on HCV see link: WHO


The exercise

Identification of prevalent HLA-A, HLA-B, and HLA-DR alleles in Guinea-Bissau.

Go to the Allele Frequency database

From the left-hand menu select "HLA database", submenu "HLA Allele Frequency Search".

Set the HLA locus to A.
Set Country to Guinea-Bissau.
Set Population to Guinea-Bissau.
Set the starting allele to A*01:01.
Set "Ending Allele" to the last HLA A* allele in the list (it is a long list!!).
Set "Sort By:" to Population, highest to lowest Frequency.
Set "Level of resolution" to >=2.

Leave the other options at their defaults and press the search button.

  • Q1 Which are the 5 most prevalent HLA-A alleles in Guinea-Bissau? Note that you only need to report the first four digits of the HLA typing, e.g. A*02:01.

    Repeat this analysis for the HLA-B and HLA-DRB1 loci.

  • Q2 Which are the 5 most prevalent HLA-B alleles in Guinea-Bissau?
  • Q3 Which are the 5 most prevalent HLA-DRB1 alleles in Guinea-Bissau?

    Identification of binding motif similarities

    Many HLA alleles share large overlaps in their binding motif preferences. You shall use the MHC motif viewer to select the three most dissimilar HLA alleles from your five previously selected alleles for each of the HLA-A, HLA-B, and HLA-DRB1 loci.

    Go to the MHC motif viewer web-site. This server allows for visualization of binding motifs for MHC class I molecules from human, non-human primates and mouse, as well as human HLA-DR class II molecules. Spend a few moments clicking around on the server.

    Go back to the MHC motif viewer home-page. Click on the MHC fight link at the top of the page.

    You can compare the binding motifs of for instance A*3301 and A*7401 by typing the allele names into the two boxes. You can get a full list of possible allele names by clicking on the MHC Exhaustive list in the upper left corner.

  • Q4 Reduce the set of prevalent alleles for each of the HLA-A, HLA-B and HLA-DR loci so that the overlap in binding specificity between the remaining alleles is low.
    Remove 2 alleles from each locus.

    Download the prevalent HCV genotype in Africa.

    You now need to identify a representative genomic sequence for the most abundant HCV genotype in the selected population. In this case it is genotype 2. We need the sequence in protein FASTA format.

    Go to GenBank.
    Set the 'search' drop-down to Genome and enter "Hepatitis C virus genotype 2" into the search field.
    Click on the first entry number (NC_009823).
    Set Display to Protein FASTA and keep the window open.
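
The same download can probably be scripted instead of clicked through. The sketch below assumes that the NCBI E-utilities `efetch` endpoint is reachable and that `rettype=fasta_cds_aa` on the `nuccore` database returns the protein translations of the entry; the function names are made up for this illustration, and the web interface described above remains the route the exercise actually uses.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities endpoint (assumption: still available at this address).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str) -> str:
    """Build a URL asking for the translated coding sequences of one entry."""
    params = {
        "db": "nuccore",            # nucleotide database holding the genome entry
        "id": accession,            # e.g. the accession number shown in GenBank
        "rettype": "fasta_cds_aa",  # protein FASTA of the annotated coding regions
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_protein_fasta(accession: str) -> str:
    """Download the protein FASTA text (performs a network request)."""
    with urlopen(efetch_url(accession)) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_protein_fasta("NC_009823")` would then return the FASTA text as a string, which can be pasted into the prediction servers in the same way as the text copied from the browser window.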


    Identification of CTL epitopes

    You shall use the NetMHCpan prediction-server to identify potential CTL epitopes that will bind to your prevalent HLA-A and HLA-B molecules.

    Go to the NetMHCpan prediction-server and copy-paste the FASTA HCV genotype 2 genome protein sequence. Type in the HLA-A and HLA-B alleles you have found in question Q4 separated by commas (without blank spaces!), select Save prediction to xls file, and press Submit. The calculation might take some minutes.

    At the bottom of the results page you find a Link to output xls file. Open this file in Excel.

    The scores you get in the file are log-transformed binding affinity IC50 values. They are calculated from the IC50 values in nM units as log_IC50 = 1 - (log(IC50_nM)/log(50000)). This means that a log-transformed value greater than 0.426 corresponds to an IC50 value below 500 nM, i.e. stronger binding than 500 nM. This value is the threshold generally taken to define binding peptides.
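
As a quick check of the transformation above, the following minimal sketch converts between IC50 values in nM and the log-transformed score. The function names are chosen for this illustration only and are not part of the server output.

```python
import math

MAX_IC50_NM = 50000.0  # normalisation constant from the formula above


def ic50_to_score(ic50_nm: float) -> float:
    """log_IC50 = 1 - log(IC50_nM) / log(50000)."""
    return 1.0 - math.log(ic50_nm) / math.log(MAX_IC50_NM)


def score_to_ic50(score: float) -> float:
    """Inverse of the transformation: IC50_nM = 50000 ** (1 - score)."""
    return MAX_IC50_NM ** (1.0 - score)


# The 500 nM binding threshold maps to a score of about 0.426.
threshold_score = ic50_to_score(500.0)
```

Because the transformation is decreasing in IC50, a higher score always means a lower IC50 and therefore stronger predicted binding.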

    MS Excel notes:
    If you use the Danish version of Excel you will first need to exchange all '.' (periods) with ',' (commas).
    In the Danish version the function INT() is named HELTAL().

    You can score the number of alleles a given peptide will bind to using the expression INT(cell_number>0.426) (0,426 in Danish Excel) and then summing over all alleles.
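
For readers who prefer a script to a spreadsheet, the same counting can be sketched in Python. The peptide labels, allele labels and scores below are invented placeholders, not server output; only the 0.426 threshold comes from the text.

```python
THRESHOLD = 0.426  # log-transformed score corresponding to 500 nM


def alleles_bound(scores):
    """Same idea as summing INT(cell > 0.426) over the allele columns of one row."""
    return sum(int(score > THRESHOLD) for score in scores)


def binders_per_allele(table, alleles):
    """Count, for each allele column, how many peptides score above the threshold."""
    return {
        allele: sum(int(row[i] > THRESHOLD) for row in table.values())
        for i, allele in enumerate(alleles)
    }


# Hypothetical example data: one row per peptide, one score per allele.
alleles = ["allele_1", "allele_2", "allele_3"]
table = {
    "pep_1": [0.50, 0.43, 0.10],
    "pep_2": [0.426, 0.20, 0.90],
}

coverage = {peptide: alleles_bound(row) for peptide, row in table.items()}
per_allele = binders_per_allele(table, alleles)
```

Note that the comparison is strict, so a score of exactly 0.426 is not counted as a binder, matching the Excel expression.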

  • Q5 Identify 5 peptides that will give you the broadest allelic coverage.
  • Q6 How many binders do you find for each allele (prediction score > 0.426)?

    Identification of T-helper epitopes

    To identify potential T-helper epitopes you shall use the NetMHCIIpan prediction-server.

    Go to the NetMHCIIpan prediction-server and upload the HCV genome sequence. Type in the three HLA-DRB1 alleles you have found in question Q4 separated by commas (without blank spaces!), select Save prediction to xls file and Use fast mode (recommended for large calculations) (this will make slightly less accurate predictions but run 10 times faster), and press Submit. The calculation might take some minutes.

    At the bottom of the results page you find a Link to output xls file. Open this file in Excel.

    Here the output you get is a bit more complex. The file contains predictions for all 15mer peptides in your input FASTA file. The columns in the file are:

    1. Peptide number
    2. Peptide sequence
    3. Core position in first allele
    4. Core peptide in first allele
    5. Log-transformed binding score of peptide to first allele
    6. Core position in second allele
    7. ...
    8. Average log-transformed binding score over all alleles in the prediction
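
The column layout above (two leading columns, three columns per allele, one trailing average) can be unpacked as in the following sketch. It assumes each row has already been read into a list of strings; the function name, allele labels and example values are invented for illustration and should be checked against the real file.

```python
def split_row(row, alleles):
    """Unpack one output row following the column order listed above."""
    number, peptide = int(row[0]), row[1]
    per_allele = {}
    for i, allele in enumerate(alleles):
        start = 2 + 3 * i  # each allele occupies three consecutive columns
        core_position, core, score = row[start:start + 3]
        per_allele[allele] = {
            "core_position": int(core_position),
            "core": core,
            "score": float(score),
        }
    average = float(row[2 + 3 * len(alleles)])  # last column: average score
    return {"number": number, "peptide": peptide,
            "alleles": per_allele, "average": average}


# Toy row with two alleles; every value here is made up.
example = split_row(
    ["1", "ACDEFGHIKLMNPQR", "2", "DEFGHIKLM", "0.5", "1", "CDEFGHIKL", "0.3", "0.4"],
    ["allele_1", "allele_2"],
)
```

Keeping the per-allele values in a dictionary keyed by allele name makes it easy to apply the same 0.426 threshold to each allele separately.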

  • Q7 Are any of the CTL epitopes you found in question Q5 part of a 15mer T-helper epitope for the three HLA-DRB1 alleles?

  • Q8 Select the smallest number of 15mer peptides so that these peptides will have coverage of all HLA-A, HLA-B, and HLA-DRB1 alleles in your prevalent selection
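
One way to approach a selection like the one in Q8 is a greedy set-cover heuristic: repeatedly pick the peptide that covers the most alleles not yet covered. The sketch below is only an illustration of that heuristic with invented peptide and allele labels; greedy selection is not guaranteed to find the smallest possible set, so the result should be checked by hand.

```python
def greedy_cover(binds, targets):
    """Pick peptides one at a time, each covering the most still-uncovered alleles.

    binds   : dict mapping peptide -> set of alleles it is predicted to bind
    targets : iterable of alleles that should all be covered
    Returns the chosen peptides and any alleles left uncovered.
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        best = max(binds, key=lambda peptide: len(binds[peptide] & uncovered))
        gained = binds[best] & uncovered
        if not gained:  # no remaining peptide covers anything new
            break
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Hypothetical binding table (labels and memberships are made up).
binds = {
    "pep_a": {"A_1", "B_1"},
    "pep_b": {"A_1", "A_2", "DR_1"},
    "pep_c": {"B_1", "DR_2"},
}
chosen, uncovered = greedy_cover(binds, {"A_1", "A_2", "B_1", "DR_1", "DR_2"})
```

In this toy case two peptides are enough; if `uncovered` comes back non-empty, no combination of the candidate peptides covers every allele.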

    Peptide conservation in other HCV subtypes

    You shall now investigate if any of your selected peptides are also present in another HCV genotype. If this is the case, the peptides might not only have high importance for vaccine development against HCV in Guinea-Bissau, but also in other parts of the world.

    Go to GenBank.
    Set the 'search' drop-down to Genome and enter "Hepatitis C virus genotype 1" into the search field.
    Click on the first entry number (NC_004102).
    Set Display to Protein FASTA and keep the window open.

  • Q9 Are any of the 15mers selected in question Q8 present in the HCV genotype 1 genome?
    Hint: Copy-paste the translated sequence into an empty Word document and remove all paragraph marks and white-space (replace with nothing). Then search for the 15mer sequence in the document.
  • Q10 What does this imply for the world wide coverage of your vaccine?
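
The hint for Q9 (strip line breaks and white-space, then search) can also be carried out with a few lines of Python. This is a minimal sketch: the function names and the toy FASTA text are invented for illustration, and each FASTA record is searched separately so that a match cannot span two different proteins.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}, removing all white-space."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append("".join(line.split()))
    return {name: "".join(parts) for name, parts in records.items()}


def records_containing(peptide, fasta_text):
    """Return the headers of all records whose sequence contains the peptide."""
    return [name for name, seq in read_fasta(fasta_text).items() if peptide in seq]


# Toy input: the 15mer spans the line break in the sequence.
toy_fasta = ">toy_protein\nACDEFGHIKL\nMNPQRSTVWY\n"
hits = records_containing("GHIKLMNPQRSTVWY", toy_fasta)
```

An empty result means the exact 15mer does not occur in any record; this exact-match search does not detect peptides that differ by one or more residues.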

    Now you are done!! You can try to sell your peptides to big pharma and get rich.