HCV vaccine development

Morten Nielsen (mniel@cbs.dtu.dk) and Ole Lund (lund@cbs.dtu.dk)

Rational vaccine design

You shall use bioinformatics tools to make a rational selection of peptides with high potential as vaccine candidates against HCV in Guinea-Bissau. You shall tailor the selection towards the prevalent HLA-A, HLA-B, and HLA-DR molecules present in the population of Guinea-Bissau.

In detail, you shall:

  1. Identify the prevalent HLA-A, HLA-B and HLA-DR alleles in Guinea-Bissau.
  2. Download the prevalent HCV genotype in Africa.
  3. Select a minimal set of peptides tailored to bind to the prevalent HLA types.
  4. Check whether the selected peptides are cross-reactive towards other HCV subtypes.


HCV is a major cause of acute hepatitis and chronic liver disease, including cirrhosis and liver cancer. Globally, an estimated 170 million persons are chronically infected with HCV and 3 to 4 million persons are newly infected each year. HCV is spread primarily by direct contact with human blood. The major causes of HCV infection worldwide are the use of unscreened blood for transfusions and the re-use of needles and syringes that have not been adequately sterilized.

No vaccine is currently available to prevent hepatitis C and treatment for chronic hepatitis C is too costly for most persons in developing countries to afford. Thus, from a global perspective, the greatest impact on hepatitis C disease burden will likely be achieved by focusing efforts on reducing the risk of HCV transmission from nosocomial exposures (e.g. blood transfusions, unsafe injection practices) and high-risk behaviours (e.g. injection drug use).

Hepatitis C virus (HCV) is one of the five hepatitis viruses (A, B, C, D, and E) that together account for the vast majority of cases of viral hepatitis. It is an enveloped RNA virus in the Flaviviridae family and appears to have a narrow host range. Humans and chimpanzees are the only known species susceptible to infection, and both species develop similar disease.

An important feature of the virus is the relative mutability of its genome, which in turn is probably related to its high propensity (80%) to induce chronic infection. HCV is clustered into several distinct genotypes, which may be important in determining the severity of the disease and the response to treatment.

For more details on HCV, see the WHO website.

The exercise

Identification of prevalent HLA-A, HLA-B, and HLA-DR alleles in Guinea-Bissau.

Go to the Allele Frequency database. Log in to the site using the username and password provided by the lecturer.

Click on HLA Allele and subsequently on HLA Allele Freq (classicals).

Set the HLA locus to A and "Population" to Guinea-Bissau (just Guinea-Bissau!). Leave all other options unchanged and press Search.

Find the five most prevalent alleles. You can do this by copying the entire table content with the mouse into an Excel sheet and then using the sorting function.
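If you prefer a script to Excel for this step, the sorting can be sketched in Python. This is a minimal sketch that assumes you have already turned the copied table into (allele, frequency) pairs; the function names are hypothetical and the example frequencies are invented, not the real Guinea-Bissau values. Rows whose names reduce to the same four-digit allele are summed, on the assumption that they are distinct higher-resolution subtypes of that allele.

```python
from collections import defaultdict


def four_digit(allele: str) -> str:
    """Reduce an HLA name to four digits, e.g. 'A*020101' or 'A*02:01:01' -> 'A*0201'."""
    locus, _, rest = allele.strip().partition("*")
    if ":" in rest:
        # Colon-separated nomenclature: keep the first two fields.
        first, second = rest.split(":")[:2]
        return f"{locus}*{first}{second}"
    # Nomenclature without separators: keep the first four digits.
    return f"{locus}*{rest[:4]}"


def top_alleles(rows, n=5):
    """Return the n most frequent four-digit alleles from (allele, frequency) pairs."""
    totals = defaultdict(float)
    for allele, freq in rows:
        # Assumption: rows sharing a four-digit name are distinct subtypes, so add them.
        totals[four_digit(allele)] += freq
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Invented example values, NOT the real Guinea-Bissau frequencies.
rows = [
    ("A*0201", 0.10),
    ("A*2301", 0.12),
    ("A*3001", 0.08),
    ("A*6802", 0.07),
    ("A*0301", 0.05),
    ("A*020102", 0.01),
    ("A*0101", 0.02),
]

for allele, freq in top_alleles(rows):
    print(f"{allele}\t{freq:.3f}")
```

The same function works unchanged for the HLA-B and HLA-DRB1 tables.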

  • Q1 Which are the 5 most prevalent HLA-A alleles in the Guinea-Bissau population? Note that you only need to report the first four digits of the HLA typing, e.g. A*0201.

    Repeat this analysis for the HLA-B and HLA-DRB1 loci.

    If the server is down, you can find the frequencies in this Frequency xls file.

  • Q2 Which are the 5 most prevalent HLA-B alleles in the Guinea-Bissau population?
  • Q3 Which are the 5 most prevalent HLA-DRB1 alleles in the Guinea-Bissau population?

    Identification of binding motif similarities

    Many HLA alleles share large overlaps in their binding motif preferences. You shall use the MHC motif viewer to select the three most dissimilar HLA alleles from your five previously selected alleles for each of the HLA-A, HLA-B, and HLA-DRB1 loci.

    Go to the MHC motif viewer website. This server allows for visualization of binding motifs for MHC class I molecules from humans, non-human primates and mice, as well as for human HLA-DR class II molecules. Spend a few moments clicking around on the server.

    Go back to the MHC motif viewer home page. Click on the MHC fight link at the top of the page.

    You can compare the binding motifs of, for instance, A*3301 and A*7401 by typing the allele names into the two boxes. You can get a full list of possible allele names by clicking on the MHC Exhaustive list in the upper left corner. Note that for HLA-DRB1 alleles you must give the allele name as DRB1*XXXX.

  • Q4 Reduce the set of prevalent alleles to three for each of the HLA-A, HLA-B and HLA-DRB1 loci so that the overlap in binding specificity between the remaining alleles is low, i.e. remove alleles with high mutual binding overlap.
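The reduction in Q4 is done by eye with MHC fight, but the underlying logic can be sketched as a greedy procedure: repeatedly find the two remaining alleles with the most similar motifs and drop the less prevalent one, until three are left. This is a hypothetical sketch; it assumes you have summarized each pairwise comparison as an overlap score between 0 and 1 (the MHC motif viewer does not necessarily provide such a number), and the allele labels, frequencies and scores below are invented placeholders.

```python
from itertools import combinations


def reduce_alleles(freqs, similarity, k=3):
    """Greedily drop redundant alleles until k remain.

    freqs: allele -> population frequency.
    similarity: frozenset({allele1, allele2}) -> overlap score in [0, 1];
    pairs that are not listed are treated as having no overlap (0.0).
    """
    keep = set(freqs)
    while len(keep) > k:
        # Find the pair of remaining alleles with the most similar motifs.
        a, b = max(
            combinations(sorted(keep), 2),
            key=lambda pair: similarity.get(frozenset(pair), 0.0),
        )
        # Keep the more prevalent allele of that pair and drop the other one.
        keep.discard(a if freqs[a] < freqs[b] else b)
    return sorted(keep, key=lambda allele: freqs[allele], reverse=True)


# Invented placeholder alleles, frequencies and overlap scores.
freqs = {"allele_A": 0.11, "allele_B": 0.06, "allele_C": 0.12,
         "allele_D": 0.08, "allele_E": 0.05}
similarity = {
    frozenset({"allele_A", "allele_B"}): 0.9,
    frozenset({"allele_D", "allele_E"}): 0.8,
}
print(reduce_alleles(freqs, similarity))
```

Keeping the more prevalent member of each redundant pair is one reasonable tie-breaking choice, since it preserves population coverage; other criteria are equally defensible.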

    Download the prevalent HCV genotype in Africa.

    You now need to identify a representative genomic sequence for the most abundant HCV genotype in the selected population. In this case, it is genotype 2. We need the sequence in protein FASTA format.

    Go to the HCVdb website. Under Data, select Genomes. Under species Hepatitis C virus, select genotype 2 and then Subgenotype 2a. Select one of the 2a genomes (for instance NC_009823). Click on the Accession link to go to the NCBI database. Click on the protein identifier (protein id) and select format FASTA.

    Open a text editor window, copy-paste the amino acid sequence from the sequence window into the document, and save it as genotype2.fsa. Remember to also copy the header (the line starting with ">").
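Before uploading, a quick sanity check of the saved file can be sketched in Python: the parser below reads FASTA text into (header, sequence) pairs and reports any characters that are not one-letter amino-acid codes, which catches stray line numbers or spaces picked up while copy-pasting. It is a minimal sketch, not a replacement for a library such as Biopython; the file name genotype2.fsa comes from the exercise, while the function names and the toy record are hypothetical.

```python
# One-letter amino-acid codes plus ambiguity codes and the stop symbol.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY") | set("BZXUO*")


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence found before the first '>' header line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def unexpected_residues(sequence):
    """Characters in the sequence that are not one-letter amino-acid codes."""
    return set(sequence) - AMINO_ACIDS


toy = ">toy record\nMSTNPKPQRK\nTKRNTNRRPQ\n"
for header, seq in parse_fasta(toy):
    print(header, len(seq), unexpected_residues(seq) or "OK")

# For the real file:
# with open("genotype2.fsa") as handle:
#     records = parse_fasta(handle.read())
```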

    Identification of CTL epitopes

    You shall use the NetMHCpan prediction-server to identify potential CTL epitopes that will bind to your prevalent HLA-A and HLA-B molecules.

    Go to the NetMHCpan prediction-server and upload the HCV protein sequence (genotype2.fsa). Type in the HLA-A and HLA-B alleles you selected in question Q4, separated by commas WITHOUT blank spaces. Note that the alleles MUST be typed in the form HLA-A0101 etc., including the "HLA-" prefix. Select Save prediction to xls file and press Submit. The calculation might take several minutes (>5).
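Because the allele field is strict about its format, the conversion from the names used in Q1 to Q4 can be sketched in Python. The helper names are hypothetical and not part of NetMHCpan; the output follows the HLA-A0101 style the exercise asks for, and the alleles in the example are placeholders, not the answer to Q4.

```python
def netmhcpan_allele(name: str) -> str:
    """Convert e.g. 'A*0201' or 'HLA-A*02:01' to the 'HLA-A0201' style."""
    name = name.strip()
    if name.upper().startswith("HLA-"):
        name = name[4:]  # drop an existing prefix so it is not doubled
    return "HLA-" + name.replace("*", "").replace(":", "")


def allele_field(names) -> str:
    """Comma-separated allele list WITHOUT blank spaces."""
    return ",".join(netmhcpan_allele(name) for name in names)


print(allele_field(["A*0201", " B*5301", "HLA-A*30:01"]))
# prints HLA-A0201,HLA-B5301,HLA-A3001
```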

    At the bottom of the results page, you will find a link to the output xls file. Open this file in Excel.

    If the server does not return a result, you can find the output in this file: Results.xls.

    The scores you get in the file are log-transformed binding affinity (IC50) values. They are calculated from the IC50 values in nM as log_IC50 = 1 - log(IC50_nM)/log(50000). This means that a log-transformed value greater than 0.426 corresponds to an IC50 value below 500 nM, i.e. stronger binding. This 500 nM threshold is generally used to define binding peptides.
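The transformation and its inverse can be written as a small Python sketch (the function names are hypothetical). Because the same logarithm appears in the numerator and the denominator, the base of the log does not matter. Inverting the formula gives IC50_nM = 50000^(1 - score), so 500 nM maps to a score of about 0.4256, and a score of exactly 0.426 corresponds to roughly 498 nM.

```python
import math

MAX_IC50_NM = 50000.0  # upper IC50 bound used in the transformation


def ic50_to_score(ic50_nm: float) -> float:
    """log_IC50 = 1 - log(IC50_nM) / log(50000)."""
    return 1.0 - math.log(ic50_nm) / math.log(MAX_IC50_NM)


def score_to_ic50(score: float) -> float:
    """Inverse transformation: IC50_nM = 50000 ** (1 - score)."""
    return MAX_IC50_NM ** (1.0 - score)


for ic50 in (50.0, 500.0, 5000.0):
    print(f"{ic50:7.0f} nM -> score {ic50_to_score(ic50):.3f}")
print(f"score 0.426 -> {score_to_ic50(0.426):.0f} nM")
```

Lower IC50 (stronger binding) gives a higher score, which is why the binder criterion is "score greater than 0.426".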

  • Q5 Identify 5 peptides that will give you the broadest allelic coverage. Hint: you can sort on the different alleles and on the average binding score using the sorting function in Excel, or define a logical function that calculates the number of alleles each peptide binds to (prediction score > 0.426), for example by counting the allele columns in a peptide's row with a score above 0.426. Other logical functions of this kind work equally well.

  • Q6 How many binders do you find for each of the six HLA-A and HLA-B alleles (prediction score > 0.426) using these 5 peptides?
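Q5 and Q6 can also be approached with a short script. The sketch below swaps the manual Excel sorting for a greedy set-cover heuristic: at each step it picks the peptide that binds the most alleles not yet covered, breaking ties by the total number of alleles bound and then by the mean score. Greedy selection is not guaranteed to find the optimal set. The sketch assumes the xls output has been reshaped into a dictionary mapping each peptide to its per-allele scores; the function names, peptide IDs, allele list and scores are invented placeholders.

```python
THRESHOLD = 0.426  # score above which a peptide counts as a binder (about 500 nM)


def bound_alleles(scores, threshold=THRESHOLD):
    """Alleles for which the peptide's prediction score exceeds the threshold."""
    return {allele for allele, score in scores.items() if score > threshold}


def rank(scores, covered, threshold=THRESHOLD):
    """Sort key: new alleles covered, then alleles bound, then mean score."""
    binds = bound_alleles(scores, threshold)
    mean_score = sum(scores.values()) / len(scores)
    return (len(binds - covered), len(binds), mean_score)


def select_peptides(predictions, n=5, threshold=THRESHOLD):
    """Greedy selection of n peptides that maximizes allele coverage (Q5)."""
    remaining = dict(predictions)
    chosen, covered = [], set()
    while remaining and len(chosen) < n:
        best = max(sorted(remaining),
                   key=lambda pep: rank(remaining[pep], covered, threshold))
        chosen.append(best)
        covered |= bound_alleles(remaining.pop(best), threshold)
    return chosen, covered


def binders_per_allele(predictions, peptides, threshold=THRESHOLD):
    """For each allele, the number of selected peptides predicted to bind it (Q6)."""
    alleles = sorted({allele for pep in peptides for allele in predictions[pep]})
    return {
        allele: sum(predictions[pep].get(allele, 0.0) > threshold for pep in peptides)
        for allele in alleles
    }


# Placeholder peptides and invented scores, for illustration only.
predictions = {
    "pep1": {"HLA-A0201": 0.60, "HLA-A3001": 0.10, "HLA-B5301": 0.50},
    "pep2": {"HLA-A0201": 0.70, "HLA-A3001": 0.20, "HLA-B5301": 0.20},
    "pep3": {"HLA-A0201": 0.10, "HLA-A3001": 0.45, "HLA-B5301": 0.10},
    "pep4": {"HLA-A0201": 0.30, "HLA-A3001": 0.30, "HLA-B5301": 0.30},
}
chosen, covered = select_peptides(predictions, n=3)
print(chosen, sorted(covered))
print(binders_per_allele(predictions, chosen))
```

With the real output you would call select_peptides with n=5 and pass the chosen peptides to binders_per_allele to answer Q6.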

    Peptide conservation in other HCV subtypes

    You shall now investigate whether any of your selected peptides are also present in another HCV genotype. If this is the case, the peptides might be important for vaccine development not only against HCV in Guinea-Bissau, but also in other parts of the world.

    Go to the HCVdb website. Under Data, select Genomes. Select a genotype 1a genome (for instance EF032883). Click on the Accession link to go to the NCBI database. Click on the protein identifier (protein id) and select format FASTA.

  • Q7 Are any of the 9mers selected in question Q5 present in the HCV genotype 1a genome?
  • Q8 What does this imply for the worldwide coverage of your vaccine?
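For Q7, checking whether a 9mer is present amounts to an exact substring search in the genotype 1a protein sequence. The Python sketch below also reports the smallest number of mismatches over all 9-residue windows, which shows how close a non-conserved peptide comes; whether such a variant would still bind or be recognized is a separate question that this check does not answer. The function names and the toy sequence are hypothetical; in practice you would first load the genotype 1a FASTA sequence into a string.

```python
def min_mismatches(peptide: str, protein: str) -> int:
    """Smallest Hamming distance between the peptide and any window of the protein."""
    k = len(peptide)
    best = k  # no window at all counts as a complete mismatch
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        distance = sum(a != b for a, b in zip(peptide, window))
        if distance < best:
            best = distance
            if best == 0:
                break  # exact match found; it cannot get better
    return best


def conservation_report(peptides, protein):
    """Map each peptide to its minimal mismatch count (0 means present)."""
    return {pep: min_mismatches(pep, protein) for pep in peptides}


# Toy sequence and peptides, not real HCV data.
genotype_1a_protein = "MSTNPKPQRKTKRNTNRRPQ"
selected = ["NPKPQRKTK", "NPKPQRKTR"]
for pep, mism in conservation_report(selected, genotype_1a_protein).items():
    status = "present" if mism == 0 else f"absent ({mism} mismatch(es) at best)"
    print(pep, status)
```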

    Now you are done!! You can try to sell your peptides to big pharma and get rich.