Web resources in immunomics

The purpose of this part of the exercise is to learn which databases and resources for immunological bioinformatics are available on the web, and how to use them.
The most comprehensive place to search for biological sequences is NCBI, which hosts a number of databases, many of which are compilations of data from other external sites.
First we will try to retrieve all the proteins encoded by HPV (human papillomavirus):

Open the NCBI webpage. Search the "Nucleotide" database for "Human Papilloma virus complete genome".

  • Q1: How many hits did you get?
Click the link to the genome entry.
  • Q2: How many proteins does the virus genome encode?
Click on one of the links to the right of protein_id. Set the 'Display' dropdown menu to 'Fasta', and you will now see one of the proteins in FASTA format.
To save the protein as a FASTA file, set the 'Send to' dropdown menu to 'File' and select your local destination.
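Once several proteins have been saved this way, the FASTA file can also be inspected programmatically. The sketch below is a minimal FASTA reader in plain Python, assuming only the standard format (a '>' definition line followed by one or more sequence lines); the function name parse_fasta is illustrative, and Biopython's Bio.SeqIO.parse(handle, "fasta") is a well-tested alternative.

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs.

    The header is everything after '>' on a definition line; the sequence
    lines that follow are joined without whitespace. Lines before the first
    header are ignored.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if header is not None:
        records.append((header, "".join(chunks)))  # flush the last record
    return records


if __name__ == "__main__":
    # Made-up example records, not real HPV proteins.
    example = ">prot_1 hypothetical protein A\nMKLV\nANRT\n>prot_2 hypothetical protein B\nMSTQ\n"
    for header, seq in parse_fasta(example):
        print(header.split()[0], len(seq))  # prints "prot_1 8" then "prot_2 4"
```

The number of records returned, len(parse_fasta(...)), gives the number of proteins in a saved file.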

For diseases of special interest, specialized web pages with additional search functions have been created.
A special database exists for HIV.
Here we can also search for sequences coming from patients with known HLA alleles.
However, such information has only recently become common, so for older sequences this is usually not possible:

Go to the Los Alamos HIV Databases.
Select 'Sequence Database'.
Under 'Programs and Tools' select 'Search Interface'.
Under 'Virus' select 'HIV-1' and under 'Subtype' select 'B'.
Set 'Sampling year' to 2000.
Set 'Genomic region' to 'gp120'.
Check 'Include fragments of minimum length 100'.
Push 'Search'.
  • Q6: How many hits did you get?
Go back and check 'Only sequences with HLA information'.
Push 'Search'.
  • Q7: How many hits did you get?
Go back and uncheck 'Only sequences with HLA information'.
Set 'Sampling year' to 2006.
Push 'Search'.
  • Q8: How many hits did you get?
Go back and check 'Only sequences with HLA information'.
Push 'Search'.
  • Q9: How many hits did you get?
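To compare the answers to Q6-Q9, it can help to express the HLA-annotated hits as a fraction of all hits for each sampling year, which makes the point about older sequences concrete. The numbers in the sketch below are placeholders, not real search results; replace them with your own counts.

```python
from typing import Dict, Tuple


def hla_fraction(total_hits: int, hla_hits: int) -> float:
    """Fraction of hits that carry HLA information for the patient."""
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    if not 0 <= hla_hits <= total_hits:
        raise ValueError("hla_hits must lie between 0 and total_hits")
    return hla_hits / total_hits


def report(counts: Dict[int, Tuple[int, int]]) -> None:
    """Print the HLA-annotated fraction per sampling year."""
    for year, (total, hla) in sorted(counts.items()):
        print(f"{year}: {hla_fraction(total, hla):.1%} of hits have HLA information")


if __name__ == "__main__":
    # Placeholder numbers only: replace with your answers to Q6-Q9.
    report({2000: (100, 5), 2006: (200, 40)})
```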

Perhaps influenza also has a specialized database?
We can check this using ordinary search engines:
Go to Google. Type: 'Influenza sequence database' in the search field and execute the search.
The first hit looks promising, so we take a look at that.
This is a closed website where you have to be registered, but try clicking the link
'Influenza Virus Resource' on the page.

This takes us to the NCBI Influenza Virus Resource.

Now, this is a very comprehensive collection of influenza sequences, so let's take a look at the new influenza type that recently jumped from swine to humans and for that reason was first named 'Swine Flu'. Since the strain was not actually jumping continuously from pigs to humans, it was later renamed according to the ordinary naming convention, which uses the serotypes of the two major antigens, hemagglutinin (HA or H) and neuraminidase (NA or N). In this case the name was set to influenza A H1N1. However, this is not a unique name either, as the 1918 Spanish flu was also of the same serotype but still significantly different from the new type, as we will see using some of the tools on this website.
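Influenza strain names follow a fixed layout that can also be read programmatically. The sketch assumes the common layout type/host/location/isolate/year(HxNy), where the host field is omitted for human isolates (e.g., A/Puerto Rico/8/34(H1N1)); the regular expression and the names STRAIN_RE and parse_strain are illustrative simplifications and will not cover every name found in public databases.

```python
import re
from dataclasses import dataclass
from typing import Optional

# type / [host /] location / isolate / year [(HxNy)[pdmYY]]
# The host field is omitted for human isolates.
STRAIN_RE = re.compile(
    r"^(?P<type>[A-D])/"
    r"(?:(?P<host>[^/]+)/)?"
    r"(?P<location>[^/]+)/"
    r"(?P<isolate>[^/]+)/"
    r"(?P<year>\d{4}|\d{2})"
    r"\s*(?:\(H(?P<h>\d+)N(?P<n>\d+)\)(?:pdm\d{2})?)?$"
)


@dataclass
class StrainName:
    virus_type: str
    host: Optional[str]      # None when no host is written (human isolates)
    location: str
    isolate: str
    year: str                # kept as written; older names use two digits
    subtype: Optional[str]   # e.g. "H1N1"; None if not given


def parse_strain(name: str) -> StrainName:
    m = STRAIN_RE.match(name.strip())
    if m is None:
        raise ValueError(f"unrecognised strain name: {name!r}")
    subtype = f"H{m['h']}N{m['n']}" if m["h"] is not None else None
    return StrainName(m["type"], m["host"], m["location"], m["isolate"], m["year"], subtype)


if __name__ == "__main__":
    print(parse_strain("A/Puerto Rico/8/34(H1N1)"))
```

Note that the subtype field alone (H1N1) cannot distinguish the 1918-lineage strains from the new type; the year and sequence are needed for that.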

Now click 'Database'.
We will now take a look at the hemagglutinin protein from recent human H1N1 types, so we select this protein (HA).
Set H to 1 and N to 1 indicating that we search for sequences of the H1N1 serotype. Set Sample date 'From:' to 2007-01-01 (leave 'To:' blank).
Set Release date 'To:' to 2009-07-01.
Check 'Full-length sequences only'.
Press 'Add query'
  • Q10: How many sequences are found?
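Conceptually, the query above is a conjunction of filters applied to each sequence record. The sketch below applies the same four criteria to a list of local records; the HARecord fields are hypothetical and do not reflect the actual data model of the NCBI Influenza Virus Resource, and the date bounds are assumed to be inclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class HARecord:
    """Hypothetical per-sequence metadata; field names are illustrative only."""
    accession: str
    subtype: str          # e.g. "H1N1"
    sample_date: date     # collection date of the sample
    release_date: date    # date the sequence was made public
    full_length: bool


def query(
    records: List[HARecord],
    subtype: str = "H1N1",
    sample_from: Optional[date] = date(2007, 1, 1),
    release_to: Optional[date] = date(2009, 7, 1),
    full_length_only: bool = True,
) -> List[HARecord]:
    """Keep records that satisfy every criterion (bounds assumed inclusive)."""
    hits = []
    for r in records:
        if r.subtype != subtype:
            continue
        if sample_from is not None and r.sample_date < sample_from:
            continue
        if release_to is not None and r.release_date > release_to:
            continue
        if full_length_only and not r.full_length:
            continue
        hits.append(r)
    return hits
```

Leaving a bound as None mirrors leaving the corresponding 'From:' or 'To:' field blank.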

We now limit the total number by checking the box 'Collapse identical sequences'.
  • Q11: How many sequences are found?
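Collapsing identical sequences amounts to grouping records by their exact sequence and keeping one representative per group. A minimal sketch, assuming records are (identifier, sequence) pairs and that the comparison should ignore letter case; how the web resource chooses its representative is not stated here, so picking the first-seen identifier is an assumption.

```python
from typing import Dict, List, Tuple


def collapse_identical(records: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each distinct sequence to the IDs that share it.

    Dicts keep insertion order (Python 3.7+), so groups appear in the order
    their first member was seen. Comparison ignores letter case.
    """
    groups: Dict[str, List[str]] = {}
    for rec_id, seq in records:
        groups.setdefault(seq.upper(), []).append(rec_id)
    return groups


def representatives(groups: Dict[str, List[str]]) -> List[str]:
    """Pick the first-seen ID of each group as its representative."""
    return [ids[0] for ids in groups.values()]
```

The number of groups, len(collapse_identical(records)), corresponds to the collapsed count asked for in Q11.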

Press 'Show results'
We will now make a simple phylogenetic tree by pushing the 'Build a tree' button.
This takes a few minutes.
Now press 'Next Step' repeatedly until the tree is displayed.
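Distance-based tree methods such as neighbor-joining or UPGMA start from a matrix of pairwise distances between aligned sequences. Which method the Influenza Virus Resource uses is not stated here, so the sketch below only illustrates that first step: the p-distance (fraction of differing sites), assuming the sequences are already aligned to equal length with '-' marking gaps, and skipping gapped columns pair by pair.

```python
from typing import List, Sequence


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned: Sequence[str]) -> List[List[float]]:
    """Symmetric matrix of pairwise p-distances with zeros on the diagonal."""
    n = len(aligned)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = p_distance(aligned[i], aligned[j])
    return d
```

Sequences that end up close together in such a matrix are the ones that cluster on the same branch of the tree; dedicated phylogenetics tools typically use model-corrected distances rather than raw p-distances.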

Now you see a tree built by comparing the sequences of the different hemagglutinin proteins.
As you can see, 2007 and 2008 strains are present in all branches, and we do not see many (if any) 2009 types except in the upper right corner,
where there is an outgroup distantly related to the other proteins.
Try clicking the larger blue dot, which represents many very similar sequences.
You are now taken to a list of the proteins in this small cluster.
  • Q12: How many sequences are in the cluster?
  • Q13: Are any of the sequences not from 2009?
Try to save this collection into a local FASTA file.
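Q13 can also be answered from the saved FASTA file by reading the isolation year out of the strain name in each header. The sketch assumes the header contains a strain name ending in year(HxNy), e.g., A/California/07/2009(H1N1); header layouts may differ depending on how the file was downloaded, so adjust the hypothetical YEAR_RE pattern to what your file actually contains. The headers in the usage example are made up.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed header content: a strain name ending in ".../YYYY(HxNy)".
YEAR_RE = re.compile(r"/(\d{4})\s*\(H\d+N\d+\)")


def years_in_headers(headers: Iterable[str]) -> Counter:
    """Count headers per isolation year; unmatched headers count as 'unknown'."""
    counts: Counter = Counter()
    for h in headers:
        m = YEAR_RE.search(h)
        counts[m.group(1) if m else "unknown"] += 1
    return counts


if __name__ == "__main__":
    # Made-up headers for illustration only.
    c = years_in_headers([
        "X1 A/California/07/2009(H1N1)",
        "X2 A/Texas/05/2009(H1N1)",
        "X3 A/Brisbane/59/2007(H1N1)",
    ])
    not_2009 = sum(n for year, n in c.items() if year != "2009")
    print(dict(c), "not from 2009:", not_2009)
```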

There are also databases of epitopes and related information.
The SYFPEITHI database in Tübingen is one of the oldest still-maintained
collections of verified T cell epitopes and the largest collection of natural MHC ligands (peptides presented by MHC molecules on the cell surface,
which do not necessarily induce an immune response). Go to the SYFPEITHI database.
Click on the small red spot to get to the main page.
The left box lists all the different MHC molecules for which this database contains ligand and/or epitope information.
The first letter(s) of each name indicate the species from which the MHC originates; e.g., HLA- denotes human alleles.
  • Q14: How many species are covered?
There is, however, a strong bias towards the more intensively studied species, which we will illustrate with a few searches.
Select the only pig entry represented here (SLA haplotype d/d), leave everything else at the default values, and press 'Do Query'.
  • Q15: How many peptides are shown?
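Counting species by name prefix, as asked in Q14, can be sketched as a longest-prefix lookup. The prefix table below lists a few standard MHC prefixes (HLA for human, H-2 for mouse, SLA for pig, BoLA for cattle, Mamu for rhesus macaque, Patr for chimpanzee, RT1 for rat) and is far from complete; whether SYFPEITHI writes each name exactly with these prefixes is an assumption, so check the list on the page.

```python
from collections import Counter
from typing import Dict, Iterable

# A few standard MHC prefixes; far from complete.
SPECIES_PREFIXES: Dict[str, str] = {
    "HLA": "human",
    "H-2": "mouse",
    "H2": "mouse",
    "SLA": "pig",
    "BoLA": "cattle",
    "Mamu": "rhesus macaque",
    "Patr": "chimpanzee",
    "RT1": "rat",
}


def species_of(mhc_name: str) -> str:
    """Return the species for an MHC name by its longest matching prefix."""
    for prefix in sorted(SPECIES_PREFIXES, key=len, reverse=True):
        if mhc_name.startswith(prefix):
            return SPECIES_PREFIXES[prefix]
    return "unknown"


def count_species(mhc_names: Iterable[str]) -> Counter:
    """Count how many MHC molecules in the list belong to each species."""
    return Counter(species_of(name) for name in mhc_names)
```

The number of distinct keys in count_species(...) gives the number of species, and the per-species counts make the bias towards well-studied species visible.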
Now go back and do the search on HLA-A*0201.

You will see a number of peptides. Some will be natural ligands, i.e., the peptides between the heading 'Example for Ligand' and the heading 'T-cell epitope'.
The peptides shown after the heading 'T-cell epitope' are peptides reported as T cell epitopes in the literature.
We would like to know the number of ligands and epitopes; one way to find it is to copy-paste the table into a spreadsheet such as Excel.
  • Q16: How many ligands are there for this allele?
  • Q17: How many epitopes are reported for this allele?
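If you copy the result page as plain text instead of into a spreadsheet, the peptides can be counted with a short script. The sketch assumes one peptide per line with the peptide sequence as the first whitespace-separated token, and the two headings from the page ('Example for Ligand' and 'T-cell epitope') at the start of their own lines; the actual layout of the SYFPEITHI output may differ, so check the counts against the page.

```python
import re
from typing import Dict, Optional

# A row is counted only if its first token consists of the 20 standard amino acid letters.
PEPTIDE_RE = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+")


def count_ligands_and_epitopes(
    text: str,
    ligand_heading: str = "Example for Ligand",
    epitope_heading: str = "T-cell epitope",
) -> Dict[str, int]:
    """Count peptide rows under the ligand and epitope headings."""
    counts = {"ligand": 0, "epitope": 0}
    section: Optional[str] = None  # which heading we are currently under
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(ligand_heading):
            section = "ligand"
        elif line.startswith(epitope_heading):
            section = "epitope"
        elif section is not None and line:
            first = line.split()[0]
            if PEPTIDE_RE.fullmatch(first):  # skip notes and non-peptide rows
                counts[section] += 1
    return counts
```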

Recently, a large initiative has been undertaken to compile all available epitope data in one place. This has led to the Immune Epitope Database and Analysis Resource (IEDB).

We will now try to find T cell epitopes in an early strain of influenza A H1N1, the serotype that was introduced into humans as the 1918 Spanish flu.

Go to the Immune Epitope Database and Analysis Resource. First we select the source organism:
Press 'Organism Finder'
In the new window press 'Influenza A virus (A/Puerto Rico/8/34(H1N1))' and then press 'Apply Selection'.
Uncheck all boxes under Immune Recognition Context except 'T Cell Response'.
Set host organism to 'Homo sapiens' (use the 'Organism Finder').
Press 'Allele finder', type HLA-A*0201 in the 'Allele:' field and press 'Search'.
Click on the one remaining A0201 link among the found items and press 'Apply selection'.
Press 'Search'.
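HLA-A*0201 is written in the older HLA nomenclature; since 2010 the official WHO nomenclature separates the fields with colons (HLA-A*02:01), which is the form most current resources, including the IEDB, display. If the allele finder does not recognize one form, trying the other may help. The sketch below converts simple four-digit names only; longer legacy names are deliberately rejected because their mapping is not always a simple split. The function name to_colon_nomenclature is illustrative.

```python
import re

# Pre-2010 style: optional "HLA-", gene, "*", two-digit allele group, two-digit protein field.
OLD_STYLE_RE = re.compile(r"^(HLA-)?([A-Z][A-Z0-9]*)\*(\d{2})(\d{2})$")


def to_colon_nomenclature(name: str) -> str:
    """Convert e.g. 'HLA-A*0201' to 'HLA-A*02:01'; colon names pass through."""
    name = name.strip()
    if ":" in name:
        return name  # already in the current, colon-separated form
    m = OLD_STYLE_RE.match(name)
    if m is None:
        raise ValueError(f"only simple four-digit names are handled: {name!r}")
    _, gene, group, protein = m.groups()
    return f"HLA-{gene}*{group}:{protein}"
```

The same conversion applies to the class II allele in Q19: to_colon_nomenclature("HLA-DRB1*0401") returns "HLA-DRB1*04:01".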

The number of different peptides from the selected influenza strain that have been reported to elicit a T cell response in HLA-A*0201 hosts is given as 'positive peptidic epitopes'.
  • Q18: How many epitopes were found?
Now try to find HLA-DRB1*0401 restricted T-helper (CD4, Class II) epitopes for the same organism.
  • Q19: How many positive epitopes were found?

Specialized epitope databases also exist for pathogens of special interest (e.g., HIV), but we will not go into detail with those here.