Supplementary material

NetMHCpan - MHC class I binding prediction beyond humans

Here you will find the data sets used for evaluation in the above paper. The data fall into five parts: a) non-human primates, b) HLA-A and HLA-B ligands, c) HLA-E ligands, d) SYFPEITHI HLA-C ligands, and e) SYFPEITHI HLA-G ligands.

a) Non-human primates. The format of the data is

Allele  Peptide log50k
Mamu-A01 YPPMMCYFL 1.0
Mamu-A01 NSPLHCYTM 1.0
Mamu-A01 ITPQPVPTA 0.482918
Mamu-A01 LTPIFSDLL 0.790002
Mamu-A01 GSPTNLEFI 0.634812
Mamu-A01 DSPHYVPIL 0.682619
Mamu-A01 TLPELNLSL 0.787187
Mamu-A01 ASPRIGDQL 0.945674
Mamu-A01 FSPFKLNLI 1.0
Mamu-A01 MIPLLFILF 0.911688

where the first column gives the allele, the second column gives the peptide, and the last column gives the log50k-transformed binding affinity, i.e. 1 - log50k(aff), where aff is the binding affinity in nM and log50k is the logarithm to base 50,000.
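As a minimal sketch of this transformation, assuming log50k denotes the logarithm to base 50,000 and affinities are given in nM, the conversion in both directions could be written as follows. The function names are illustrative and not part of NetMHCpan.

```python
import math

LOG_BASE = 50000.0  # base of the log50k transformation


def to_log50k(aff_nm: float) -> float:
    """Transform an affinity in nM to 1 - log(aff) / log(50000)."""
    return 1.0 - math.log(aff_nm) / math.log(LOG_BASE)


def from_log50k(score: float) -> float:
    """Invert the transformation: recover the affinity in nM from a score."""
    return LOG_BASE ** (1.0 - score)
```

Under this formula a score of 1.0 corresponds to an affinity of 1 nM and a score of 0.0 to 50,000 nM, so stronger binders receive higher scores.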

When classifying the peptides into binders and non-binders, a threshold of 500 nM is used. This means that peptides with log50k transformed binding affinity values greater than 0.426 are classified as binders.
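The threshold value follows directly from the transformation: 1 - log(500)/log(50000) is approximately 0.426. The sketch below derives it and applies it to lines in the three-column format shown above; the helper names `parse_line` and `is_binder` are hypothetical.

```python
import math

BINDER_THRESHOLD_NM = 500.0
# 1 - log(500) / log(50000), approximately 0.426
BINDER_THRESHOLD_SCORE = 1.0 - math.log(BINDER_THRESHOLD_NM) / math.log(50000.0)


def parse_line(line: str):
    """Split one 'Allele Peptide log50k' line into typed fields."""
    allele, peptide, score = line.split()
    return allele, peptide, float(score)


def is_binder(score: float) -> bool:
    """A peptide is a binder if its score exceeds the 500 nM threshold."""
    return score > BINDER_THRESHOLD_SCORE


# Two lines taken from the example listing above.
lines = ["Mamu-A01 YPPMMCYFL 1.0", "Mamu-A01 ITPQPVPTA 0.482918"]
labels = [(pep, is_binder(score)) for _, pep, score in map(parse_line, lines)]
```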

b) HLA-A and HLA-B ligands. The file contains 596 HLA-A and HLA-B ligands downloaded from the SYFPEITHI database. The FASTA header for each entry has the format

>uniprot|A8KA43 227 KRFGKAYNL B2705

where the first column is the protein identifier, the second column is the location of the HLA ligand in the protein sequence, the third column is the HLA ligand, and the last column is the HLA restriction.
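A header in this format can be split into its four fields along the following lines. This is only a sketch: it assumes the fields are separated by whitespace, and the `LigandHeader` type and `parse_header` function are illustrative names, not part of the data set.

```python
from typing import NamedTuple


class LigandHeader(NamedTuple):
    protein_id: str   # e.g. "uniprot|A8KA43"
    position: int     # location of the ligand in the protein sequence
    ligand: str       # the HLA ligand peptide
    restriction: str  # the HLA restriction, e.g. "B2705"


def parse_header(header: str) -> LigandHeader:
    """Parse a '>id position ligand restriction' FASTA header line."""
    fields = header.lstrip(">").split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {header!r}")
    protein_id, position, ligand, restriction = fields
    return LigandHeader(protein_id, int(position), ligand, restriction)


example = parse_header(">uniprot|A8KA43 227 KRFGKAYNL B2705")
```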

c) HLA-E ligands. The file contains seven HLA-E ligands downloaded from the IEDB database. All ligands are from the same source protein, and the file contains all 9mer peptides from the source protein (P0A1D4), with the ligands annotated with the value 1 and all other peptides with the value 0. The format of the data is

MAAKDVKFG 0
AAKDVKFGN 0
AKDVKFGND 0
KDVKFGNDA 0
DVKFGNDAR 0
VKFGNDARV 0
KFGNDARVK 0
FGNDARVKM 0
GNDARVKML 0
NDARVKMLR 0
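The enumeration of overlapping 9mers described for this file can be sketched as below. The function name `label_9mers` is hypothetical, and the example uses only the 18-residue prefix that can be read off the ten peptides listed above, with an empty ligand list, since the listing shows no ligand among these peptides.

```python
def label_9mers(protein: str, ligands, k: int = 9):
    """Return (peptide, label) for every overlapping k-mer of the protein.

    The label is 1 if the peptide is one of the given ligands, else 0.
    """
    ligand_set = set(ligands)
    labelled = []
    for start in range(len(protein) - k + 1):
        peptide = protein[start:start + k]
        labelled.append((peptide, 1 if peptide in ligand_set else 0))
    return labelled


# Prefix of the source protein, reconstructed from the listing above.
prefix = "MAAKDVKFGNDARVKMLR"
rows = label_9mers(prefix, ligands=[])
for peptide, label in rows:
    print(peptide, label)
```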

d) HLA-C ligands. The file contains the source proteins for 77 HLA-C ligands from the SYFPEITHI database in FASTA format. The FASTA header for each entry has the format

>gnl|BL_ORD_ID|54508    244     FAPYNKPSL Cw0102

where the first column is the protein identifier, the second column is the location of the HLA ligand in the protein sequence, the third column is the HLA ligand, and the last column is the HLA restriction.
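Because each FASTA entry carries the source protein sequence, the ligand can be located in it as a consistency check. The sketch below reads FASTA records and reports where the ligand occurs. It assumes whitespace-separated header fields; the page does not state whether the location column is 0-based or 1-based, so the code returns 0-based offsets and leaves that comparison to the reader. The function names and the toy record are made up for illustration and are not taken from the data set.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def ligand_offsets(header: str, sequence: str):
    """Return (stated location, ligand, 0-based offsets of the ligand)."""
    fields = header.split()
    stated, ligand = int(fields[1]), fields[2]
    offsets = []
    start = sequence.find(ligand)
    while start != -1:
        offsets.append(start)
        start = sequence.find(ligand, start + 1)
    return stated, ligand, offsets


# Toy record (hypothetical sequence, for illustration only).
toy = [">toy|P1\t3\tFAPYNKPSL Cw0102", "MKFAPYN", "KPSLAA"]
results = [ligand_offsets(h, s) for h, s in read_fasta(toy)]
```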

e) HLA-G (HLA-G*0101) ligands. The file contains the source proteins for 11 HLA-G ligands from the SYFPEITHI database in FASTA format. The FASTA header for each entry has the format

>sp|P49327|FAS_HUMAN    751     HVPEHAVVL

where the first column is the protein identifier, the second column is the location of the HLA ligand in the protein sequence, and the third column is the HLA ligand.

a) Non-human primates. NOTE: This data set was updated on Aug 12, 2009, so that it now corresponds to the data presented in the NetMHCpan-2.0 publication.
b) HLA-A and HLA-B ligands
c) HLA-E ligands
d) HLA-C SYFPEITHI ligands
e) HLA-G SYFPEITHI ligands

References

Ilka Hoof, Bjoern Peters, John Sidney, Lasse Eggers Pedersen, Ole Lund, Soren Buus, and Morten Nielsen
NetMHCpan - a method for MHC class I binding prediction beyond humans