Supplementary material

Pan-specific MHC class I predictors: A benchmark of HLA-I pan-specific prediction methods.

Here you will find the data sets used for evaluation in the above paper. The data fall into three parts: a) Evaluation set 1, b) Evaluation set 2, and c) the SYFPEITHI HLA ligand data set.

The format for each of the files (eval_set1.xls, eval_set2.xls) is

Allele  Peptide log50k
A0101   AADSFATSY       0.599375
A0101   ADAGFMKQY       -0.051933
A0101   AFEKMVSLL       -0.041218
A0101   AFIDTIKSL       -0.041218
A0101   AFLIGANYL       -0.034976
A0101   AFLLFLVLI       0.359732

where the first column gives the allele, the second column the peptide, and the last column the log50k-transformed binding affinity, i.e. 1 - log(aff)/log(50000), with the affinity aff given in nM.
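As an illustration, the log50k transform and its inverse can be sketched in Python as follows. The function names are hypothetical and not part of the distributed data. The sketch assumes the usual reading of log50k as a logarithm with base 50000, and it does not clip values to the range 0 to 1, since the example rows above contain negative values.

```python
import math

MAX_AFFINITY_NM = 50000.0  # base of the log50k transform


def to_log50k(aff_nm: float) -> float:
    """Map an affinity in nM to the log50k scale: 1 - log(aff)/log(50000)."""
    return 1.0 - math.log(aff_nm) / math.log(MAX_AFFINITY_NM)


def from_log50k(score: float) -> float:
    """Inverse transform: recover the affinity in nM from a log50k value."""
    return MAX_AFFINITY_NM ** (1.0 - score)
```

On this scale stronger binders (lower nM values) receive higher scores, and a negative score corresponds to an affinity weaker than 50000 nM.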

When classifying the peptides into binders and non-binders, a threshold of 500 nM is used. This means that peptides with a log50k-transformed binding affinity greater than 0.426 (= 1 - log(500)/log(50000)) are classified as binders.
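A minimal sketch of this classification in Python is given below. It assumes the evaluation files can be read as whitespace-separated text rows in the format shown above (the files carry an .xls extension, so this may need adapting). The helper names parse_row and is_binder are hypothetical.

```python
import math

BINDER_THRESHOLD_NM = 500.0
# 500 nM on the log50k scale; evaluates to roughly 0.426
LOG50K_THRESHOLD = 1.0 - math.log(BINDER_THRESHOLD_NM) / math.log(50000.0)


def parse_row(line):
    """Split one data row into (allele, peptide, log50k value)."""
    allele, peptide, score = line.split()
    return allele, peptide, float(score)


def is_binder(score):
    """A peptide counts as a binder if its log50k value exceeds the threshold."""
    return score > LOG50K_THRESHOLD


# Three rows copied from the format example above
sample = """\
A0101   AADSFATSY       0.599375
A0101   ADAGFMKQY       -0.051933
A0101   AFLLFLVLI       0.359732
"""

labels = [
    (peptide, is_binder(score))
    for _, peptide, score in map(parse_row, sample.splitlines())
]
```

Of the three sample rows, only AADSFATSY (0.599375) lies above the threshold.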

The last file (SYF_set.fasta) contains the source proteins, in FASTA format, for 566 HLA-A and HLA-B ligands downloaded from the SYFPEITHI database. The FASTA header for each entry has the format

prot|A8KA43 227 KRFGKAYNL B2705

where the first column is the protein identifier, the second column is the location of the HLA ligand in the protein sequence, the third column is the HLA ligand, and the last column is the HLA restriction.
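The header layout described above could be parsed along the following lines. This is only a sketch: the LigandHeader type and parse_header function are hypothetical names, and the code assumes the four fields are separated by whitespace, with an optional leading ">" as is usual in FASTA files.

```python
from typing import NamedTuple


class LigandHeader(NamedTuple):
    protein_id: str  # e.g. "prot|A8KA43"
    position: int    # location of the ligand in the source protein
    ligand: str      # HLA ligand sequence
    allele: str      # HLA restriction, e.g. "B2705"


def parse_header(header: str) -> LigandHeader:
    """Parse a header such as 'prot|A8KA43 227 KRFGKAYNL B2705'."""
    fields = header.lstrip(">").split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {header!r}")
    protein_id, position, ligand, allele = fields
    return LigandHeader(protein_id, int(position), ligand, allele)


entry = parse_header("prot|A8KA43 227 KRFGKAYNL B2705")
```

The page does not state whether the location is 0-based or 1-based, so it is worth checking the ligand against the protein sequence before using the position as an index.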

Evaluation dataset 1
Evaluation dataset 2
SYFPEITHI ligand evaluation set

References

Hao Zhang, Claus Lundegaard and Morten Nielsen
Pan-specific MHC class I predictors: A benchmark of HLA-I pan-specific prediction methods.