Supplementary material

Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan

This page provides the data sets used for training (IEDB data) and evaluation (SYF data) of the NetMHCIIpan method.

IEDB data

The IEDB data set consists of quantitative peptide binding data covering 14 HLA-DR alleles.

Each file has the following format:

DRB1_0401 AILIWMYYHGQRHSDEH 0.14875
DRB1_0401 CDGERPTLAFLQDVM 0.09917
DRB1_0401 DALESIMTTKSVSFR 0.56557
DRB1_0401 DTQFVRFDSDAASQR 0.63844
DRB1_0401 ELGEWVFSAIKSPQA 0.11827
DRB1_0401 ENEYATGAVRPFQAA 0.47888
DRB1_0401 FLIMRNLTNLLSARK 1.00000
DRB1_0401 FNQMIFVSSIFISFY 0.14669
DRB1_0401 GAGLAGAAIGSVGLGKVLID 0.46435
DRB1_0401 GSRGYRLQRKIEAIF 0.37468

where the first column gives the HLA-DR allele, the second column the peptide sequence, and the last column the log50k-transformed binding affinity, i.e. 1 - log(aff)/log(50,000), with aff the binding affinity in nM (log50k denotes the logarithm to base 50,000).
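A minimal Python sketch of the transform, its inverse, and a parser for one line of the three-column format. It assumes that log50k is the logarithm to base 50,000 (consistent with 500 nM mapping to 0.426, as stated on this page) and that the fields are whitespace-separated; the function names are hypothetical and not part of NetMHCIIpan.

```python
import math

# Assumption: log50k is the logarithm to base 50,000, so 1 nM -> 1.0 and 50,000 nM -> 0.0.
# No clipping is applied here: affinities weaker than 50,000 nM would give negative scores.
MAX_AFFINITY_NM = 50_000.0


def affinity_to_log50k(aff_nm: float) -> float:
    """Transform a binding affinity in nM to 1 - log_50000(aff)."""
    return 1.0 - math.log(aff_nm) / math.log(MAX_AFFINITY_NM)


def log50k_to_affinity(score: float) -> float:
    """Invert the transform: aff = 50000 ** (1 - score), in nM."""
    return MAX_AFFINITY_NM ** (1.0 - score)


def parse_iedb_line(line: str) -> tuple[str, str, float]:
    """Split one 'allele peptide score' line into its three fields (hypothetical helper)."""
    allele, peptide, score = line.split()
    return allele, peptide, float(score)
```

For example, `log50k_to_affinity(0.56557)` converts the third peptide in the example above back to an affinity of about 110 nM.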

When peptides are classified into binders and non-binders, for instance to calculate AUC values, a threshold of 500 nM is used. On the log50k scale this corresponds to 1 - log(500)/log(50,000) ≈ 0.426, so peptides with transformed binding affinity values greater than 0.426 are classified as binders.
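The sketch below shows how the 500 nM cutoff can be used to label peptides and compute an AUC. It is not the authors' evaluation code: it assumes targets and predictions are both on the log50k scale and uses the standard Mann-Whitney pair count (ties count as 0.5), chosen for clarity; a rank-based implementation scales better on large data sets.

```python
import math

# 500 nM cutoff expressed on the log50k scale: 1 - log(500)/log(50000), about 0.426.
BINDER_THRESHOLD = 1.0 - math.log(500.0) / math.log(50_000.0)


def is_binder(score: float, threshold: float = BINDER_THRESHOLD) -> bool:
    """A peptide is a binder if its log50k value is strictly greater than the threshold."""
    return score > threshold


def auc(targets: list[float], predictions: list[float]) -> float:
    """ROC AUC as the fraction of (binder, non-binder) pairs ranked correctly."""
    pos = [p for t, p in zip(targets, predictions) if is_binder(t)]
    neg = [p for t, p in zip(targets, predictions) if not is_binder(t)]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied predictions count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

Applied to the ten example values for DRB1_0401 shown above, five peptides fall above the threshold and are labelled as binders.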

DRB1*0101 dataset
DRB1*0301 dataset
DRB1*0401 dataset
DRB1*0404 dataset
DRB1*0405 dataset
DRB1*0701 dataset
DRB1*0802 dataset
DRB1*0901 dataset
DRB1*1101 dataset
DRB1*1302 dataset
DRB1*1501 dataset
DRB3*0101 dataset
DRB4*0101 dataset
DRB5*0101 dataset

SYF data

The SYF data set contains the source proteins of MHC ligands downloaded from the SYFPEITHI database. It comprises 584 HLA-DR ligands covering 28 different HLA-DR alleles. The data for each allele are merged into one file in FASTA format, and each FASTA entry has the following format:

>A1A1_CANFA IPADLRIISANGCK 197 HLA-DRB1_0101 A1A1_CANFA_IPADLRIISANGCK
MGKGVGRDKYEPAAVSEHGDKKKAKKERDMDELKKEVSMDDHKLSLDELHRKYGTDLSRG
LTTARAAEILARDGPNALTPPPTTPEWVKFCRQLFGGFSMLLWIGAILCFLAYGIQAATE
EEPQNDNLYLGVVLSAVVIITGCFSYYQEAKSSKIMESFKNMVPQQALVIRNGEKMSINA
EEVVIGDLVEVKGGDRIPADLRIISANGCKVDNSSLTGESEPQTRSPDFTNENPLETRNI
.
.
.
ALAAFLSYCPGMGVALRMYPLKPTWWFCAFPYSLLIFVYDEVRKLIIRRRPGGWVEKETY
Y

Each FASTA header gives the Swiss-Prot ID of the source protein (A1A1_CANFA), the MHC ligand sequence (IPADLRIISANGCK), the position of the MHC ligand in the source protein (197, counted from 1), the HLA-DR allele (HLA-DRB1_0101), and finally an identifier combining the Swiss-Prot ID with the MHC ligand sequence. The lines following the header hold the full source-protein sequence; the dots in the example above mark omitted sequence lines.
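A minimal sketch of a reader for these files, assuming the five header fields are whitespace-separated in the order described above and that the position is 1-based (as it is in the A1A1_CANFA example, where the ligand starts at residue 197). The class and function names are hypothetical. It also checks that the ligand actually occurs at the stated position.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class SyfEntry:
    protein_id: str  # Swiss-Prot ID of the source protein, e.g. A1A1_CANFA
    ligand: str      # MHC ligand sequence
    position: int    # ligand start in the source protein (assumed 1-based)
    allele: str      # HLA-DR allele, e.g. HLA-DRB1_0101
    entry_id: str    # Swiss-Prot ID combined with the ligand sequence
    sequence: str    # full source-protein sequence


def _make_entry(header: str, chunks: list[str]) -> SyfEntry:
    # Assumption: header fields are whitespace-separated, in the documented order.
    protein_id, ligand, pos, allele, entry_id = header[1:].split()
    return SyfEntry(protein_id, ligand, int(pos), allele, entry_id, "".join(chunks))


def read_syf_fasta(lines: Iterable[str]) -> Iterator[SyfEntry]:
    """Yield one SyfEntry per '>' record, concatenating its sequence lines."""
    header = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _make_entry(header, chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield _make_entry(header, chunks)


def ligand_matches(entry: SyfEntry) -> bool:
    """Check that the ligand occurs at the stated 1-based position."""
    start = entry.position - 1
    return entry.sequence[start:start + len(entry.ligand)] == entry.ligand
```

Because iterating over an open text file yields its lines, a file handle can be passed to `read_syf_fasta` directly.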

DRB1*0101 dataset
DRB1*0102 dataset
DRB1*0301 dataset
DRB1*0401 dataset
DRB1*0402 dataset
DRB1*0403 dataset
DRB1*0404 dataset
DRB1*0405 dataset
DRB1*0701 dataset
DRB1*0801 dataset
DRB1*0802 dataset
DRB1*0803 dataset
DRB1*0901 dataset
DRB1*1001 dataset
DRB1*1101 dataset
DRB1*1104 dataset
DRB1*1201 dataset
DRB1*1301 dataset
DRB1*1302 dataset
DRB1*1401 dataset
DRB1*1501 dataset
DRB1*1502 dataset
DRB3*0101 dataset
DRB3*0202 dataset
DRB3*0301 dataset
DRB4*0101 dataset
DRB4*0103 dataset
DRB5*0101 dataset

References

Morten Nielsen, Claus Lundegaard, Thomas Blicher, Bjoern Peters, Alessandro Sette, Sune Justesen, Soren Buus, and Ole Lund. Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan