Here, you will find the data set used for training and testing of the NetMHCpan-4.0 method.
The training binding data are partitioned into 5 files to be used for cross-validation. For instance, the f000_ba and f000_el files contain the binding affinity and eluted ligand training data, and the c000_ba and c000_el files contain the binding affinity and eluted ligand test data, for the first cross-validation partition. It is critical that this data partitioning is maintained.
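The pairing of training and test files described above can be sketched as follows. This is only an illustration: the text names the files of the first partition (f000_ba, f000_el, c000_ba, c000_el), and the sketch assumes the remaining four partitions follow the same pattern with the index counting up to 004. The function name `partition_files` is hypothetical.

```python
def partition_files(n_partitions=5, kinds=("ba", "el")):
    """Return the train/test file names for each cross-validation partition.

    Assumption: partitions are numbered 000..004 and named like the first
    one (f000_ba / c000_ba); only the first partition is named in the text.
    """
    partitions = []
    for i in range(n_partitions):
        partitions.append({
            kind: {
                "train": f"f{i:03d}_{kind}",  # training data for partition i
                "test": f"c{i:03d}_{kind}",   # held-out test data for partition i
            }
            for kind in kinds
        })
    return partitions


if __name__ == "__main__":
    for i, files in enumerate(partition_files()):
        print(i, files["ba"], files["el"])
```

Keeping each training file together with its own test file, rather than re-splitting the data, is what maintains the partitioning.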
The format of each file is:
ARWLASTPL 0.589395 BoLA-D18.4 85.0
ASYAAAAAY 0.496594 BoLA-D18.4 232.0
GMMGGLWKY 0.439136 BoLA-D18.4 432.0
KMFHGGLRY 0.898463 BoLA-D18.4 3.0
KMLEASTIY 0.75609 BoLA-D18.4 14.0
KQLEYSWVL 0.481554 BoLA-D18.4 273.0
KQWSWFSLL 0.451477 BoLA-D18.4 378.0
MMFDAMGAL 0.935937 BoLA-D18.4 2.0
MMMSTAVAF 0.762939 BoLA-D18.4 13.0
MTFPVSLEY 0.485003 BoLA-D18.4 263.0
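where the first column gives the peptide, the second column gives the log50k-transformed binding affinity (i.e. 1 - log50k(aff nM), with log50k the logarithm to base 50,000) for the binding affinity data or 1/0 for the eluted ligand data, and the third column gives the class I allele. The example rows above also carry a fourth value, which appears to be the affinity in nM (for example, 1 - log50k(85.0) = 0.589).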
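As a minimal sketch of the transformation described above, the following converts between an affinity in nM and its log50k-transformed value, and splits one whitespace-separated record into its first three columns. The function names are hypothetical, and the whitespace-separated layout is assumed from the example rows.

```python
import math

LOG_BASE_NM = 50000.0  # base of the log50k transform


def to_log50k(aff_nm):
    """Return 1 - log50k(aff_nm) for an affinity given in nM."""
    return 1.0 - math.log(aff_nm) / math.log(LOG_BASE_NM)


def from_log50k(score):
    """Invert the transform: recover the affinity in nM from the score."""
    return LOG_BASE_NM ** (1.0 - score)


def parse_record(line):
    """Split one record into (peptide, value, allele).

    Assumption: fields are whitespace-separated; any columns after the
    third are ignored here.
    """
    fields = line.split()
    return fields[0], float(fields[1]), fields[2]


if __name__ == "__main__":
    peptide, value, allele = parse_record("ARWLASTPL 0.589395 BoLA-D18.4 85.0")
    print(peptide, allele, value, round(to_log50k(85.0), 6))
```

For the first example row, the transformed value of 85.0 nM agrees with the stored 0.589395 to within rounding.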
When classifying BA peptides into binders and non-binders, for instance to calculate AUC values, a threshold of 500 nM is used. This means that peptides with log50k-transformed binding affinity values greater than 0.426 are classified as binders.
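The relation between the 500 nM threshold and the value 0.426 can be checked with a short sketch. The function names are hypothetical; the default threshold of 0.426 is the rounded value quoted in the text.

```python
import math


def binder_threshold(aff_nm=500.0, base_nm=50000.0):
    """Return the log50k-transformed value of an affinity threshold in nM."""
    return 1.0 - math.log(aff_nm) / math.log(base_nm)


def is_binder(score, threshold=0.426):
    """Classify a log50k-transformed BA value; greater than threshold is a binder."""
    return score > threshold


if __name__ == "__main__":
    print(round(binder_threshold(), 3))
    # Binary labels such as these could serve as the targets for an AUC calculation.
    scores = [0.589395, 0.439136, 0.30]
    print([is_binder(s) for s in scores])
```

Because the transform is decreasing in the affinity, a stronger binder (lower nM) has a higher transformed value, which is why binders lie above the threshold rather than below it.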
f000_ba (Train data) c000_ba (Test data)
f000_el (Train data) c000_el (Test data)
NetMHCpan, a method for MHC class I binding prediction beyond humans