Supplementary material

NetMHCIIpan-2.0: Improved pan-specific HLA-DR predictions using a novel concurrent alignment and weight optimization training procedure

Here you will find the data sets used for training (quantitative peptide binding data) and evaluation (SYFPEITHI ligands and IEDB T cell epitope data) of the NetMHCIIpan-2.0 method.

Quantitative peptide binding data

The quantitative binding data are partitioned into 5 parts for cross-validation. For instance, the f000 file contains the training data and the c000 file the test data for the first cross-validation partition. Each file has the following format:
AAAGAEAGKATTEEQ 0.0895297 DRB1_0101
AAAGAEAGKATTEEQ 0.0731308 DRB1_0901
AMLHWSLILPGIKAQ 0.604124 DRB1_0101
EGTKVTFHVEKGSNP 0.0130254 DRB3_0101
IPVFLQEALNIALVA 0 DRB1_0301
CIEYVTLNASQYANC 0.264118 DRB1_0301
CIEYVTLNASQYANC 0.356533 DRB1_0901
LHRVVLLESIAQFGD 0.261746 DRB3_0101
LRKAGKSVVVLNRKT 0 DRB1_0401
KYKFVRIQPGQTFSV 0.606066 DRB1_1501

Here the first column gives the peptide, the second column the log50k-transformed binding affinity, i.e. 1 - log50k(aff), where log50k is the logarithm to base 50,000 and aff is the affinity in nM, and the last column the HLA-DR allele. When classifying peptides into binders and non-binders, for instance to calculate AUC values, a threshold of 500 nM is used. This means that peptides with a log50k-transformed binding affinity greater than 0.426 are classified as binders.

f000 (training data)
c000 (test data)

SYFPEITHI data

The SYF data set contains the source proteins of MHC ligands downloaded from the SYFPEITHI database. It contains 1164 HLA-DR ligands covering 28 different HLA-DR alleles. The format of each FASTA entry is:
>HLA-DRB1_0101 AVDDVQYVDEIASVLTSQ
MKHHHHHHHSDYDIPTTENLYFQGSAAATGPSFWLGNETLKVPLALFALNRQRLCERLRK
NPAVQAGSIVVLQGGEETQRYCTDTGVLFRQESFFHWAFGVTEPGCYGVIDVDTGKSTLF
VPRLPASHATWMGKIHSKEHFKEKYAVDDVQYVDEIASVLTSQKPSVLLTLRGVNTDSGS
VCREASFDGISKFEVNNTILHPEIVECRVFKTDMELEVLRYTNKISSEAHREVMKAVKVG
MKEYELESLFEHYCYSRGGMRHSSYTCICGSGENSAVLHYGHAGAPNDRTIQNGDMCLFD
MGGEYYCFASDITCSFPANGKFTADQKAVYEAVLRSSRAVMGAMKPGVWWPDMHRLADRI
HLEELAHMGILSGSVDAMVQAHLGAVFMPHGLGHFLGIDVHDVGGYPEGVERIDEPGLRS
LRTARHLQPGMVLTVEPGIYFIDHLLDEALADPARASFFNREVLQRFRGFGGVRIEEDVV
VTDSGIELLTCVPRTVEEIEACMAGCDKAFTPFSGPK

The header of each FASTA entry gives the HLA-DR allele (here HLA-DRB1_0101) followed by the HLA-DR ligand; the sequence below the header is the ligand's source protein.

SYFPEITHI dataset

IEDB data

The IEDB data set contains the source proteins of MHC class II T cell epitopes downloaded from the IEDB database. It contains 1325 HLA-DR epitopes covering 42 different HLA-DR alleles. The format of each FASTA entry is:
>HLA-DRB1_0101 AETPGCVAYIGISFLDQASQ
MKIRLHTLLAVLTAAPLLLAAAGCGSKPPSGSPETGAGAGTVATTPASSPVTLAETGSTL
LYPLFNLWGPAFHERYPNVTITAQGTGSGAGIAQAAAGTVNIGASDAYLSEGDMAAHKGL
MNIALAISAQQVNYNLPGVSEHLKLNGKVLAAMYQGTIKTWDDPQIAALNPGVNLPGTAV
VPLHRSDGSGDTFLFTQYLSKQDPEGWGKSPGFGTTVDFPAVPGALGENGNGGMVTGCAE
TPGCVAYIGISFLDQASQRGLGEAQLGNSSGNFLLPDAQSIQAAAAGFASKTPANQAISM
IDGPAPDGYPIINYEYAIVNNRQKDAATAQTLQAFLHWAITDGNKASFLDQVHFQPLPPA
VVKLSDALIATISS

The header of each FASTA entry gives the HLA-DR allele (here HLA-DRB1_0101) followed by the HLA-DR epitope; the sequence below the header is the epitope's source protein.

IEDB dataset

References

Morten Nielsen, Sune Justesen, Ole Lund, Claus Lundegaard, and Soren Buus
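The log50k transformation and the 500 nM binder threshold described in the quantitative binding data section, together with the FASTA layout of the SYFPEITHI and IEDB files, can be sketched in Python as below. This is a minimal, hypothetical helper, not code distributed with NetMHCIIpan-2.0: all function and constant names are illustrative, and clamping scores to [0, 1] (so that affinities weaker than 50,000 nM give 0) is an assumption suggested by the 0 values in the example records.

```python
import math

# Assumed reading of the text: log50k score = 1 - log(aff_nM) / log(50000).
MAX_AFFINITY_NM = 50_000.0
BINDER_THRESHOLD_NM = 500.0


def to_log50k(affinity_nm: float) -> float:
    """Map an affinity in nM to the log50k scale used in the f/c files."""
    score = 1.0 - math.log(affinity_nm) / math.log(MAX_AFFINITY_NM)
    # Assumption: clamp to [0, 1], so affinities weaker than 50,000 nM give 0.
    return min(1.0, max(0.0, score))


def from_log50k(score: float) -> float:
    """Invert the transform: affinity (nM) = 50000 ** (1 - score)."""
    return MAX_AFFINITY_NM ** (1.0 - score)


# Score at the 500 nM cutoff; evaluates to roughly 0.4256, i.e. the 0.426 quoted above.
BINDER_THRESHOLD_SCORE = to_log50k(BINDER_THRESHOLD_NM)


def is_binder(score: float) -> bool:
    """A peptide counts as a binder if its log50k score exceeds the 500 nM cutoff."""
    return score > BINDER_THRESHOLD_SCORE


def parse_binding_line(line: str) -> tuple[str, float, str]:
    """Split one 'peptide score allele' record into its three columns."""
    peptide, score, allele = line.split()
    return peptide, float(score), allele


def parse_fasta_entry(entry: str) -> tuple[str, str, str]:
    """Split a '>ALLELE PEPTIDE' FASTA entry into (allele, peptide, source protein)."""
    lines = entry.strip().splitlines()
    allele, peptide = lines[0].lstrip(">").split()
    # The source protein is wrapped over several lines; join them back together.
    sequence = "".join(line.strip() for line in lines[1:])
    return allele, peptide, sequence


if __name__ == "__main__":
    for record in ["AMLHWSLILPGIKAQ 0.604124 DRB1_0101",
                   "EGTKVTFHVEKGSNP 0.0130254 DRB3_0101"]:
        peptide, score, allele = parse_binding_line(record)
        print(peptide, allele, f"{from_log50k(score):.0f} nM", is_binder(score))
```

Deriving the cutoff from the transform, rather than hard-coding 0.426, keeps the binder classification consistent if a different nM threshold is chosen. Joining the wrapped sequence lines matters because a peptide can straddle a line break, as the IEDB example epitope does.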