
Article abstracts


Main references:


Sensitive quantitative predictions of peptide-MHC binding by a 'Query by Committee' artificial neural network approach.
Buus S1, Lauemoller SL1, Worning P2, Kesmir C2, Frimurer T2, Corbet S3, Fomsgaard A3, Hilden J4, Holm A5, Brunak S2.
Tissue Antigens, 62:378-84, 2003.

1Division of Experimental Immunology, Institute of Medical Microbiology and Immunology, University of Copenhagen, Denmark
2Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark
3Department of Virology, State Serum Institute, Denmark
4Department of Biostatistics, University of Copenhagen, Denmark
5Research Center for Medical Biotechnology, Chemistry Department, Royal Veterinary and Agricultural University, Denmark

We have generated Artificial Neural Networks (ANN) capable of performing sensitive, quantitative predictions of peptide binding to the MHC class I molecule, HLA-A*0204. We have shown that such quantitative ANN are superior to conventional classification ANN, that have been trained to predict binding vs non-binding peptides. Furthermore, quantitative ANN allowed a straightforward application of a 'Query by Committee' (QBC) principle whereby particularly information-rich peptides could be identified and subsequently tested experimentally. Iterative training based on QBC-selected peptides considerably increased the sensitivity without compromising the efficiency of the prediction. This suggests a general, rational and unbiased approach to the development of high quality predictions of epitopes restricted to this and other HLA molecules. Due to their quantitative nature, such predictions will cover a wide range of MHC-binding affinities of immunological interest, and they can be readily integrated with predictions of other events involved in generating immunogenic epitopes. These predictions have the capacity to perform rapid proteome-wide searches for epitopes. Finally, it is an example of an iterative feedback loop whereby advanced, computational bioinformatics optimize experimental strategy, and vice versa.
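As an illustration of the 'Query by Committee' idea described in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes that several independently trained quantitative predictors have each scored the same candidate peptides, and it ranks the peptides by how much the committee members disagree, so the most contested peptides can be chosen for experimental testing. The peptide strings, the scores, and the function name `rank_by_disagreement` are all made up for the example; using the standard deviation as the disagreement measure is likewise an assumption.

```python
from statistics import pstdev


def rank_by_disagreement(predictions):
    """Return peptides sorted from most to least committee disagreement.

    predictions maps each peptide to the list of scores given by the
    individual committee members (hypothetical values in this sketch).
    Disagreement is measured as the population standard deviation.
    """
    return sorted(predictions, key=lambda p: pstdev(predictions[p]), reverse=True)


# Made-up 9-mer peptides and made-up predicted binding scores in [0, 1].
committee_predictions = {
    "ALAKAAAAV": [0.80, 0.82, 0.79],  # members agree: little new information
    "GLYDSSSSV": [0.20, 0.75, 0.50],  # members disagree: informative to test
    "KKKKKKKKK": [0.10, 0.12, 0.30],
}

ranked = rank_by_disagreement(committee_predictions)
print(ranked[0])  # the peptide the committee disagrees on most
```

In an iterative scheme of this kind, the top-ranked peptides would be measured experimentally, added to the training data, and the committee retrained.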

PMID: 14617044   (full text version available)



Update to NetMHC v. 2.0

Reliable prediction of T-cell epitopes using neural networks with novel sequence representations.
Nielsen M1, Lundegaard C1, Worning P1, Lauemoller SL2, Lamberth K2, Buus S2, Brunak S1, Lund O1.
Protein Sci., 12:1007-17, 2003.

1Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark
2Division of Experimental Immunology, Institute of Medical Microbiology and Immunology, University of Copenhagen, Denmark

In this paper we describe an improved neural network method to predict T-cell class I epitopes. A novel input representation has been developed consisting of a combination of sparse encoding, Blosum encoding, and input derived from hidden Markov models. We demonstrate that the combination of several neural networks derived using different sequence-encoding schemes has a performance superior to neural networks derived using a single sequence-encoding scheme. The new method is shown to have a performance that is substantially higher than that of other methods. By use of mutual information calculations we show that peptides that bind to the HLA A*0204 complex display signal of higher order sequence correlations. Neural networks are ideally suited to integrate such higher order correlations when predicting the binding affinity. It is this feature combined with the use of several neural networks derived from different and novel sequence-encoding schemes and the ability of the neural network to be trained on data consisting of continuous binding affinities that gives the new method an improved performance. The difference in predictive performance between the neural network methods and that of the matrix-driven methods is found to be most significant for peptides that bind strongly to the HLA molecule, confirming that the signal of higher order sequence correlation is most strongly present in high-binding peptides. Finally, we use the method to predict T-cell epitopes for the genome of hepatitis C virus and discuss possible applications of the prediction method to guide the process of rational vaccine design.
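To make the term "sparse encoding" concrete, here is a minimal sketch under stated assumptions; it is not taken from the paper. Each residue of a peptide is turned into a vector of 20 values, one per standard amino acid, with a single non-zero entry. The alphabet order, the use of 1.0 and 0.0 as the two values, and the function name `sparse_encode` are assumptions for illustration. Blosum encoding would replace each one-hot vector with the corresponding row of a substitution matrix, which is not reproduced here.

```python
# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sparse_encode(peptide):
    """One-hot encode a peptide: 20 input values per residue."""
    vector = []
    for residue in peptide:
        if residue not in AMINO_ACIDS:
            raise ValueError(f"unknown amino acid: {residue!r}")
        # Exactly one of the 20 positions is set for each residue.
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


# A made-up 9-mer peptide gives 9 * 20 = 180 input values.
encoded = sparse_encode("ALAKAAAAV")
print(len(encoded))
```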

PMID: 12717023   (full text version available)


Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach.
Nielsen M1, Lundegaard C1, Worning P1, Hvid CS2, Lamberth K2, Buus S2, Brunak S1, Lund O1.
Bioinformatics, 20(9):1388-97, 2004.

1Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark
2Division of Experimental Immunology, Institute of Medical Microbiology and Immunology, University of Copenhagen, Denmark

MOTIVATION: Prediction of which peptides will bind a specific major histocompatibility complex (MHC) constitutes an important step in identifying potential T-cell epitopes suitable as vaccine candidates. MHC class II binding peptides have a broad length distribution complicating such predictions. Thus, identifying the correct alignment is a crucial part of identifying the core of an MHC class II binding motif. In this context, we wish to describe a novel Gibbs motif sampler method ideally suited for recognizing such weak sequence motifs. The method is based on the Gibbs sampling method, and it incorporates novel features optimized for the task of recognizing the binding motif of MHC classes I and II. The method locates the binding motif in a set of sequences and characterizes the motif in terms of a weight-matrix. Subsequently, the weight-matrix can be applied to identifying effectively potential MHC binding peptides and to guiding the process of rational vaccine design.

RESULTS: We apply the motif sampler method to the complex problem of MHC class II binding. The input to the method is amino acid peptide sequences extracted from the public databases of SYFPEITHI and MHCPEP and known to bind to the MHC class II complex HLA-DR4(B1*0401). Prior identification of information-rich (anchor) positions in the binding motif is shown to improve the predictive performance of the Gibbs sampler. Similarly, a consensus solution obtained from an ensemble average over suboptimal solutions is shown to outperform the use of a single optimal solution. In a large-scale benchmark calculation, the performance is quantified using relative operating characteristics curve (ROC) plots and we make a detailed comparison of the performance with that of both the TEPITOPE method and a weight-matrix derived using the conventional alignment algorithm of ClustalW. The calculation demonstrates that the predictive performance of the Gibbs sampler is higher than that of ClustalW and in most cases also higher than that of the TEPITOPE method.
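The weight-matrix step mentioned in this abstract can be sketched as follows. This is a simplified illustration and not the published Gibbs sampler: it assumes the 9-residue binding cores have already been aligned, and it only shows how a log-odds weight matrix could be built from them and used to score a candidate peptide. The example cores, the flat background frequency of 1/20, the pseudocount of 1, and the function names are all assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_weight_matrix(cores, pseudocount=1.0):
    """Build a position-specific log-odds matrix from aligned cores."""
    background = 1.0 / len(AMINO_ACIDS)  # flat background (assumption)
    matrix = []
    for position in range(len(cores[0])):
        # Start every count at the pseudocount so unseen residues get a score.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for core in cores:
            counts[core[position]] += 1
        total = sum(counts.values())
        matrix.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS}
        )
    return matrix


def score_peptide(matrix, peptide):
    """Sum the per-position log-odds scores for a peptide of matching length."""
    return sum(column[aa] for column, aa in zip(matrix, peptide))


# Made-up aligned 9-mer cores standing in for a sampled alignment.
cores = ["YVKQNTLKL", "FVKQNAAAL", "YLKQATLAL"]
weight_matrix = build_weight_matrix(cores)

motif_like = score_peptide(weight_matrix, "YVKQNTLKL")
unrelated = score_peptide(weight_matrix, "GGGGGGGGG")
print(motif_like > unrelated)
```

In the setting the abstract describes, the sampler's job is to find the alignment of cores; a matrix of this general kind is what would then be applied to rank candidate peptides.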

PMID: 14962912   (full text version available)


CORRESPONDENCE

Ole Lund,