Article abstracts
Main references:
Sensitive quantitative predictions of peptide-MHC binding
by a 'Query by Committee' artificial neural network approach.
Buus S1, Lauemoller SL1, Worning P2,
Kesmir C2, Frimurer T2, Corbet S3,
Fomsgaard A3, Hilden J4, Holm A5,
Brunak S2.
Tissue Antigens, 62:378-84, 2003.
1Division of Experimental Immunology,
Institute of Medical Microbiology and Immunology,
University of Copenhagen, Denmark
2Center for Biological Sequence Analysis,
Technical University of Denmark,
DK-2800 Lyngby, Denmark
3Department of Virology,
State Serum Institute, Denmark
4Department of Biostatistics,
University of Copenhagen, Denmark
5Research Center for Medical Biotechnology,
Chemistry Department,
Royal Veterinary and Agricultural University, Denmark
We have generated Artificial Neural Networks (ANN) capable of performing
sensitive, quantitative predictions of peptide binding to the MHC class I
molecule, HLA-A*0204. We have shown that such quantitative ANN are superior to
conventional classification ANN that have been trained to predict binding vs.
non-binding peptides. Furthermore, quantitative ANN allowed a straightforward
application of a 'Query by Committee' (QBC) principle whereby particularly
information-rich peptides could be identified and subsequently tested
experimentally. Iterative training based on QBC-selected peptides considerably
increased the sensitivity without compromising the efficiency of the
prediction. This suggests a general, rational and unbiased approach to the
development of high quality predictions of epitopes restricted to this and
other HLA molecules. Due to their quantitative nature, such predictions will
cover a wide range of MHC-binding affinities of immunological interest, and
they can be readily integrated with predictions of other events involved in
generating immunogenic epitopes. Such predictions make rapid proteome-wide
searches for epitopes possible. Finally, this work is an example of an
iterative feedback loop whereby advanced computational bioinformatics optimizes
experimental strategy, and vice versa.
PMID: 14617044
(full text version available)
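The 'Query by Committee' selection step described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the three committee members are hypothetical toy functions standing in for trained quantitative ANN, the peptides are made up, and using the standard deviation of the committee's predictions as the disagreement measure is an assumption.

```python
from statistics import mean, pstdev

def select_informative(peptides, committee, n_select):
    """Rank peptides by committee disagreement (highest first).

    Returns (peptide, mean prediction, disagreement) for the top n_select.
    Disagreement is the population standard deviation of the members'
    predictions, which is an assumed measure, not one taken from the paper.
    """
    scored = []
    for pep in peptides:
        preds = [member(pep) for member in committee]
        scored.append((pstdev(preds), mean(preds), pep))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(pep, avg, dis) for dis, avg, pep in scored[:n_select]]

# Hypothetical toy predictors standing in for trained quantitative networks.
def member_a(pep):
    return 0.8 if pep[1] == "L" else 0.2

def member_b(pep):
    return 0.8 if pep[-1] == "V" else 0.2

def member_c(pep):
    return 0.5

committee = [member_a, member_b, member_c]
peptides = ["ALAKAAAAV", "ALAKAAAAA", "AGAKAAAAA"]  # made-up 9-mers

# The peptide the committee disagrees on most is the candidate for the next
# round of experimental measurement and retraining.
selected = select_informative(peptides, committee, 1)
print(selected)
```

Peptides on which the members agree add little to the training set, so in an iterative scheme only the high-disagreement peptides would be measured and fed back into training.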
Update to NetMHC v. 2.0
Reliable prediction of T-cell epitopes using neural networks with novel
sequence representations.
Nielsen M1, Lundegaard C1, Worning P1,
Lauemoller SL2, Lamberth K2, Buus S2,
Brunak S1, Lund O1.
Protein Sci., 12:1007-17, 2003.
1Center for Biological Sequence Analysis,
Technical University of Denmark,
DK-2800 Lyngby, Denmark
2Division of Experimental Immunology,
Institute of Medical Microbiology and Immunology,
University of Copenhagen, Denmark
In this paper we describe an improved neural network method to predict T-cell
class I epitopes. A novel input representation has been developed consisting of
a combination of sparse encoding, Blosum encoding, and input derived from
hidden Markov models. We demonstrate that the combination of several neural
networks derived using different sequence-encoding schemes has a performance
superior to neural networks derived using a single sequence-encoding scheme.
The new method is shown to have a performance that is substantially higher than
that of other methods. By use of mutual information calculations we show that
peptides that bind to the HLA-A*0204 complex display a signal of higher-order
sequence correlations. Neural networks are ideally suited to integrate such
higher-order correlations when predicting the binding affinity. It is this
feature combined with the use of several neural networks derived from different
and novel sequence-encoding schemes and the ability of the neural network to be
trained on data consisting of continuous binding affinities that gives the new
method an improved performance. The difference in predictive performance
between the neural network methods and that of the matrix-driven methods is
found to be most significant for peptides that bind strongly to the HLA
molecule, confirming that the signal of higher order sequence correlation is
most strongly present in high-binding peptides. Finally, we use the method to
predict T-cell epitopes for the genome of hepatitis C virus and discuss
possible applications of the prediction method to guide the process of rational
vaccine design.
PMID: 12717023
(full text version available)
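Two of the ingredients named in the abstract above, sparse encoding and mutual information between peptide positions, can be illustrated with a short sketch. It assumes equal-length peptides over the 20 standard amino acids; the function names and the toy peptides are hypothetical, and the Blosum and hidden Markov model encodings are not covered.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def sparse_encode(peptide):
    """Sparse (one-hot) encoding: 20 inputs per residue, one of them set to 1."""
    vec = []
    for aa in peptide:
        vec.extend(1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS)
    return vec

def mutual_information(peptides, i, j):
    """Mutual information in bits between 0-based positions i and j."""
    n = len(peptides)
    pair_counts = Counter((p[i], p[j]) for p in peptides)
    i_counts = Counter(p[i] for p in peptides)
    j_counts = Counter(p[j] for p in peptides)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((i_counts[a] / n) * (j_counts[b] / n)))
    return mi

# Made-up 9-mers: positions 2 and 9 co-vary in the first set, not in the second.
correlated = ["ALAAAAAAV", "ALAAAAAAV", "AMAAAAAAI", "AMAAAAAAI"]
independent = ["ALAAAAAAV", "ALAAAAAAI", "AMAAAAAAV", "AMAAAAAAI"]

mi_correlated = mutual_information(correlated, 1, 8)
mi_independent = mutual_information(independent, 1, 8)
encoded = sparse_encode("ALAAAAAAV")  # 9 residues x 20 inputs
print(mi_correlated, mi_independent, len(encoded))
```

A position-specific weight matrix scores each position independently and so cannot represent the co-variation seen in the first set, which is the kind of signal a neural network can use. On real data, mutual information estimated from few peptides is biased upward, so a small-sample correction or a comparison against shuffled data would be needed.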
Improved prediction of MHC class I and class II epitopes using a novel
Gibbs sampling approach.
Nielsen M1, Lundegaard C1, Worning P1,
Hvid CS2, Lamberth K2, Buus S2,
Brunak S1, Lund O1.
Bioinformatics, 20(9):1388-97, 2004.
1Center for Biological Sequence Analysis,
Technical University of Denmark,
DK-2800 Lyngby, Denmark
2Division of Experimental Immunology,
Institute of Medical Microbiology and Immunology,
University of Copenhagen, Denmark
MOTIVATION: Prediction of which peptides will bind a specific major
histocompatibility complex (MHC) constitutes an important step in identifying
potential T-cell epitopes suitable as vaccine candidates. MHC class II binding
peptides have a broad length distribution complicating such predictions. Thus,
identifying the correct alignment is a crucial part of identifying the core of
an MHC class II binding motif. In this context, we wish to describe a novel
Gibbs motif sampler method ideally suited for recognizing such weak sequence
motifs. The method is based on the Gibbs sampling method, and it incorporates
novel features optimized for the task of recognizing the binding motif of MHC
classes I and II. The method locates the binding motif in a set of sequences
and characterizes the motif in terms of a weight-matrix. Subsequently, the
weight-matrix can be applied to identify potential MHC-binding peptides
effectively and to guide the process of rational vaccine design.
RESULTS: We
apply the motif sampler method to the complex problem of MHC class II binding.
The input to the method is amino acid peptide sequences extracted from the
public databases of SYFPEITHI and MHCPEP and known to bind to the MHC class II
complex HLA-DR4(B1*0401). Prior identification of information-rich (anchor)
positions in the binding motif is shown to improve the predictive performance
of the Gibbs sampler. Similarly, a consensus solution obtained from an ensemble
average over suboptimal solutions is shown to outperform the use of a single
optimal solution. In a large-scale benchmark calculation, the performance is
quantified using relative operating characteristic (ROC) curve plots and we
make a detailed comparison of the performance with that of both the TEPITOPE
method and a weight-matrix derived using the conventional alignment algorithm
of ClustalW. The calculation demonstrates that the predictive performance of
the Gibbs sampler is higher than that of ClustalW and in most cases also higher
than that of the TEPITOPE method.
PMID: 14962912
(full text version available)
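The final step described in the abstract above, applying a weight-matrix to find the binding core in a peptide of variable length, can be sketched as follows. This is not the Gibbs sampler itself: the alignment of the cores is assumed to be given already, and the log-odds formula with a uniform background and a flat pseudocount is a common textbook choice rather than the scheme used in the paper. The core sequences and the test peptide are made up.

```python
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_weight_matrix(cores, pseudocount=1.0):
    """Log-odds matrix (bits) from aligned, equal-length cores.

    Assumes a uniform background and a flat pseudocount per residue.
    """
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for pos in range(len(cores[0])):
        column = [core[pos] for core in cores]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        matrix.append({
            aa: log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix

def best_core(peptide, matrix):
    """Slide the matrix along the peptide; return (score, offset, core)."""
    width = len(matrix)
    candidates = []
    for offset in range(len(peptide) - width + 1):
        window = peptide[offset:offset + width]
        score = sum(matrix[k][aa] for k, aa in enumerate(window))
        candidates.append((score, offset, window))
    return max(candidates)

# Hypothetical aligned 9-mer cores, standing in for a sampler's output.
cores = ["FKALTAAGA", "FRAMTGAGS", "YKALSAAGA"]
matrix = build_weight_matrix(cores)

# A longer, class II style peptide: the core is embedded between flanks.
score, offset, core = best_core("GGFKALTAAGAGG", matrix)
print(score, offset, core)
```

Scoring every 9-residue window and keeping the best one is how a matrix of fixed width can be applied to class II peptides with a broad length distribution. Ranking peptides by this best-window score gives the values from which a ROC curve could be computed.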
CORRESPONDENCE
Ole Lund,