Article abstracts
Main references:
Other publications
Henrik Nielsen's PhD thesis
Original method (SignalP v. 1.1)
Identification of prokaryotic and eukaryotic signal peptides
and prediction of their cleavage sites.
Henrik Nielsen, Jacob Engelbrecht, Søren Brunak and Gunnar von Heijne.
Protein Engineering, 10:1-6 (1997).
We have developed a new method for the identification of signal peptides and
their cleavage sites based on neural networks trained on separate sets of
prokaryotic and eukaryotic sequences. The method performs significantly better
than previous prediction schemes and can easily be applied to genome-wide data
sets. Discrimination between cleaved signal peptides and uncleaved N-terminal
signal-anchor sequences is also possible, though with lower precision.
Predictions can be made on a publicly available WWW server.
PMID: 9051728
(free full-text PDF version)
Update to SignalP v. 2.0
Prediction of signal peptides and signal anchors by a hidden Markov
model.
Henrik Nielsen and Anders Krogh.
Proc Int Conf Intell Syst Mol Biol. (ISMB 6), 6:122-130 (1998).
A hidden Markov model of signal peptides has been developed. It contains
submodels for the N-terminal part, the hydrophobic region, and the region
around the cleavage site. For known signal peptides, the model can be used to
assign objective boundaries between these three regions. Applied to our data,
the length distributions for the three regions are significantly different from
expectations. For instance, the assigned hydrophobic region is between 8 and 12
residues long in almost all eukaryotic signal peptides. This analysis also
makes obvious the difference between eukaryotes, Gram-positive bacteria, and
Gram-negative bacteria. The model can be used to predict the location of the
cleavage site, which it finds correctly in nearly 70% of signal peptides in a
cross-validated test — almost the same accuracy as the best previous method. One
of the problems for existing prediction methods is the poor discrimination
between signal peptides and uncleaved signal anchors, but this is substantially
improved by the hidden Markov model when it is expanded with a very simple
signal anchor model.
PMID: 9783217
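The boundary assignment described above can be illustrated with Viterbi decoding over a left-to-right HMM whose states stand for the n-region, the hydrophobic (h) region, the c-region and the mature protein. The Python sketch below is a toy, not the published model: the three-letter residue alphabet (`pos`, `hyd`, `oth`), the hypothetical helper `viterbi_regions`, and every transition and emission probability are invented for illustration.

```python
import math

# Toy, hypothetical parameters; NOT the published SignalP HMM.
STATES = ["n", "h", "c", "m"]  # n-region, hydrophobic core, c-region, mature protein

# Left-to-right topology: each region is followed only by itself or the next region.
TRANS = {
    "n": {"n": 0.7, "h": 0.3},
    "h": {"h": 0.8, "c": 0.2},
    "c": {"c": 0.6, "m": 0.4},
    "m": {"m": 1.0},
}

# Emissions over three coarse residue classes instead of the 20 amino acids.
EMIT = {
    "n": {"pos": 0.5, "hyd": 0.2, "oth": 0.3},
    "h": {"pos": 0.02, "hyd": 0.9, "oth": 0.08},
    "c": {"pos": 0.1, "hyd": 0.3, "oth": 0.6},
    "m": {"pos": 0.3, "hyd": 0.35, "oth": 0.35},
}


def residue_class(aa: str) -> str:
    """Map an amino acid to a coarse class (assumed grouping)."""
    if aa in "KR":
        return "pos"
    if aa in "AILMFVWC":
        return "hyd"
    return "oth"


def log(p: float) -> float:
    """Log-probability, with log(0) = -inf for forbidden transitions."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_regions(seq: str) -> list[str]:
    """Return the most probable region label for each residue of seq."""
    if not seq:
        return []
    obs = [residue_class(aa) for aa in seq]
    # The model always starts in the n-region.
    scores = {s: log(EMIT[s][obs[0]]) if s == "n" else float("-inf") for s in STATES}
    backptrs = []
    for o in obs[1:]:
        new_scores, ptr = {}, {}
        for s in STATES:
            # Find the best predecessor state for s.
            best_prev, best = None, float("-inf")
            for p in STATES:
                cand = scores[p] + log(TRANS[p].get(s, 0.0))
                if cand > best:
                    best_prev, best = p, cand
            new_scores[s] = best + log(EMIT[s][o])
            ptr[s] = best_prev
        scores = new_scores
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(scores, key=scores.get)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Calling `viterbi_regions("MKKRLLLLLLLLLLASAQDETTP")` returns one region label per residue; because the topology only allows the order n, h, c, m, each region boundary is simply the first position of a new label. A realistic model would need 20-letter emissions and parameters estimated from data.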
Update to SignalP v. 3.0
Improved prediction of signal peptides: SignalP 3.0.
Jannick Dyrløv Bendtsen, Henrik Nielsen,
Gunnar von Heijne and Søren Brunak.
J. Mol. Biol., 340:783-795 (2004).
We describe improvements of the currently most popular method for prediction of
classically secreted proteins, SignalP. SignalP consists of two different
predictors based on neural network and hidden Markov model algorithms, and both
components have been updated. Motivated by the idea that the cleavage site
position and the amino acid composition of the signal peptide are correlated,
new features have been included as input to the neural network. This addition,
together with a thorough error-correction of a new data set, has improved the
performance of the predictor significantly over SignalP version 2. In version 3,
the correctness of the cleavage site predictions has increased notably for all
three organism groups: eukaryotes, Gram-negative bacteria and Gram-positive
bacteria. The accuracy of cleavage site prediction has increased by 6–17% over
the previous version, whereas the improvement in signal peptide discrimination
is mainly due to the elimination of false positive predictions, as well as the
introduction of a new discrimination score for the neural network. The new
method has also been benchmarked against other available methods.
PMID: 15223320
doi: 10.1016/j.jmb.2004.05.028
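The new discrimination score mentioned above can be sketched in a few lines. The sketch assumes, as we understand the SignalP 3.0 output description, that the combined cleavage-site score is Y_i = sqrt(C_i * dS_i), where dS_i compares the mean S-score in a window of d residues before and after position i, and that the D-score is the average of Y-max and S-mean (the mean S-score up to the position of Y-max). Treat these formulas, the window size `d`, and the hypothetical helpers `y_scores` and `d_score` as assumptions; the per-residue C- and S-scores would come from the trained networks, which are not reproduced here.

```python
import math


def y_scores(c: list[float], s: list[float], d: int = 2) -> list[float]:
    """Hypothetical combined cleavage-site score for each position i.

    Assumed definition: Y_i = sqrt(C_i * dS_i), where dS_i is the mean S-score
    over positions i-d..i-1 minus the mean over positions i..i+d-1.
    Negative C-scores and negative differences are clipped to 0 so the
    square root is always defined (scores are assumed to lie in [0, 1]).
    """
    if len(c) != len(s):
        raise ValueError("C- and S-score lists must have the same length")
    n = len(c)
    ys = []
    for i in range(n):
        if i < d or i + d > n:
            ys.append(0.0)  # the window does not fit near the sequence ends
            continue
        before = sum(s[i - d:i]) / d
        after = sum(s[i:i + d]) / d
        ys.append(math.sqrt(max(c[i], 0.0) * max(before - after, 0.0)))
    return ys


def d_score(c: list[float], s: list[float], d: int = 2) -> tuple[float, int]:
    """Return (D, pos): D is the mean of Y-max and S-mean, pos the Y-max index.

    pos is taken as the first residue of the putative mature protein, so
    S-mean is averaged over residues 0..pos-1 (the putative signal peptide).
    """
    if not c:
        raise ValueError("empty score lists")
    ys = y_scores(c, s, d)
    pos = max(range(len(ys)), key=ys.__getitem__)  # position of Y-max
    s_mean = sum(s[:pos]) / pos if pos > 0 else 0.0
    return (s_mean + ys[pos]) / 2, pos
```

With S-scores that are high over the first five residues and low afterwards, and a C-score peak at the sixth residue, `d_score` places the putative cleavage site before that residue and averages the two scores. The threshold on D for calling a signal peptide is not given here and would have to be taken from the method itself.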
Update to SignalP v. 4.0
SignalP 4.0: discriminating signal peptides from transmembrane regions.
Thomas Nordahl Petersen, Søren Brunak,
Gunnar von Heijne and Henrik Nielsen.
Nature Methods, 8:785-786 (2011).
This is a Correspondence; it has no abstract.
doi: 10.1038/nmeth.1701
Access to the paper: if you have a personal or institutional subscription to
Nature Methods, use the doi link above. If not, an ePrint version is available.
Access to the supplementary materials:
nmeth.1701-S1.pdf
Other publications
Locating proteins in the cell using TargetP,
SignalP, and related tools
Olof Emanuelsson, Søren Brunak, Gunnar von Heijne, Henrik Nielsen
Nature Protocols, 2:953-971 (2007).
Determining the subcellular localization of a protein is an important
first step toward understanding its function. Here, we describe the
properties of three well-known N-terminal sequence motifs directing
proteins to the secretory pathway, mitochondria and chloroplasts, and
sketch a brief history of methods to predict subcellular localization
based on these sorting signals and other sequence properties. We then
outline how to use a number of internet-accessible tools to arrive at a
reliable subcellular localization prediction for eukaryotic and
prokaryotic proteins. In particular, we provide detailed step-by-step
instructions for the coupled use of the amino-acid sequence-based
predictors TargetP, SignalP, ChloroP and TMHMM, which are all hosted at
the Center for Biological Sequence Analysis, Technical University of
Denmark. In addition, we describe and provide web references to other
useful subcellular localization predictors. Finally, we discuss
predictive performance measures in general and the performance of
TargetP and SignalP in particular.
PMID: 17446895
The paper and supplementary materials are available online.
Machine learning approaches to the prediction of signal peptides
and other protein sorting signals.
Henrik Nielsen, Søren Brunak, and Gunnar von Heijne.
Protein Engineering, 12:3-9 (1999), Review.
Prediction of protein sorting signals from the sequence of amino acids has
great importance in the field of proteomics today. Recently, the growth of
protein databases, combined with machine learning approaches, such as neural
networks and hidden Markov models, has made it possible to achieve a level of
reliability where practical use in, for example, automatic database annotation
is feasible. In this review, we concentrate on the present status and future
perspectives of SignalP, our neural network-based method for prediction of the
most well-known sorting signal: the secretory signal peptide. We discuss the
problems associated with the use of SignalP on genomic sequences, showing that
signal peptide prediction will improve further if integrated with predictions
of start codons and transmembrane helices. As a step towards this goal, a
hidden Markov model version of SignalP has been developed, making it possible
to discriminate between cleaved signal peptides and uncleaved signal anchors.
Furthermore, we show how SignalP can be used to characterize putative signal
peptides from an archaeon, Methanococcus jannaschii. Finally, we briefly review
a few methods for predicting other protein sorting signals and discuss the
future of protein sorting prediction in general.
PMID: 10065704
A neural network method for identification of prokaryotic and eukaryotic
signal peptides and prediction of their cleavage sites.
Henrik Nielsen, Jacob Engelbrecht, Søren Brunak
and Gunnar von Heijne.
Int. J. Neural Sys., 8:581-599 (1997).
We have developed a new method for the identification of signal peptides and
their cleavage sites based on neural networks trained on separate sets of
prokaryotic and eukaryotic sequences. The method performs significantly better
than previous prediction schemes, and can easily be applied to genome-wide data
sets. Discrimination between cleaved signal peptides and uncleaved N-terminal
signal-anchor sequences is also possible, though with lower precision.
Predictions can be made on a publicly available WWW server:
http://www.cbs.dtu.dk/services/SignalP/.
PMID: 10065837
Defining a similarity threshold for a functional protein sequence pattern:
the signal peptide cleavage site.
Henrik Nielsen, Jacob Engelbrecht, Gunnar von Heijne
and Søren Brunak.
Proteins, 24:165-177 (1996).
When preparing data sets of amino acid or nucleotide sequences it is
necessary to exclude redundant or homologous sequences in order to avoid
overestimating the predictive performance of an algorithm. For some time,
methods for doing this have been available in the area of protein structure
prediction. We have developed a similar procedure based on pair-wise
alignments for sequences with functional sites. We show how a correlation
coefficient between sequence similarity and functional homology can be used
to compare the efficiency of different similarity measures and choose a
nonarbitrary threshold value for excluding redundant sequences. The impact
of the choice of scoring matrix used in the alignments is examined. We
demonstrate that the parameter determining the quality of the correlation is
the relative entropy of the matrix, rather than the assumed (PAM or
identity) substitution model. Results are presented for the case of
prediction of cleavage sites in signal peptides. By inspection of the false
positives, several errors in the database were found. The procedure
presented may be used as a general outline for finding a problem-specific
similarity measure and threshold value for analysis of other functional
amino acid or nucleotide sequence patterns.
PMID: 8820484
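The threshold procedure described above can be sketched as follows. For every pair of sequences we assume a similarity score from a pairwise alignment and a boolean saying whether the pair is functionally homologous (for example, whether their cleavage sites align). Each candidate threshold turns the scores into a yes/no "redundant" prediction, and the threshold giving the highest correlation with functional homology is kept. The use of the Matthews correlation coefficient and the hypothetical helpers `mcc` and `best_threshold` are assumptions for illustration, not necessarily the exact measure used in the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(pairs: list[tuple[float, bool]]) -> tuple[float, float]:
    """Pick the similarity threshold that best predicts functional homology.

    pairs: (alignment similarity score, functionally homologous?) for each
    sequence pair. A pair is predicted redundant when score >= threshold.
    Returns (threshold, correlation at that threshold); (None, -inf) if empty.
    """
    best_t, best_r = None, float("-inf")
    # Only observed scores need to be tried as candidate thresholds.
    for t in sorted({score for score, _ in pairs}):
        tp = sum(1 for score, hom in pairs if score >= t and hom)
        fp = sum(1 for score, hom in pairs if score >= t and not hom)
        fn = sum(1 for score, hom in pairs if score < t and hom)
        tn = sum(1 for score, hom in pairs if score < t and not hom)
        r = mcc(tp, tn, fp, fn)
        if r > best_r:
            best_t, best_r = t, r
    return best_t, best_r
```

For example, `best_threshold([(10, False), (20, False), (30, True), (40, True)])` returns `(30, 1.0)`: at a score threshold of 30 the similarity criterion separates the functionally homologous pairs perfectly. Sequences whose pairwise score exceeds the chosen threshold would then be treated as redundant and reduced to one representative.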
From sequence to sorting: Prediction of signal peptides.
Henrik Nielsen.
Ph.D. thesis, defended at Department of Biochemistry,
Stockholm University, Sweden, May 25, 1999.
In the present age of genome sequencing, a vast number of predicted
genes are initially known only by their putative nucleotide
sequence. The newly established field of bioinformatics is concerned
with the computational prediction of structural and functional
properties of genes and the proteins they encode, based on their
nucleotide and amino acid sequences.
Since one of the crucial properties of a protein is its subcellular
location, prediction of protein sorting is an important question in
bioinformatics. A fundamental distinction in protein sorting is that
between secretory and non-secretory proteins, determined by a
cleavable N-terminal sorting signal, the secretory signal peptide.
The main part of this thesis, including four of the six papers,
concerns prediction of secretory signal peptides in both eukaryotic
and bacterial data using two machine learning techniques: artificial
neural networks and hidden Markov models. A central result is the
SignalP prediction method, which has been made available as a World
Wide Web server and is very widely used.
Two additional prediction methods are also included, with one paper
each. ChloroP predicts chloroplast transit peptides, another
cleavable N-terminal sorting signal; while NetStart predicts start
codons in eukaryotic genes. For prediction of all N-terminal signals,
the assignment of the correct start codon can be critical, which is why
prediction of translation initiation from the nucleotide sequence is
also important for protein sorting prediction.
This thesis comprises a detailed review of the molecular biology of
protein secretion, a short introduction to the most important machine
learning algorithms in bioinformatics, and a critical review of
existing methods for protein sorting prediction. In addition, it
contains a general treatment of the principles of data set construction
and performance evaluation for prediction methods in bioinformatics.
Access to the thesis (without the six included papers):
PhDthesis.pdf; PhDthesis-cover.pdf
CORRESPONDENCE
Henrik Nielsen,