Scientific background

For a brief description of the SignalP method please consult the article abstracts.

Biological background

Signal peptides have long been one of the central topics in bioinformatics. Their importance was underscored in 1999, when Günter Blobel received the Nobel Prize in Physiology or Medicine for the discovery that "proteins have intrinsic signals that govern their transport and localization in the cell". He pointed out the importance of defined peptide motifs for targeting proteins to their site of function. The press release can be read here.
For the biological background of protein localization, see the following pages.
Signal peptides
Signal anchors
Other secretory signals

Data sets and statistics

An essential task in any machine learning method is obtaining a clean and accurate data set for training and testing. Bias and noise in the data set often lead to wrong predictions.
Description of data sets
Dataset extraction
Dataset cleanup
Sequence logos
Length distributions
Characteristics of signal peptides
Download the training sets

Methods for prediction of signal peptides

With the current growth of sequence databases and the speed of genome sequencing, accurate prediction methods have become increasingly important. For SignalP we have focused on neural networks as well as hidden Markov models.
Neural Networks
Hidden Markov Models

Performance and results

Any machine learning approach must be evaluated on sequences it has not seen during training in order to measure its predictive performance.
Performance of the current prediction method
Five-fold cross-validation
Independent test set by Menne
Signal anchor prediction


The information on these pages was partly written by the original creator of SignalP, Henrik Nielsen. It has since been updated with new knowledge, but most of the biological background text derives from Henrik's work.


Henrik Nielsen,