LipoP 1.0 User's guide
This program predicts lipoproteins and discriminates between
lipoprotein signal peptides, other signal peptides and N-terminal membrane helices
in Gram-negative bacteria.
The method is described in:
A. Sierakowska, H. Willenbrock, G. von Heijne, H. Nielsen, S. Brunak
and A. Krogh (2003)
Prediction of lipoprotein signal peptides in Gram-negative bacteria.
Protein Sci., 12(8):1652-1662.
For more information, please contact krogh@binf.ku.dk.
Input
The program takes proteins in FASTA format. It recognizes the
20 standard amino acids and B, Z, and X, which are all treated equally as unknown.
Any other character is changed to X, so please make sure the sequences are
sensible proteins.
This is an example (one protein):
>5H2A_CRIGR you can have comments after the ID
MEILCEDNTSLSSIPNSLMQVDGDSGLYRNDFNSRDANSSDASNWTIDGENRTNLSFEGYLPPTCLSILHL
QEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYRWP
LPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGVSMPIPVF
GLQDDSKVFKQGSCLLADDNFVLIGSFVAFFIPLTIMVITYFLTIKSLQKEATLCVSDLSTRAKLASFSFL
PQSSLSSEKLFQRSIHREPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESCNE
HVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENRKPLQLILVNTIPALAYKSSQLQA
GQNKDSKEDAEPTDNDCSMVTLGKQQSEETCTDNINTVNEKVSCV
Only the first 70 amino acids are used for prediction.
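The input rules above can be illustrated with a short Python sketch. It is not LipoP's own code: the names `read_fasta` and `prepare_for_lipop` are hypothetical, and the upper-casing of lower-case letters is an assumption, since this guide does not say how case is handled.

```python
from typing import Dict, Iterable, Optional

# The 20 standard amino acids plus B, Z and X (accepted, treated as unknown).
ACCEPTED = set("ACDEFGHIKLMNPQRSTVWY") | set("BZX")
WINDOW = 70  # only the first 70 residues are used for prediction


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {ID: sequence}; text after the ID is a comment."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            current = parts[0] if parts else ""
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def prepare_for_lipop(seq: str) -> str:
    """Replace unrecognized characters with X and keep the first 70 residues."""
    # Assumption: lower-case letters are upper-cased first (not stated in the guide).
    cleaned = "".join(c if c in ACCEPTED else "X" for c in seq.upper())
    return cleaned[:WINDOW]
```

Running `prepare_for_lipop` on each value returned by `read_fasta` gives a rough preview of what the predictor will actually see, which can help spot sequences that contain many non-protein characters.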
How to run it
Either give the name of the local file in which you have the proteins in
the top half of the window, or paste the sequence(s) into the lower part
of the window (it should be possible to both give it a local file and paste
sequences if you really want). Now select one of the three output options
("Extensive, with graphics", "Extensive, no graphics", or "Short output format").
Then press "Submit".
Output
The output is essentially in GFF format. The default (long) output
format looks like this:
# ANIA_NEIGO SpII score=29.6052 margin=11.2327 cleavage=18-19 Pos+2=G
# Cut-off=-3
ANIA_NEIGO LipoP1.0:Best SpII 1 1 29.6052
ANIA_NEIGO LipoP1.0:Margin SpII 1 1 11.2327
ANIA_NEIGO LipoP1.0:Class SpI 1 1 18.3725
ANIA_NEIGO LipoP1.0:Class CYT 1 1 -0.200913
ANIA_NEIGO LipoP1.0:Signal CleavII 18 19 29.6052 # FALAA|CGGEQ Pos+2=G
ANIA_NEIGO LipoP1.0:Signal CleavI 24 25 18.0333 # GGEQA|AQAPA
ANIA_NEIGO LipoP1.0:Signal CleavI 20 21 15.9259 # LAACG|GEQAA
ANIA_NEIGO LipoP1.0:Signal CleavI 26 27 12.0794 # EQAAQ|APAET
ANIA_NEIGO LipoP1.0:Signal CleavI 25 26 11.4077 # GEQAA|QAPAE
ANIA_NEIGO LipoP1.0:Signal CleavI 27 28 9.40252 # QAAQA|PAETP
(output truncated)
The first line, which is the only line if short
output is chosen, summarizes the best prediction. In the example the
best prediction is a lipoprotein with a cleavage site between amino acids
18 and 19 and amino acid G (glycine) in position +2 after the cleavage site.
The second line gives the cut-off used. In the remaining lines the columns contain:
- Sequence ID
- Type of prediction. Best means the highest scoring class, Margin gives
the difference between the best score and the second best score, Class gives
the score of other classes and Signal lines contain predicted cleavage sites.
- Feature type, see below
- Location in the sequence. For lines with a class prediction it is always 1.
For cleavage sites it is the position of the last amino acid before the
predicted cleavage site (the last residue of the signal peptide).
- Location as above, except that for cleavage sites it is the position of the
first amino acid after the cleavage site.
- Score. For the "Margin" type it is the difference between the best
and the second best class score. Otherwise the log-odds score.
- For the cleavage sites the ±5 context is shown after the #,
and for lipoprotein cleavage sites the amino acid in position +2 is shown
(which may determine whether the lipoprotein is attached to the inner or
outer membrane, see below).
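The column layout described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: columns are separated by whitespace, each data line carries the six columns shown in the example, and sequence IDs contain no `#`. The names `Feature` and `parse_feature_line` are hypothetical and not part of LipoP.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    seq_id: str
    source: str          # type of prediction, e.g. "LipoP1.0:Signal"
    feature: str         # feature type, e.g. "CleavII"
    start: int
    end: int
    score: float
    note: Optional[str]  # text after '#', if any


def parse_feature_line(line: str) -> Optional[Feature]:
    """Parse one data line of the long output; return None for other lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # summary line, cut-off line or blank line
    body, _, note = line.partition("#")
    fields = body.split()
    if len(fields) < 6:
        return None  # not a line in the assumed six-column layout
    seq_id, source, feature, start, end, score = fields[:6]
    return Feature(seq_id, source, feature, int(start), int(end),
                   float(score), note.strip() or None)
```

For the first cleavage-site line of the example, the result has `start=18`, `end=19`, `score=29.6052` and the note `FALAA|CGGEQ Pos+2=G`.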
These 4 classes are predicted:
SpI: signal peptide (signal peptidase I)
SpII: lipoprotein signal peptide (signal peptidase II)
TMH: N-terminal transmembrane helix. This is generally not a very
reliable prediction and should be tested. This part of the model is mainly
there to avoid transmembrane helices being falsely predicted as signal peptides.
CYT: cytoplasmic. It really just means all the rest.
For technical reasons (see paper) the score for CYT is always the same.
These signals are predicted:
CleavI: Cleavage sites for signal peptidase I.
CleavII: Cleavage sites for signal peptidase II.
Plot of scores
A plot of the cleavage site scores is made in postscript unless you have chosen
the short output format or disabled the plot. For each predicted cleavage
site, the score is shown. Two different colors are used for SpI and SpII.
To the left are shown the scores of the classes scoring higher than the cut-off.
The postscript is converted to an image (png format) and included in the html
output (if selected).
Below the plot there are links to
- The plot in encapsulated postscript
- A script for making the plot in gnuplot.
If there are only a few predictions of cleavage sites, no plot is made.
Interpreting the output
It is shown in the paper that the margin, i.e., the difference between the
best and the second best prediction, correlates well with the number of falsely
predicted signal peptides.
An aspartic acid (D) in position +2 after the cleavage site of a lipoprotein
means that it is attached to the inner membrane, and most other lipoproteins
are attached to the outer membrane. Therefore we report the amino acid in
this position for predicted lipoproteins. See e.g. Seydel et al. (1999) Molecular Microbiology 34:
810-821 for more details.
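The +2 rule can be applied to the summary line of the output with a small Python sketch. It assumes the summary line has the form shown in the example output (ID, class, then `key=value` fields); the function names `parse_summary` and `likely_membrane` are hypothetical, and the result is only the rule of thumb stated above, not an experimental localization.

```python
from typing import Dict, Optional, Tuple


def parse_summary(line: str) -> Tuple[str, str, Dict[str, str]]:
    """Split a summary line into (ID, class, {key: value}) parts."""
    tokens = line.strip().lstrip("#").split()
    if len(tokens) < 2:
        raise ValueError("not a summary line: %r" % line)
    seq_id, cls = tokens[0], tokens[1]
    # Remaining tokens look like score=..., margin=..., cleavage=..., Pos+2=...
    info = dict(t.split("=", 1) for t in tokens[2:] if "=" in t)
    return seq_id, cls, info


def likely_membrane(summary_line: str) -> Optional[str]:
    """Apply the +2 rule; return None unless the best class is SpII."""
    _, cls, info = parse_summary(summary_line)
    if cls != "SpII" or "Pos+2" not in info:
        return None
    if info["Pos+2"] == "D":
        return "inner membrane"
    return "outer membrane (most likely)"
```

For the ANIA_NEIGO example, where `Pos+2=G`, the function returns `outer membrane (most likely)`.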
The cross-validation test reported in the paper gave the results shown in
the table below. The highest scoring class was predicted.
For signal peptides, 309 out of 328 were correctly classified
as such, whereas 2 were classified as lipoproteins, 14 as cytoplasmic and
3 as having an N-terminal transmembrane helix. Of 63 lipoproteins, 61 were
classified correctly.
Correct class | Predicted class
              | SPaseI | SPaseII | Cytoplasmic | TMH | Total
SPaseI        |    309 |       2 |          14 |   3 |   328
SPaseII       |      2 |      61 |           0 |   0 |    63
Cytoplasmic   |      5 |       1 |         382 |   0 |   388
TMH           |      8 |       0 |          21 | 142 |   171
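The counts in the table can be turned into per-class rates with a few lines of Python. The sketch below only rearranges the numbers given above; the terms "sensitivity" (fraction of a correct class that was predicted as such) and "precision" (fraction of a predicted class that was correct) are standard measures added here for illustration and are not reported in this guide.

```python
CLASSES = ["SPaseI", "SPaseII", "Cytoplasmic", "TMH"]

# Rows: correct class. Columns: predicted class, in the order of CLASSES.
CONFUSION = {
    "SPaseI":      [309, 2, 14, 3],
    "SPaseII":     [2, 61, 0, 0],
    "Cytoplasmic": [5, 1, 382, 0],
    "TMH":         [8, 0, 21, 142],
}


def sensitivity(cls: str) -> float:
    """Correctly predicted members of cls divided by all members of cls."""
    row = CONFUSION[cls]
    return row[CLASSES.index(cls)] / sum(row)


def precision(cls: str) -> float:
    """Correctly predicted members of cls divided by all predictions of cls."""
    j = CLASSES.index(cls)
    predicted = sum(CONFUSION[c][j] for c in CLASSES)
    return CONFUSION[cls][j] / predicted


if __name__ == "__main__":
    for c in CLASSES:
        print(f"{c:12s} sensitivity={sensitivity(c):.3f} precision={precision(c):.3f}")
```

For lipoproteins this gives a sensitivity of 61/63 and a precision of 61/64, since 3 sequences from other classes were predicted as SPaseII.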
It is also shown in the paper that the prediction is more reliable the higher
the margin is.
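As a worked example, the margin can be recomputed from the class scores in the long output. This is a minimal Python sketch; the function name `best_and_margin` is hypothetical, and no margin threshold is implied, since this guide does not give one.

```python
from typing import Dict, Tuple


def best_and_margin(class_scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the best class and its margin over the second best class."""
    if len(class_scores) < 2:
        raise ValueError("need at least two class scores")
    ranked = sorted(class_scores.items(), key=lambda kv: kv[1], reverse=True)
    (best_cls, best), (_, second) = ranked[0], ranked[1]
    return best_cls, best - second


# Class scores taken from the ANIA_NEIGO example output.
scores = {"SpII": 29.6052, "SpI": 18.3725, "CYT": -0.200913}
cls, margin = best_and_margin(scores)
print(cls, round(margin, 4))
```

The difference 29.6052 - 18.3725 reproduces the margin of 11.2327 reported on the Margin line of the example. Sorting a set of predictions by this value puts the most reliable ones first.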