
This site contains a downloadable version of and supplementary information for the paper

Locating proteins in the cell using TargetP, SignalP, and related tools
Olof Emanuelsson, Søren Brunak, Gunnar von Heijne, Henrik Nielsen
Nature Protocols 2, 953-971 (2007).

On this webpage, all websites mentioned in the protocol are shown as active links, to make it easier for the reader to try the online tools.


Accessing the paper

If you have access to an institutional subscription to Nature Protocols, it will be most convenient to download the paper from the journal website:

Otherwise, you can download a so-called ePrint of the paper here:

  • NATURE_169477_CP.exe
    • Note: this is an executable file, which only works under Windows. Mac or Linux users, please contact Henrik Nielsen.
    • Instructions: Right-click the link and save the file to your computer. To open the saved ePrint file simply double-click on the file icon you saved to your computer. You must be connected to the Internet the first time you open your ePrint. This one-time Internet connection allows your ePrint to validate itself.
    • Note: the ePrint can be printed, but only twice per computer. It is not possible to print to a PDF with Acrobat Distiller or equivalent programs.
    • If you have questions or problems, consult the ePrint FAQ or write to Henrik Nielsen.


NEW: An automated implementation

A group in Tampere, Finland, has developed an automated implementation of our protocol for animal (including human) sequences. The method is called PROlocalizer and is available here:

The paper is available through open access here:

PROlocalizer: integrated web service for protein subcellular localization prediction
Kirsti Laurila and Mauno Vihinen
Amino Acids 2010 Sep 2. [Epub ahead of print] (PubMed link)


Sample data sets

Here are two example data sets, one prokaryotic and one eukaryotic, that the reader may use for testing the methods described in the protocol. They are in the FASTA format required by all the servers.

The names of the proteins are their UniProt/Swiss-Prot identifiers. The annotated subcellular location is indicated after each name. The full annotation of a protein can be retrieved from the UniProt Knowledgebase by inserting its identifier into the search field at the top of the page.
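For readers who want to inspect the sample files locally before submitting them to the servers, the following is a minimal Python sketch of a FASTA reader. It assumes each header line has the form ">IDENTIFIER location", with the identifier separated from the annotated location by the first space; the exact header layout of the sample files is not reproduced here, so adjust the split if yours differs. The names Record and parse_fasta, and the two demo entries, are made up for illustration and are not real UniProt entries.

```python
from typing import Iterator, List, NamedTuple


class Record(NamedTuple):
    identifier: str  # protein name taken from the header
    location: str    # annotated subcellular location (may be empty)
    sequence: str    # amino acid sequence with line breaks removed


def _make_record(header: str, chunks: List[str]) -> Record:
    # Assumption: identifier and location are separated by the first space.
    identifier, _, location = header.partition(" ")
    return Record(identifier, location.strip(), "".join(chunks))


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield one Record per FASTA entry found in text."""
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


# Made-up demo input, not taken from the sample data sets.
demo = """>DEMO1_TEST Mitochondrion
MLSLRQSIRF
FKPATRTLCS
>DEMO2_TEST Secreted
MKWVTFISLL
"""

for rec in parse_fasta(demo):
    print(rec.identifier, rec.location, len(rec.sequence))
```

Splitting on the first space keeps multi-word locations intact in the location field.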


Links from PROCEDURE

Here are links to all the websites mentioned in the PROCEDURE section of the protocol, shown together with the step number(s) in which they occur. Programs hosted at the Center for Biological Sequence Analysis are marked "(CBS)".

1-4. TargetP 1.1 (CBS): Secretory signal peptides, mitochondrial targeting peptides and chloroplast transit peptides in eukaryotes.

5-6. ChloroP 1.1 (CBS): Chloroplast transit peptides in plants.

7-9. SignalP 3.0 (CBS): Secretory signal peptides in eukaryotes, Gram-negative and Gram-positive bacteria.

10. TatP (CBS): Twin-arginine translocation signal peptides in bacteria.

11. TMHMM 2.0 (CBS): Transmembrane alpha-helices. (A list of other transmembrane alpha-helix predictors is found at the PSORT main page).

12. B2TMR and HMM-B2TMR: Transmembrane beta-barrels. (A list of other transmembrane beta-barrel predictors is found at the PSORT main page).

13 A. big-Pi: GPI membrane anchors in eukaryotes; NMT and Myristoylator: Myristoyl membrane anchors in eukaryotes.

13 B. LipoP (CBS): Lipoprotein signal peptides in bacteria.

14. PROSITE: Scan your sequence and watch for pattern PS00014 / ER_TARGET, indicating endoplasmic reticulum lumenal retention motifs in eukaryotes. Golgi predictor: Golgi retention signals in eukaryotes.

15. PredictNLS and NucPred: Nuclear localisation signals in eukaryotes. NetNES (CBS): Leucine-rich nuclear export signals in eukaryotes.

16. PeroxiP and PTS1: C-terminal peroxisomal targeting signals in eukaryotes.

17. NCBI BLAST: Database searches - click "Protein-protein BLAST (blastp)" and choose "swissprot" or "nr" as the database.

18. See the methods listed in BOX 1 below, or consult the longer list at the PSORT main page.
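As an illustration of what the pattern scan in step 14 looks for, here is a minimal Python sketch of a C-terminal motif check. It assumes that PS00014 / ER_TARGET is the pattern [KRHQSA]-[DENQ]-E-L>, where ">" anchors the match to the C-terminus; verify this against the current PROSITE entry before relying on it. The function name has_er_retention_motif is hypothetical, and the sketch is no substitute for scanning at PROSITE itself.

```python
import re

# Assumption: PS00014 (ER_TARGET) is [KRHQSA]-[DENQ]-E-L>, i.e. a
# KDEL-like tetrapeptide at the very C-terminus of the sequence.
ER_TARGET = re.compile(r"[KRHQSA][DENQ]EL$")


def has_er_retention_motif(sequence: str) -> bool:
    """Return True if the sequence ends in a KDEL-like motif."""
    cleaned = sequence.strip().upper()
    return ER_TARGET.search(cleaned) is not None


# Made-up sequences for illustration only.
print(has_er_retention_motif("MKTAYIAKDEL"))  # motif at the C-terminus
print(has_er_retention_motif("MKDELTAYIA"))   # motif present, but internal
```

A match only flags a candidate retention signal; it does not by itself show that the protein enters the secretory pathway, which is what the earlier steps address.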


Links from TROUBLESHOOTING

Here are links to all the websites mentioned in the TROUBLESHOOTING section of the protocol, shown together with the step number in which they occur. Programs hosted at the Center for Biological Sequence Analysis are marked "(CBS)".

1. SecretomeP (CBS): Signal peptide-less secretion in mammals and bacteria.

2. NetStart (CBS): start codon prediction in eukaryotic DNA sequences.

3. ProP (CBS): Propeptide cleavage in eukaryotes.

4. Phobius: Combined prediction of signal peptides and transmembrane alpha-helices.


Links from BOX 1

Here are links to all the non-CBS multicategory localisation prediction programs mentioned in BOX 1 (see step 18 of the PROCEDURE).

WoLF PSORT: 11 locations in animals and plants, and 10 in fungi.

PSORTb v.2.0: 5 locations in Gram-negative bacteria and 4 in Gram-positive bacteria.

iPSORT: 4 locations in plants and 3 in other eukaryotes (TargetP data).

LOCtree: 6 locations in plants, 5 in other eukaryotes and 3 in bacteria.

BaCelLo: 5 locations in plants and 4 in other eukaryotes.

Protein Prowler v. 1.2: 4 locations in plants and 3 in other eukaryotes (TargetP data).

CELLO v.2.5: 12 locations in eukaryotes, 5 in Gram-negative bacteria and 4 in Gram-positive bacteria.

PA-SUB v2.5: 9 locations in animals, 10 in plants, 9 in fungi, 6 in Gram-negative bacteria and 4 in Gram-positive bacteria.

TargetLoc: 4 locations in plants and 3 in other eukaryotes (TargetP data).

MultiLoc: 9 locations in animals, 10 in plants and 9 in fungi.




CORRESPONDENCE

Henrik Nielsen,