Predicting subcellular localization of proteins based on their N-terminal
amino acid sequence.
Olof Emanuelsson (1), Henrik Nielsen (2),
Søren Brunak (2) and Gunnar von Heijne (1).
J. Mol. Biol., 300: 1005-1016, 2000.
(1) Stockholm Bioinformatics Center, Department of
Biochemistry, Stockholm University, S-106 91 Stockholm, Sweden
(2) Center for Biological Sequence Analysis, BioCentrum-DTU,
Technical University of Denmark, DK-2800 Lyngby, Denmark
A neural network-based tool, TargetP, for large-scale subcellular location
prediction of newly identified proteins has been developed. Using N-terminal
sequence information only, it discriminates between proteins destined for the
mitochondrion, the chloroplast, the secretory pathway, and "other"
localizations with a success rate of 85% (plant) or 90% (non-plant) on
redundancy-reduced test sets. From a TargetP analysis of the recently sequenced
Arabidopsis thaliana chromosomes 2 and 4 and the Ensembl Homo sapiens protein
set, we estimate that 10% of all plant proteins are mitochondrial and 14%
chloroplastic, and that the abundance of secretory proteins, in both
Arabidopsis and Homo, is around 10%. TargetP also predicts cleavage sites with
levels of correctly predicted sites ranging from approximately 40% to 50%
(chloroplastic and mitochondrial presequences) to above 70% (secretory signal