Data sets used in the construction and evaluation of TargetP
Unless otherwise stated, the data sets were redundancy reduced on the presequence plus the first
amino acid of the mature protein.
Plant predictor data sets
Chloroplast (cTP) sequences
number: 141       fasta       AC numbers
Comments: SWISS-PROT release 36. Only plant proteins are included -- i.e. no algae.
Mitochondrial (mTP) sequences
number: 368       fasta       AC numbers
Comments: SWISS-PROT release 36. Plant and non-plant proteins (*).
Secretory Pathway/Signal Peptide (SP) sequences
number: 269       fasta       AC numbers
Comments: SWISS-PROT release 36.
Nuclear sequences
number:   48       fasta       AC numbers         (redundancy reduced on 112 N-terminal amino acids)
number:   54       fasta       AC numbers         (redundancy reduced on 68 N-terminal amino acids)
Comments: SWISS-PROT release 36. Redundancy was reduced on two different N-terminal lengths so that
the sets could be used to train different first-layer networks (see article).
Cytosolic sequences
number:   87       fasta       AC numbers         (redundancy reduced on 112 N-terminal amino acids)
number: 108       fasta       AC numbers         (redundancy reduced on 68 N-terminal amino acids)
Comments: SWISS-PROT release 36. Redundancy was reduced on two different N-terminal lengths so that
the sets could be used to train different first-layer networks (see article).
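The nuclear and cytosolic sets are reduced on a fixed N-terminal window (68 or 112 residues). The following is a minimal Python sketch of the truncation step only, assuming the sequences are available as FASTA text. It does not reproduce the redundancy-reduction procedure itself, which is described in the article. The function names, headers, and sequences are hypothetical and used purely for illustration.

```python
from typing import Dict, Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n_terminal_windows(fasta_text: str, length: int) -> Dict[str, str]:
    """Map each header to the first `length` residues of its sequence.

    Sequences shorter than `length` are kept whole.
    """
    return {header: seq[:length] for header, seq in parse_fasta(fasta_text)}


# Hypothetical input; these are not entries from the TargetP data sets.
example = """>HYPOTHETICAL_1
MKTAYIAKQRQ
ISFVKSHFSRQ
>HYPOTHETICAL_2
MASS
"""

windows = n_terminal_windows(example, 10)
for header, window in windows.items():
    print(header, window)
```

In practice, `length` would be set to 68 or 112 to match the window lengths listed above, and the resulting windows would then be passed to whatever redundancy-reduction step is used.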
The set of 940 proteins used in Tables 1 and 2 in the JMB article consists of the cTP, mTP, SP,
nuclear (54), and cytosolic (108) sets. The "other" set (162 entries) is a concatenation of the nuclear
and cytosolic sets.
Non-plant predictor data sets
Mitochondrial (mTP) sequences
number:   371       fasta       AC numbers        (redundancy reduced on mTP+3 aa)
Comments: SWISS-PROT release 38. Plant and non-plant proteins (*).
Secretory Pathway/Signal Peptide (SP) sequences
number:   715       fasta       AC numbers
Comments: SWISS-PROT release 37.
Nuclear sequences
number: 1214       fasta       AC numbers         (redundancy reduced on 68 N-terminal amino acids)
Comments: SWISS-PROT release 37.
Cytosolic sequences
number:   438       fasta       AC numbers         (redundancy reduced on 68 N-terminal amino acids)
Comments: SWISS-PROT release 37.
The set of 2738 proteins used in Tables 1 and 2 in the JMB article consists of the mTP, SP,
nuclear, and cytosolic sets. The "other" set (1652 entries) is a concatenation of the nuclear and
cytosolic sets.
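The stated totals can be checked against the per-set counts listed on this page. The short Python sketch below uses only the numbers given above; the dictionary and function names are illustrative. For the plant predictor, the nuclear and cytosolic counts are those of the sets reduced on 68 N-terminal amino acids (54 and 108).

```python
# Per-set sequence counts as listed on this page.
plant = {"cTP": 141, "mTP": 368, "SP": 269, "nuclear": 54, "cytosolic": 108}
non_plant = {"mTP": 371, "SP": 715, "nuclear": 1214, "cytosolic": 438}


def totals(sets):
    """Return (total number of proteins, size of the combined "other" set)."""
    other = sets["nuclear"] + sets["cytosolic"]
    return sum(sets.values()), other


plant_total, plant_other = totals(plant)
non_plant_total, non_plant_other = totals(non_plant)

print(plant_total, plant_other)          # 940 162
print(non_plant_total, non_plant_other)  # 2738 1652
```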
 (*) These data sets contain both plant and non-plant mTPs as justified by a study by Schneider
et al., Proteins, 30, 49-60 (1998).