
Usage instructions



1. Specify the input sequences

All the input sequences must be in one-letter amino acid code. The allowed alphabet (not case sensitive) is as follows:

A C D E F G H I K L M N P Q R S T V W Y and X (unknown)

All the alphabetic symbols not in the allowed alphabet will be converted to X before processing. All the non-alphabetic symbols, including white space and digits, will be ignored.
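
The clean-up rules above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the server's own code; the function name `normalize_sequence` is hypothetical, and converting to upper case is an assumption made for readability (the server only states that the alphabet is not case sensitive).

```python
# The allowed one-letter amino acid codes, plus X for "unknown".
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")


def normalize_sequence(raw: str) -> str:
    """Apply the input rules to one raw sequence string.

    - non-alphabetic symbols (white space, digits, punctuation) are ignored
    - alphabetic symbols outside the allowed alphabet become X
    - upper-casing is an assumption of this sketch
    """
    cleaned = []
    for ch in raw:
        if not ch.isalpha():
            continue  # ignored: white space, digits, other symbols
        ch = ch.upper()
        cleaned.append(ch if ch in ALLOWED else "X")
    return "".join(cleaned)


if __name__ == "__main__":
    print(normalize_sequence("mkt b1 z-a"))
```

Running such a check locally before pasting a sequence shows how many residues would end up as X.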

The sequences can be input in the following two ways:

  • Paste a single sequence (just the amino acids) or a number of sequences in FASTA format into the upper window of the main server page.

  • Select a FASTA file on your local disk, either by typing the file name into the lower window or by browsing the disk.

Both methods can be used at the same time: all the specified sequences will be processed. However, one submission may contain no more than 2,000 sequences and 200,000 amino acids in total. No single sequence may be longer than 6,000 amino acids.
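
If you prepare submissions with a script, the size limits can be checked locally before uploading. The following is a minimal sketch under stated assumptions: `parse_fasta`, `count_residues` and `check_submission` are hypothetical helper names, the FASTA parser is deliberately simple, and only alphabetic characters are counted as amino acids (in line with the input rules in this section).

```python
MAX_SEQUENCES = 2_000          # sequences per submission
MAX_TOTAL_RESIDUES = 200_000   # amino acids per submission
MAX_SEQUENCE_LENGTH = 6_000    # amino acids per sequence


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_residues(sequence):
    """Count alphabetic symbols only; everything else is ignored."""
    return sum(1 for ch in sequence if ch.isalpha())


def check_submission(records):
    """Return a list of problems; an empty list means the limits are met."""
    records = list(records)
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences (limit {MAX_SEQUENCES})")
    lengths = [(header, count_residues(seq)) for header, seq in records]
    total = sum(length for _, length in lengths)
    if total > MAX_TOTAL_RESIDUES:
        problems.append(f"{total} amino acids (limit {MAX_TOTAL_RESIDUES})")
    for header, length in lengths:
        if length > MAX_SEQUENCE_LENGTH:
            problems.append(f"{header}: {length} amino acids "
                            f"(limit {MAX_SEQUENCE_LENGTH})")
    return problems


if __name__ == "__main__":
    example = ">seq1\nMKTAYIAK\nQRQISFVK\n>seq2\nMKK\n"
    print(check_submission(parse_fasta(example)))
```

A file that exceeds a limit can then be split into several submissions.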


2. Customize your run

  • Organism group:
    It is important for performance that you choose the correct organism group — Eukaryotes, Gram-negative bacteria or Gram-positive bacteria — since the signal peptides of these three groups are known to differ from each other.
    Gram-positive bacteria correspond to Actinobacteria and Firmicutes in the NCBI Taxonomy.
    Gram-negative bacteria are all other eubacteria, except Tenericutes (including Mycoplasma), which seem to lack a type I signal peptidase and therefore do not have standard signal peptides.
    Unfortunately, we are unable to provide a SignalP version for Archaea, since there are too few experimentally confirmed signal peptides from this organism group in the UniProt database.

  • D-cutoff values:
    The default cutoff values for SignalP 4 are chosen to optimize the performance measured as Matthews Correlation Coefficient (MCC). This results in a lower sensitivity (true positive rate) than SignalP 3.0 had. In SignalP 4.1, we have introduced the option of setting the cutoff to a lower value which yields the same sensitivity as SignalP 3.0. This will make the false positive rate slightly higher, but still better than that of SignalP 3.0. Read more on the Performance page.
    You can see which cutoff values are being used in the boxes marked "D-cutoff". They will change if you change the setting for "D-cutoff values" or "Organism group".
    If you want to experiment with your own cutoff values, select "User defined" and the boxes will go blank, ready for you to fill in values between 0 and 1.

  • Graphics output:
    In the default output, SignalP embeds one plot in PNG format per sequence, showing the C-, S-, and Y-scores for each position in the sequence. You can choose to avoid the plots (No graphics) or to add an Encapsulated PostScript (EPS) file for each sequence. The EPS files will be provided as links.
    See Output for an example and explanation of the scores.

  • Output format:
    You can choose between four output formats:
    Standard
    Appropriate for most users. Shows one plot and one summary per sequence.
    Short
    Convenient if you submit lots of sequences. Shows only one line of output per sequence. Incompatible with graphics.
    Long
    Shows the C-, S-, and Y-scores for each position in the sequence in addition to the Standard output.
    All
    Shows the output scores of both neural network types (SignalP-TM and SignalP-noTM) for each position in the sequence. Incompatible with graphics.
    See Output for an example and explanation of the scores.

  • Method:
    SignalP 4.1 contains two types of neural networks. SignalP-TM has been trained with sequences containing transmembrane segments in the data set, while SignalP-noTM has been trained without those sequences. By default, SignalP 4.1 uses SignalP-TM as a preprocessor to determine whether to use SignalP-TM or SignalP-noTM in the final prediction: if 4 or more positions are predicted to be in a transmembrane state, SignalP-TM is used; otherwise SignalP-noTM is used.
    The exception is Gram-positive bacteria, for which SignalP-TM is always used.
    If you are confident that there are no transmembrane segments in your data, you can get slightly better performance by choosing "Input sequences do not include TM regions", which tells SignalP 4.1 to always use SignalP-noTM.

  • Positional limits:
    Minimal predicted signal peptide length
    SignalP 4.0 could, in rare cases, erroneously predict signal peptides shorter than 10 residues. SignalP 4.1 eliminates these errors by imposing a lower limit on the cleavage site position (signal peptide length). The minimum length is 10 by default, but you can adjust it. Signal peptides shorter than 15 residues are very rare. To disable this length restriction completely, enter 0 (zero).
    N-terminal truncation of input sequence
    By default, the predictor truncates each sequence to at most 70 residues before submitting it to the neural networks. To predict extremely long signal peptides, you can try a higher value, or disable truncation completely by entering 0 (zero). Note: The neural networks were trained on sequences with a maximum length of 70, and their input includes the relative position in the sequence. General performance will therefore deteriorate if you change this setting.
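
The "Method" and "N-terminal truncation" settings described above amount to two small decisions, sketched below in Python. This is an illustration of the documented rules only, not SignalP's implementation. The names `truncate` and `choose_network`, the organism group labels, and the `tm_positions` argument (the number of positions that the SignalP-TM preprocessor predicts to be in a transmembrane state) are all hypothetical. The sketch also assumes that the "Input sequences do not include TM regions" option takes precedence over the Gram-positive exception, which the text implies but does not state outright.

```python
DEFAULT_TRUNCATION = 70      # residues kept from the N-terminus by default
TM_POSITION_THRESHOLD = 4    # "4 or more positions" selects SignalP-TM


def truncate(sequence, limit=DEFAULT_TRUNCATION):
    """Keep the first `limit` residues; a limit of 0 disables truncation."""
    if limit < 0:
        raise ValueError("limit must be 0 (no truncation) or positive")
    return sequence if limit == 0 else sequence[:limit]


def choose_network(organism_group, tm_positions, assume_no_tm=False):
    """Return the name of the network used for the final prediction.

    organism_group: "euk", "gram-" or "gram+" (labels are assumptions)
    tm_positions:   positions predicted as transmembrane by the preprocessor
    assume_no_tm:   mirrors "Input sequences do not include TM regions"
    """
    if assume_no_tm:
        return "SignalP-noTM"          # user option: always noTM (assumed precedence)
    if organism_group == "gram+":
        return "SignalP-TM"            # Gram-positive exception: always TM
    if tm_positions >= TM_POSITION_THRESHOLD:
        return "SignalP-TM"
    return "SignalP-noTM"


if __name__ == "__main__":
    print(len(truncate("M" * 120)))        # default truncation
    print(len(truncate("M" * 120, 0)))     # truncation disabled
    print(choose_network("euk", 3))
    print(choose_network("gram+", 0))
```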

3. Submit the job

Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.

At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.



