
Output format (version 2.0)



DESCRIPTION

An example of output is found below. The results page is composed of the following sections:
  1. Training data & Neural network architecture
    Information about the data used to train the ANNs, including the number of data points and the parameters used to train the ANN ensemble. This section also reports whether repeated flanks were found in the data, and whether sequences were removed from the dataset because they were shorter than the specified motif length.
    You can inspect the distribution of the training data before and after rescaling. If the linear rescale produces a distribution that is too skewed towards zero, consider running the analysis again with a logarithmic transformation.

  2. Performance measures
    • Predictive performance is estimated by cross-validation on the training set and reported as root mean square error (RMSE), Pearson correlation and Spearman correlation.
    • For a visual depiction of the correlation between observed and predicted values, inspect the "scatterplot" figure.
    • The "complete alignment core" file reports the prediction and the alignment core for each sequence. The file consists of the following columns:
      • Core: the predicted binding core for the sequence
      • P1: position of the first residue of the core within the sequence
      • Measure: the target value
      • Prediction: the score predicted by the ensemble
      • Peptide: complete sequence of the training example
      • Gap_pos: starting position of the deletion, if any
      • Gap_lgt: length of the deletion, if any
      • Insert_pos: starting position of the insertion, if any
      • Insert_lgt: length of the insertion, if any
      • Core+Gap: the binding core including inserted or deleted amino acids, if any
      • P1_rel: reliability of the starting position of the core, i.e. a confidence measure on the location of the core (reliability scores are described in this paper)
    • The trained "model", i.e. the set of network weights optimized on the training data, can be downloaded to local disk using the corresponding link. The model file can then be uploaded to the server at any time to obtain predictions on new data.


  3. Sequence motif
    A sequence logo representation of the motif. The height of each column, and the relative height of the amino acid letters, represent the information content in bits at each position of the alignment. Logos are generated using the Seq2Logo program.
    The amino acid preferences at each position in the alignment can also be viewed in log-odds matrix (or frequency matrix) format, with positive values indicating favored residues and negative values indicating disallowed amino acids.

  4. Evaluation data
    If you provided evaluation data upon submission, you will find the predictions here.
    For evaluation files in peptide format with associated values (i.e. the same kind of format as the training data), performance measures are also available. If the submission was in FASTA format, the source protein sequence ID is also shown here; for peptides shared by multiple entries, the sequence IDs are listed separated by / (slash).
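
The choice between a linear rescale and a logarithmic transformation, mentioned under "Training data", can be explored offline before resubmitting. The following is a minimal sketch only: it assumes a plain min-max rescale to [0, 1], and a natural-log transform followed by the same rescale. The exact formulas applied by the server are not stated on this page, and the function names and the sample values are hypothetical.

```python
import math


def linear_rescale(values):
    """Min-max rescale to [0, 1] (an assumed form of the 'Linear rescale' option)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def log_rescale(values):
    """Log-transform, then min-max rescale. Needs strictly positive values."""
    if min(values) <= 0:
        raise ValueError("logarithmic transformation needs strictly positive values")
    return linear_rescale([math.log(v) for v in values])


def fraction_below(values, threshold=0.1):
    """Share of data points below a threshold: a crude skew-towards-zero check."""
    return sum(v < threshold for v in values) / len(values)


# Made-up measurements spanning several orders of magnitude.
raw = [1, 2, 5, 10, 50, 100, 1000, 10000]
lin = linear_rescale(raw)
logged = log_rescale(raw)

print("linear: fraction below 0.1 =", fraction_below(lin))
print("log:    fraction below 0.1 =", fraction_below(logged))
```

With data like this, most linearly rescaled points are squeezed close to zero, while the log-transformed values spread more evenly over the interval. That is the situation in which the logarithmic option is worth trying.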

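The performance measures listed under "Performance measures" can be recomputed from the Measure and Prediction columns of the "complete alignment core" file. The sketch below assumes the file is whitespace-delimited with a header row carrying the column names listed above; that layout is an assumption, not something this page specifies. The sample rows are invented for illustration and all function names are hypothetical.

```python
import math


def parse_core_file(text):
    """Parse whitespace-delimited rows with a header line (assumed layout)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented rows, only to exercise the functions.
sample = """Core P1 Measure Prediction Peptide
AAAKLVFFA 2 0.80 0.70 GSAAAKLVFFAED
LVFFAEDVG 0 0.20 0.30 LVFFAEDVGSNKG
KLVFFAEDV 1 0.50 0.60 AKLVFFAEDVGSN
FFAEDVGSN 3 0.10 0.20 KLVFFAEDVGSNK
"""

rows = parse_core_file(sample)
obs = [float(r["Measure"]) for r in rows]
pred = [float(r["Prediction"]) for r in rows]

print("RMSE     =", rmse(obs, pred))
print("Pearson  =", pearson(obs, pred))
print("Spearman =", spearman(obs, pred))
```

Values computed this way over the whole file need not match the reported figures exactly, since the page does not say how the server aggregates results across cross-validation folds.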

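To make the "information content in bits" and the log-odds matrix described under "Sequence motif" more concrete, here is a simplified sketch computed from a set of aligned cores. It assumes plain Shannon information content, a uniform background of 1/20 per amino acid, and a simple pseudocount. It is not the procedure used by Seq2Logo, which this page does not describe; the function names and example cores are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(cores, pseudocount=0.0):
    """Per-position amino acid frequencies for equal-length cores."""
    length = len(cores[0])
    total = len(cores) + pseudocount * len(AMINO_ACIDS)
    freqs = []
    for pos in range(length):
        counts = Counter(core[pos] for core in cores)
        freqs.append(
            {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return freqs


def information_content(freq):
    """Shannon information in bits: log2(20) minus the column entropy."""
    entropy = -sum(p * math.log2(p) for p in freq.values() if p > 0)
    return math.log2(len(AMINO_ACIDS)) - entropy


def log_odds(freq, background=1 / 20):
    """log2(observed / background); positive means enriched, negative depleted."""
    return {aa: math.log2(p / background) for aa, p in freq.items() if p > 0}


# Invented cores: position 0 is always L, position 1 varies.
cores = ["LAAAA", "LCDEF", "LGHIK", "LMNPQ"]

raw_freqs = column_frequencies(cores)
print("bits at position 0:", information_content(raw_freqs[0]))
print("bits at position 1:", information_content(raw_freqs[1]))

# A pseudocount keeps unseen residues at a finite negative score.
smoothed = column_frequencies(cores, pseudocount=1.0)
scores = log_odds(smoothed[0])
print("log-odds for L at position 0:", scores["L"])
print("log-odds for A at position 0:", scores["A"])
```

A fully conserved position reaches the maximum of log2(20), about 4.32 bits, which is why strongly preferred anchor positions appear as the tallest columns in a logo.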

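The slash-separated sequence IDs described under "Evaluation data" can be illustrated with a small sketch that cuts FASTA entries into peptides and records which entries each peptide came from. It assumes overlapping windows of one fixed length; how the server actually derives peptides from FASTA entries is not described on this page. The function names, the window length and the example sequences are hypothetical.

```python
def parse_fasta(text):
    """Return {sequence_id: sequence} from FASTA-formatted text."""
    entries = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the sequence ID.
            current = (line[1:].split() or [""])[0]
            entries[current] = ""
        elif current is not None:
            entries[current] += line
    return entries


def peptides_with_sources(entries, length):
    """Map each unique peptide to its source IDs joined by '/'."""
    sources = {}
    for seq_id, seq in entries.items():
        for start in range(len(seq) - length + 1):
            peptide = seq[start:start + length]
            ids = sources.setdefault(peptide, [])
            if seq_id not in ids:
                ids.append(seq_id)
    return {peptide: "/".join(ids) for peptide, ids in sources.items()}


# Invented entries sharing the stretch TAYIAKQ.
fasta = """>protA example entry
MKTAYIAKQR
>protB example entry
GGTAYIAKQW
"""

entries = parse_fasta(fasta)
result = peptides_with_sources(entries, length=6)

for peptide, ids in result.items():
    print(peptide, ids)
```

Peptides that occur in both entries, such as TAYIAK here, are listed once with both IDs, which mirrors how a count of unique uploaded peptides can be smaller than the total number of windows across all entries.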
EXAMPLE OUTPUT


Version: 2.0
Run ID: 12857
Run Name: DRB1_0301.example

Training data

Read 1715 unique sequences
View data distribution
(See Instructions for optimal data distribution)
Pre-processing: Linear rescale

Neural network architecture

Motif length: 9
Flanking region (PFR) size: 3
Number of hidden neurons: 5,15
Peptide length encoding: 13
Flank length encoding: 0
Maximum length of deletions in alignment: 0
Maximum length of insertions in alignment: 0
Amino acid numerical encoding: Blosum
Number of training cycles: 500
Number of NN seeds: 4
Number of networks in final ensemble: 40
Stop training on best test-set performance: Yes
Cross-validation setup: Simple
Folds for cross-validation : 5
Method to create subsets: Random


RESULTS

Performance measures - motif length 9

RMSE = 0.149188
Pearson correlation coefficient = 0.735081
Spearman rank coefficient = 0.731830

View scatterplot of Predicted vs. Observed values
Download complete alignment core on the training data

Save the trained MODEL. You may use this model for a new submission

Sequence motif

Cores realigned with offset correction

Click here if you have problems visualizing this image

Figure: Visualization of the sequence motif using the Seq2Logo program

View a Log-odds matrix or Frequency matrix representation of the motif


Evaluation data

Uploaded 1068 peptides from 6 FASTA entries

See the predictions on the evaluation set




DOWNLOAD a compressed archive with all results files

Go back

