Output format


An example output is shown below. The output consists of the following sections:
  1. Training data
    Information about the data used to train the ANNs, including the number of data points. It also reports whether repeated flanks were found in the data, and whether sequences were removed from the dataset (because they were shorter than the specified motif length).
    You can inspect the distribution of the training data before and after rescaling. If the linear rescale produces a distribution that is too skewed towards zero, consider running the analysis again using a logarithmic transformation.

  2. Neural network architecture
    Lists the parameters of the final ANN ensemble.
    In this section you will find a "Save the trained MODEL" link. You can later load this model on the NNAlign submission page to make predictions on new evaluation data.

  3. Sequence motif
    The sequence logo of the learned motif. The height of each column, and the relative height of the amino acid letters within it, represent the information content in bits at each position of the alignment. Refer to the WebLogo article for details.
    The amino acid preferences at each position in the alignment can also be viewed in a Log-odds matrix format, with positive values indicating favored residues and negative values indicating disfavored residues.

  4. Performance measures
    Predictive performance is estimated by cross-validation on the training set, and is reported as root mean square error (RMSE), Pearson correlation and Spearman correlation.
    For a visual depiction of the correlation between observed and predicted values, inspect the "scatterplot" figure.
    The "complete alignment core" file reports the prediction for each sequence and the core of the alignment. The lines are sorted by prediction score, so the alignment cores that contribute most to the motif are found at the top of the list.
    The trained model, i.e. the set of network weights optimized on the training data, can be downloaded to local disk using the corresponding link. The model file can then be uploaded to the server at any time to obtain predictions on new data.

  5. Evaluation data
    If you provided evaluation data upon submission, you will find the predictions here.
    Predictions are sorted by Score, so that the peptides that most closely match the learned motif appear at the top of the list. If the submission was in FASTA format, the source protein sequence ID is also shown here; for peptides shared by multiple entries, the sequence IDs are listed separated by / (slash).
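
The note on rescaling under "Training data" can be illustrated with a small Python sketch. The exact formulas used by the server are not given on this page, so the functions below are assumptions: linear_rescale is a plain min-max rescale to [0, 1], and log_rescale applies a log(1 + x) transform before the same min-max step. The function names and the sample values are hypothetical.

```python
import math

def linear_rescale(values):
    """Min-max rescale to [0, 1] (assumed form of a linear rescale)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def log_rescale(values):
    """Log-transform, then min-max rescale.

    log1p is used so that zeros are tolerated (an assumption);
    values must be non-negative.
    """
    return linear_rescale([math.log1p(v) for v in values])

# Hypothetical measurements with one very large value.
raw = [1.0, 2.0, 5.0, 10.0, 1000.0]
linear = linear_rescale(raw)   # most values end up very close to zero
logged = log_rescale(raw)      # values are spread more evenly over [0, 1]
```

With data like this, the linear rescale pushes all but the largest value close to zero, which is the kind of skew where a logarithmic transformation may be worth trying.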

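As a rough illustration of what "information content in bits" means for a logo column, the sketch below computes log2(20) minus the Shannon entropy of the amino acid frequencies at each position. It assumes a uniform background over the 20 amino acids and applies no small-sample correction, so it will not reproduce the exact column heights of the server's logo. The function name and the three-residue cores are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_information(cores):
    """Information content (bits) for each column of equal-length cores.

    Assumes a uniform background and no small-sample correction.
    """
    length = len(cores[0])
    max_bits = math.log2(len(AMINO_ACIDS))
    info = []
    for pos in range(length):
        counts = Counter(core[pos] for core in cores)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())
        info.append(max_bits - entropy)
    return info

# Hypothetical aligned cores: position 0 is fully conserved,
# position 1 is split evenly over four residues.
cores = ["LKA", "LRA", "LDG", "LEV"]
bits = column_information(cores)
```

A fully conserved position reaches the maximum of log2(20) bits, while positions with a more even mix of residues score lower.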

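The three performance measures can be recomputed from any list of observed and predicted values, for example those in the "complete alignment core" file. The following is a minimal sketch using only the Python standard library; the function names and the sample values are hypothetical, and parsing of the server's output files is not shown because their layout is not described here.

```python
import math

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical observed and predicted values.
observed = [0.1, 0.4, 0.35, 0.8]
predicted = [0.2, 0.3, 0.5, 0.7]
```

RMSE measures the absolute size of the prediction errors, Pearson the linear agreement, and Spearman only the agreement in ranking, so the three values can differ noticeably on the same data.
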
Run ID: 22900
Run Name: DRB1_0301.example

Training data

Trained ANNs on 1715 sequences
View data distribution
(See Instructions for optimal data distribution)
Pre-processing: Linear rescale

Neural network architecture

Motif length: 9
Flanking region size: 3
Number of hidden neurons: 10,20
Encode peptide length: Yes
Encode flank region length: Yes
Neural network encoding: Combined Blosum and Sparse
Number of training cycles: 500
Number of NN seeds: 5
Number of networks in final ensemble: 20
Stop training on best test-set performance: No
Cross-validation method: Fast
Subsets for cross-validation: Hobohm clustering (thr=0.8)


Motif length = 9

Sequence motif

Cores realigned with offset correction


Figure: Visualization of the sequence motif using the WebLogo program

View a Log-odds matrix representation of the motif

Performance measures

Folds for cross-validation = 5
RMSE = 0.155001
Pearson correlation coefficient = 0.7069
Spearman rank coefficient = 0.7144

View scatterplot of predicted vs. observed values
Download complete alignment core on the training data

Save the trained MODEL. You may use this model for a new submission

Evaluation data

Uploaded 1068 peptides from 6 FASTA entries

Predictions on evaluation set

DOWNLOAD archive with all results files
