An example of output is found below. The output is composed of the following sections:
- Training data
Information about the data used to train the ANNs, including the number of data points. It also reports whether repeated flanks were found in the data, and whether any sequences were removed from the dataset (because they were shorter than the specified motif length).
You can inspect the distribution of the training data before and after rescaling. If the linear rescale produces a distribution that is too skewed towards zero, you might consider running the analysis again using a logarithmic transformation.
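To illustrate why a logarithmic transformation can help with skewed data, here is a minimal sketch. It assumes that the linear rescale is a plain min-max scaling to the range 0 to 1, and that the logarithmic option takes the natural log before the same scaling; the exact formulas used by the server may differ. The function names and the sample values are made up for illustration.

```python
import math

def linear_rescale(values):
    """Min-max rescale to [0, 1] (assumed form of the linear rescale)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot rescale")
    return [(v - lo) / (hi - lo) for v in values]

def log_rescale(values):
    """Natural-log transform, then min-max rescale (assumed form)."""
    if min(values) <= 0:
        raise ValueError("log transform needs strictly positive values")
    return linear_rescale([math.log(v) for v in values])

def fraction_below(values, cutoff=0.1):
    """Fraction of rescaled values below `cutoff`; a crude skew indicator."""
    return sum(v < cutoff for v in values) / len(values)

# Made-up measurements spanning several orders of magnitude.
raw = [1, 2, 5, 10, 50, 100, 1000, 50000]
lin = linear_rescale(raw)
logged = log_rescale(raw)

print(f"linear: {fraction_below(lin):.3f} of values below 0.1")
print(f"log:    {fraction_below(logged):.3f} of values below 0.1")
```

With these sample values, most of the linearly rescaled points are crowded near zero, while the log-transformed points are spread more evenly across the range.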
- Neural network architecture
Lists the parameters of the final ANN ensemble.
In this section you will find a "Save the trained MODEL" link. You can later upload this
model on the NNAlign submission page to make predictions on new evaluation data.
- Sequence motif
The sequence logo of the learned motif. The height of each column represents the information content in bits at that position of the alignment, and the relative height of each amino acid letter reflects how frequent that residue is at that position. Refer to the WebLogo article for details.
The amino acid preferences at each position in the alignment may also be viewed in a log-odds matrix format, with positive values indicating favored residues and negative values indicating disfavored residues.
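The two views of the motif can be sketched for a single alignment column as follows. This is only an illustration under stated assumptions: it uses the plain Shannon information content (log2(20) minus the column entropy) without the small-sample correction applied by WebLogo, and it computes log-odds as log2(p/q) against a uniform background unless another one is supplied. The server may use a different background, scaling, or sequence weighting. All function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(column, pseudocount=0.0):
    """Residue frequencies for one alignment column (a string of residues).

    Assumes the column contains only the 20 standard amino acids.
    """
    total = len(column) + pseudocount * len(AMINO_ACIDS)
    return {aa: (column.count(aa) + pseudocount) / total for aa in AMINO_ACIDS}

def information_content(freqs):
    """Column height in bits: log2(20) minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(AMINO_ACIDS)) - entropy

def letter_heights(freqs):
    """Height of each letter: its frequency times the column height."""
    info = information_content(freqs)
    return {aa: p * info for aa, p in freqs.items() if p > 0}

def log_odds(freqs, background=None):
    """log2(p/q) per residue; q is uniform (1/20) unless given."""
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    return {
        aa: math.log2(p / background[aa]) if p > 0 else float("-inf")
        for aa, p in freqs.items()
    }

# Made-up column: half leucine, half isoleucine.
freqs = column_frequencies("LLII")
print(round(information_content(freqs), 3))
print(letter_heights(freqs))

# A small pseudocount keeps unseen residues finite in the log-odds view.
scores = log_odds(column_frequencies("LLII", pseudocount=0.1))
print(round(scores["L"], 3), round(scores["A"], 3))
```

Residues that are more frequent than the background get a positive log-odds score, and residues that are rarer than the background get a negative one.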
- Performance measures
Predictive performance is estimated by cross-validation on the training set, and is reported as root mean square error (RMSE) and as Pearson and Spearman correlation coefficients.
For a visual depiction of the correlation between observed vs. predicted values, inspect
the "scatterplot" figure.
The "complete alignment core" file reports the prediction for each sequence and the
core of the alignment. The lines are sorted by prediction score, so the alignment cores
that contribute most to the motif are found at the top of the list.
The trained model, i.e. the set of network weights optimized on the training data, can be downloaded to local disk using the corresponding link. The model file can then be uploaded to the server at any time to obtain predictions on new data.
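For readers who want to recompute the performance measures of this section from their own observed and predicted values, the standard definitions can be sketched in plain Python as below. This is not the server's code; the function names and the four sample values are made up, and ties in the Spearman ranking are handled with average ranks.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up observed and predicted values on the rescaled 0-1 range.
observed = [0.1, 0.4, 0.35, 0.8]
predicted = [0.2, 0.3, 0.5, 0.7]
print(f"RMSE     = {rmse(observed, predicted):.4f}")
print(f"Pearson  = {pearson(observed, predicted):.4f}")
print(f"Spearman = {spearman(observed, predicted):.4f}")
```

RMSE measures the absolute deviation on the rescaled scale, whereas the two correlation coefficients measure how well the predictions track the observations linearly (Pearson) or in rank order (Spearman).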
- Evaluation data
If you provided some evaluation data upon submission, you will find the predictions here.
Predictions are sorted by score, so that the peptides matching the learned motif most closely appear at the top of the list. If the submission was in FASTA format, the source protein sequence ID is also shown here; for peptides shared by multiple entries, the sequence IDs are listed separated by / (slash).
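When post-processing the evaluation predictions, the slash-separated ID field can be split back into individual source entries. The sketch below assumes a hypothetical in-memory record of (peptide, score, ID field); the actual layout of the results file is not described here, and the peptides, scores, and IDs are invented. Note that this simple split would also break up an ID that itself contains a slash.

```python
def parse_source_ids(id_field):
    """Split a slash-separated ID field into a list of sequence IDs."""
    return [part for part in id_field.split("/") if part]

# Hypothetical records: (peptide, prediction score, source ID field).
predictions = [
    ("AKFVAAWTLKAAA", 0.31, "protA"),
    ("YKTIAFDEEARRA", 0.72, "protB/protC"),
    ("GELIGILNAAKVP", 0.55, "protA/protC/protD"),
]

# Highest score first, matching the ordering described above.
ranked = sorted(predictions, key=lambda record: record[1], reverse=True)

for peptide, score, id_field in ranked:
    print(peptide, score, parse_source_ids(id_field))
```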
Run ID: 22900
Run Name: DRB1_0301.example
Trained ANNs on 1715 sequences
View data distribution
(See Instructions for optimal data distribution)
Pre-processing: Linear rescale
Neural network architecture
Motif length: 9
Flanking region size: 3
Number of hidden neurons: 10,20
Encode peptide length: Yes
Encode flank region length: Yes
Neural network encoding: Combined Blosum and Sparse
Number of training cycles: 500
Number of NN seeds: 5
Number of networks in final ensemble: 20
Stop training on best test-set performance: No
Cross-validation method: Fast
Subsets for cross-validation: Hobohm clustering (thr=0.8)
Motif length = 9
Cores realigned with offset correction
Click here if you have problems visualizing this image
Figure: Visualization of the sequence motif using the WebLogo program
View a Log-odds matrix representation of the motif
Folds for cross-validation = 5
RMSE = 0.155001
Pearson correlation coefficient = 0.7069
Spearman rank coefficient = 0.7144
View scatterplot of predicted vs. observed values
Download complete alignment core on the training data
Save the trained MODEL. You may use this model for a new submission
Uploaded 1068 peptides from 6 FASTA entries
Predictions on evaluation set
DOWNLOAD archive with all results files
Explain the output