NNAlign-2.0 Server

Discovering sequence motifs in biological sequences

All previous versions of this server remain available online, for comparison and reference, through its version history.

The NNAlign server generates artificial neural network models of receptor-ligand interactions. It takes as input a set of ligand sequences with associated target values and returns a sequence alignment, a binding motif of the interaction, and a model that can be used to scan other sequences for occurrences of the motif.
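The alignment idea can be illustrated with a heavily simplified sketch. It assumes that some scoring function stands in for a trained network (here a hypothetical `toy_score` that just counts hydrophobic residues). Each sequence is scanned with a window of the motif length, and the highest-scoring window is taken as its core. Insertions, deletions, and network training are all ignored.

```python
from typing import Callable, List, Tuple

Core = Tuple[int, str, float]  # (offset, core sequence, score)


def best_core(seq: str, motif_len: int, score: Callable[[str], float]) -> Core:
    """Scan seq with a window of motif_len and return the best-scoring window."""
    if len(seq) < motif_len:
        raise ValueError("sequence is shorter than the motif length")
    best: Core = (0, seq[:motif_len], score(seq[:motif_len]))
    for offset in range(1, len(seq) - motif_len + 1):
        core = seq[offset:offset + motif_len]
        s = score(core)
        if s > best[2]:  # ties keep the leftmost window
            best = (offset, core, s)
    return best


def align(seqs: List[str], motif_len: int, score: Callable[[str], float]) -> List[Core]:
    """Pick one core per sequence; stacking the cores gives the alignment."""
    return [best_core(s, motif_len, score) for s in seqs]


def toy_score(core: str) -> float:
    """Hypothetical stand-in for a trained network: counts hydrophobic residues."""
    return float(sum(core.count(a) for a in "LIVMF"))


for offset, core, s in align(["AAALLLAA", "KLIVKK"], 3, toy_score):
    print(offset, core, s)
```

In the real method the scores come from the network being trained, so the chosen cores and the network are refined together.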
Detailed instructions and guidelines, the output formats, the article abstract, and the downloadable code are available through the links on the server page.

New in version 2.0:

  • Custom alphabet, extending applications to DNA/RNA sequences and to peptide data with post-translational modifications (PTMs)
  • Insertions and deletions in the sequence alignment
  • Encoding of receptor pseudo-sequence, enabling the generation of "pan-specific" methods
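A custom alphabet mainly affects how sequences are encoded for the network. The sketch below assumes a plain one-hot ("sparse") encoding over whatever symbols the user supplies. The function name `sparse_encode`, the use of `-` as an all-zero gap symbol, and the lowercase `s` standing for a modified residue are all illustrative assumptions, not the server's documented conventions.

```python
from typing import List


def sparse_encode(seq: str, alphabet: str, gap: str = "-") -> List[float]:
    """One-hot encode seq: one block of len(alphabet) values per position.

    The gap symbol (an assumption of this sketch) is encoded as all zeros.
    """
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    encoded: List[float] = []
    for symbol in seq:
        block = [0.0] * len(alphabet)
        if symbol != gap:
            if symbol not in index:
                raise ValueError(f"symbol {symbol!r} is not in the alphabet")
            block[index[symbol]] = 1.0
        encoded.extend(block)
    return encoded


# DNA: 4 values per position.
print(sparse_encode("ACG", "ACGT"))
# Peptides with a hypothetical extra symbol "s" for a modified serine: 21 values per position.
print(len(sparse_encode("KsL", "ACDEFGHIKLMNPQRSTVWYs")))
```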


1. TRAIN or UPLOAD a model

Paste peptides in PEPTIDE format

or submit a file directly from your local disk:
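This page does not spell out the PEPTIDE format. Assuming it holds one sequence followed by its target value on each line, separated by whitespace, a minimal reader could look like the hypothetical `parse_peptide_format` below. The sequences and values in the example are made up.

```python
from typing import List, Tuple


def parse_peptide_format(text: str) -> List[Tuple[str, float]]:
    """Read assumed 'sequence target_value' lines; skip blanks and '#' comments."""
    records: List[Tuple[str, float]] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected a sequence and a target value")
        records.append((fields[0], float(fields[1])))  # extra columns are ignored
    return records


example = """
# made-up training examples
AKLVSTYIR 0.85
GMNEWSRTL 0.12
"""
print(parse_peptide_format(example))
```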



2. EVALUATION data (optional)

Paste evaluation examples in PEPTIDE or FASTA format

or upload evaluation examples:



3. SUBMIT job




PRESET parameter configurations


MHC CLASS I ligands of variable length
MHC CLASS II ligands
DNA/RNA data


CUSTOMIZE your run


BASIC options

Job name

Motif length

DATA PROCESSING options

Order of the data
High values are positive instances
Low values are positive instances

Data rescaling
Linear rescale
Log-transform
No rescale
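The data-ordering and rescaling options can be sketched as follows. This assumes that "linear rescale" means a min-max mapping of the targets onto [0, 1], that "log-transform" applies a natural logarithm before the same mapping, and that ordering is handled by flipping the rescaled values so that 1 always marks the strongest positive instance. The server's exact formulas may differ.

```python
import math
from typing import List


def rescale(values: List[float], method: str = "linear",
            high_is_positive: bool = True) -> List[float]:
    """Map target values onto [0, 1] so that 1 marks the strongest positive.

    method: "linear" (min-max), "log" (natural log, then min-max) or "none".
    With "none", values are returned unchanged (assumption of this sketch).
    """
    if method == "none":
        return list(values)
    if method == "log":
        if any(v <= 0 for v in values):
            raise ValueError("log-transform requires strictly positive values")
        values = [math.log(v) for v in values]
    elif method != "linear":
        raise ValueError(f"unknown method {method!r}")
    lo, hi = min(values), max(values)
    span = hi - lo
    scaled = [(v - lo) / span if span > 0 else 0.5 for v in values]
    if not high_is_positive:
        scaled = [1.0 - v for v in scaled]  # e.g. a low IC50 means a strong binder
    return scaled


print(rescale([2.0, 4.0, 6.0]))
print(rescale([1.0, 10.0, 100.0], method="log", high_is_positive=False))
```

A log-transform is useful when targets span several orders of magnitude, such as binding affinities, because a linear rescale would squeeze most values close to one end of the range.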

Average target values of identical sequences
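Averaging the target values of identical sequences amounts to grouping records by sequence. A minimal sketch with a hypothetical helper name, using made-up records:

```python
from typing import Dict, List, Tuple


def average_duplicates(records: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Merge identical sequences, keeping first-seen order and the mean target."""
    groups: Dict[str, List[float]] = {}
    for seq, value in records:
        groups.setdefault(seq, []).append(value)
    return [(seq, sum(vals) / len(vals)) for seq, vals in groups.items()]


merged = average_duplicates([("AKLVSTYIR", 0.2), ("GMNEWSRTL", 0.5), ("AKLVSTYIR", 0.4)])
print(merged)
```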

Folds for cross-validation

Perform nested cross-validation
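In nested cross-validation, one partition is held out entirely as an evaluation set while the remaining partitions are used for an inner cross-validation. Assuming that standard scheme, the splits over k partitions can be enumerated as below (`nested_splits` is a hypothetical helper).

```python
from typing import Iterator, List, Tuple


def nested_splits(k: int) -> Iterator[Tuple[int, int, List[int]]]:
    """Yield (evaluation_fold, test_fold, training_folds) for k partitions."""
    for evaluation in range(k):
        inner = [f for f in range(k) if f != evaluation]
        for test in inner:
            training = [f for f in inner if f != test]
            yield evaluation, test, training


splits = list(nested_splits(5))
print(len(splits), splits[0])
```

With 5 partitions this gives 5 × 4 = 20 training runs, which is why nested cross-validation is considerably slower than a single cross-validation.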

Stop training on best test-set performance
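One common way to implement stopping on the best test-set performance, assumed here, is to run all training cycles but keep a snapshot of the network from the cycle with the lowest test-set error. The callables below are hypothetical placeholders for a real training loop, and the error values are made up.

```python
import copy
from typing import Any, Callable, Tuple


def train_keep_best(train_cycle: Callable[[], None],
                    test_error: Callable[[], float],
                    get_weights: Callable[[], Any],
                    n_cycles: int) -> Tuple[int, float, Any]:
    """Run all cycles; return (best_cycle, best_error, weights at that cycle)."""
    best_cycle, best_error, best_weights = -1, float("inf"), None
    for cycle in range(n_cycles):
        train_cycle()
        error = test_error()
        if error < best_error:
            # Deep-copy so later cycles cannot modify the saved snapshot.
            best_cycle, best_error, best_weights = cycle, error, copy.deepcopy(get_weights())
    return best_cycle, best_error, best_weights


# Toy stand-ins: the "weights" are just a counter of completed cycles.
state = {"cycles_done": 0}
errors = [0.50, 0.30, 0.35, 0.20, 0.25]


def toy_cycle() -> None:
    state["cycles_done"] += 1


def toy_error() -> float:
    return errors[state["cycles_done"] - 1]


result = train_keep_best(toy_cycle, toy_error, lambda: state, len(errors))
print(result)
```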

Method to create subsets
Random subsets
Homology clustering
Common-motif clustering
User-defined partitions
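Random subsets can be created by shuffling the data and dealing the items round-robin into k folds, as in the sketch below (`random_partitions` is a hypothetical helper). Homology and common-motif clustering instead try to keep similar sequences in the same partition, which reduces overlap between training and test data. They are not sketched here.

```python
import random
from typing import List, Optional


def random_partitions(n_items: int, k: int, seed: Optional[int] = None) -> List[List[int]]:
    """Shuffle item indices and deal them round-robin into k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[fold::k]) for fold in range(k)]


folds = random_partitions(10, 3, seed=1)
print(folds)
```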

Alphabet

NEURAL NETWORK architecture

Number of training cycles

Number of seeds

Number of hidden neurons
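The architecture options describe a feed-forward network. The sketch below assumes one hidden layer with sigmoid activations and a single sigmoid output, with the initial weights drawn from a seeded random generator. Training (for example by backpropagation over the chosen number of cycles) is omitted, and `TinyNet` is a hypothetical name.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """One-hidden-layer feed-forward network (forward pass only)."""

    def __init__(self, n_inputs: int, n_hidden: int, seed: int) -> None:
        rng = random.Random(seed)  # the seed fixes the initial weights
        self.n_inputs = n_inputs
        # Each weight list has one extra entry for the bias term.
        self.hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs + 1)]
                       for _ in range(n_hidden)]
        self.output = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)]

    def predict(self, x: List[float]) -> float:
        if len(x) != self.n_inputs:
            raise ValueError("input length does not match the network")
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, xb))) for ws in self.hidden]
        hb = h + [1.0]
        return sigmoid(sum(w * hi for w, hi in zip(self.output, hb)))


x = [1.0, 0.0, 0.0, 1.0]
# Different seeds give different initial networks, e.g. for an ensemble.
print([round(TinyNet(4, 3, seed).predict(x), 4) for seed in (1, 2, 3)])
```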

Amino acid encoding

Maximum length of deletions

Maximum length of insertions

Only allow insertions in sequences shorter than the motif length
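Insertions and deletions widen the set of candidate cores a sequence can contribute. The simplified sketch below assumes that a deletion removes one contiguous block of residues from inside a longer window and that an insertion places one contiguous block of gap symbols (`-`) inside a shorter window. `candidate_cores` and its parameters are hypothetical. Indels are only placed internally here, since a deletion at a window edge would just be a shifted window.

```python
from typing import Set


def candidate_cores(seq: str, motif_len: int, max_del: int = 0, max_ins: int = 0,
                    ins_only_short: bool = True) -> Set[str]:
    """Enumerate motif-length cores, optionally with one internal indel."""
    cores: Set[str] = set()
    # Ungapped windows.
    for off in range(len(seq) - motif_len + 1):
        cores.add(seq[off:off + motif_len])
    # Deletions: drop d consecutive residues from inside a window of motif_len + d.
    for d in range(1, max_del + 1):
        width = motif_len + d
        for off in range(len(seq) - width + 1):
            window = seq[off:off + width]
            for p in range(1, motif_len):
                cores.add(window[:p] + window[p + d:])
    # Insertions: place i gap symbols inside a window of motif_len - i.
    if not ins_only_short or len(seq) < motif_len:
        for i in range(1, max_ins + 1):
            width = motif_len - i
            if width < 2:  # no internal position left
                break
            for off in range(len(seq) - width + 1):
                window = seq[off:off + width]
                for p in range(1, width):
                    cores.add(window[:p] + "-" * i + window[p:])
    return cores


print(sorted(candidate_cores("ACDE", 3, max_del=1)))
print(sorted(candidate_cores("ACD", 4, max_ins=1)))
```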

Burn-in period

Length of the peptide flanking region (PFR) for composition encoding

Encode PFR composition as sparse

Encode PFR length

Expected peptide length for encoding

Binned peptide length encoding
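The PFR and length options add input features beyond the core itself. The sketch assumes that PFR composition is the average one-hot vector of up to `pfr_len` residues on each side of the core, and that binned length encoding is a one-hot vector over length bins. The bin edges used here are arbitrary illustrative choices, and all function names are hypothetical.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment: str, alphabet: str = AMINO_ACIDS) -> List[float]:
    """Average one-hot vector of a segment (all zeros if the segment is empty)."""
    counts = [0.0] * len(alphabet)
    for symbol in segment:
        counts[alphabet.index(symbol)] += 1.0
    return [c / len(segment) for c in counts] if segment else counts


def pfr_features(seq: str, core_offset: int, motif_len: int, pfr_len: int) -> List[float]:
    """Composition of up to pfr_len residues left and right of the core."""
    left = seq[max(0, core_offset - pfr_len):core_offset]
    right = seq[core_offset + motif_len:core_offset + motif_len + pfr_len]
    return composition(left) + composition(right)


def binned_length(length: int, edges: Sequence[int] = (9, 10, 11)) -> List[float]:
    """One-hot length bin; with the default edges the bins are <=9, 10, 11, >11."""
    vec = [0.0] * (len(edges) + 1)
    vec[sum(1 for e in edges if length > e)] = 1.0
    return vec


peptide = "GG" + "A" * 9 + "KK"  # core of nine A's with two flanking residues per side
features = pfr_features(peptide, core_offset=2, motif_len=9, pfr_len=3)
print(len(features), features[AMINO_ACIDS.index("G")], features[20 + AMINO_ACIDS.index("K")])
print(binned_length(9), binned_length(12))
```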

Load receptor pseudo-sequences
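For pan-specific models, each training example also carries a receptor identity. The sketch assumes a pseudo-sequence file with one `name pseudo_sequence` pair per line, and that the network input is the encoded peptide core concatenated with the encoded pseudo-sequence. The receptor names and pseudo-sequences are made up, and the helper names are hypothetical.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq: str) -> List[float]:
    """Sparse encoding: 20 values per residue."""
    vec: List[float] = []
    for aa in seq:
        block = [0.0] * len(AMINO_ACIDS)
        block[AMINO_ACIDS.index(aa)] = 1.0
        vec.extend(block)
    return vec


def load_pseudo_sequences(text: str) -> Dict[str, str]:
    """Parse assumed 'receptor_name pseudo_sequence' lines."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#"):
            table[fields[0]] = fields[1]
    return table


def pan_specific_input(core: str, receptor: str, table: Dict[str, str]) -> List[float]:
    """Concatenate the peptide-core encoding with the receptor pseudo-sequence encoding."""
    return one_hot(core) + one_hot(table[receptor])


# Made-up receptor names and pseudo-sequences, for illustration only.
table = load_pseudo_sequences("REC1 YFAMYQENV\nREC2 YHTKYREIS\n")
x = pan_specific_input("AKLVSTYIR", "REC1", table)
print(len(x))
```

Because the receptor is part of the input, a single network can in principle be queried for receptors it was not trained on, provided their pseudo-sequences are known.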


SORTING and VISUALIZATION options

Number of networks (per fold) in the final network ensemble

Sort results by prediction value

Exclude offset correction

Show all logos in the final ensemble
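The final ensemble can be illustrated as follows. This assumes that the best networks in each fold are chosen by test-set error and that their predictions are simply averaged. The data structure (a list of `(test_error, predict)` pairs per fold) and the toy networks are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Network = Tuple[float, Callable[[List[float]], float]]  # (test_error, predict)


def ensemble_predict(folds: Sequence[Sequence[Network]], n_per_fold: int,
                     x: List[float]) -> float:
    """Average the predictions of the n_per_fold lowest-error networks in each fold."""
    chosen = []
    for networks in folds:
        ranked = sorted(networks, key=lambda net: net[0])
        chosen.extend(predict for _, predict in ranked[:n_per_fold])
    return mean(predict(x) for predict in chosen)


# Toy networks that return constants; errors and outputs are made up.
folds = [
    [(0.10, lambda x: 0.8), (0.30, lambda x: 0.2)],
    [(0.20, lambda x: 0.6), (0.15, lambda x: 0.4)],
]
print(ensemble_predict(folds, 1, [0.0]))
```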

EVALUATION DATA options

Length of peptides generated from FASTA entries

Sort evaluation results by prediction value

Threshold on evaluation set predictions
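The evaluation options can be sketched together: FASTA entries are cut into overlapping peptides of the chosen length, each peptide is scored, and the results are optionally filtered and sorted. The sketch assumes that a higher prediction means a stronger positive, so the threshold keeps values at or above it. `predict` stands for any scoring function, and `toy_predict` is a hypothetical placeholder.

```python
from typing import Callable, List, Optional, Tuple

Hit = Tuple[str, int, str, float]  # (entry name, 0-based start, peptide, score)


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (name, sequence) pairs; the name is the header text after '>'."""
    entries: List[Tuple[str, str]] = []
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                entries.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        entries.append((name, "".join(chunks)))
    return entries


def scan_fasta(text: str, length: int, predict: Callable[[str], float],
               threshold: Optional[float] = None, sort_by_score: bool = True) -> List[Hit]:
    hits: List[Hit] = []
    for name, seq in read_fasta(text):
        for start in range(len(seq) - length + 1):  # overlapping peptides
            peptide = seq[start:start + length]
            hits.append((name, start, peptide, predict(peptide)))
    if threshold is not None:
        hits = [h for h in hits if h[3] >= threshold]  # assumes higher = stronger
    if sort_by_score:
        hits.sort(key=lambda h: h[3], reverse=True)  # stable: ties keep input order
    return hits


def toy_predict(peptide: str) -> float:
    return float(peptide.count("D"))


fasta = ">prot1\nACDEF\nGH\n>prot2\nKLM\n"
print(scan_fasta(fasta, 3, toy_predict, threshold=1.0))
```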





NOTE: depending on the size of your datasets and the selected parameters, a job may take up to a few hours to complete.
Please be patient.

Confidentiality:
The sequences are kept confidential and will be deleted after processing.

Run locally:
A stand-alone version of the program, compiled for Unix and Darwin (Mac), is available for download.


CITATIONS

For publication of results, please cite:

  • NNAlign: a platform to construct and evaluate artificial neural network models of receptor-ligand interactions
    Nielsen M, Andreatta M
    Nucleic Acids Research (2017) Apr 12. doi: 10.1093/nar/gkx276
    Pubmed: 28407117



