MaxAlign is a program that optimizes a multiple sequence alignment prior to downstream analyses.
Specifically, it maximizes the number of nucleotide (or amino acid) symbols that are present in
gap-free columns (the alignment area) by selecting the optimal subset of sequences to exclude from
the alignment. MaxAlign can be used prior to phylogenetic and other bioinformatic analyses, as well
as in other situations where this form of alignment improvement is useful.
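To make the notion of alignment area concrete, here is a minimal Python sketch. It assumes the
area equals the number of sequences multiplied by the number of gap-free columns, which follows
from the definition above. The function names and the toy alignment are hypothetical, and the
sketch does not reproduce the subset search that MaxAlign itself performs.

```python
def gap_free_columns(seqs):
    """Count the columns in which no sequence has a gap ('-')."""
    return sum(1 for column in zip(*seqs) if "-" not in column)


def alignment_area(seqs):
    """Number of symbols lying in gap-free columns (assumed: sequences x gap-free columns)."""
    return len(seqs) * gap_free_columns(seqs)


# Toy alignment, invented for illustration only.
alignment = {
    "Sequence_1": "ACGTACGT",
    "Sequence_2": "ACGTAC-T",
    "Sequence_3": "A---ACGT",
}

full = alignment_area(list(alignment.values()))
reduced = alignment_area(
    [seq for name, seq in alignment.items() if name != "Sequence_3"]
)
print(full, reduced)
```

In this toy case, excluding Sequence_3 raises the area from 12 to 14 even though one sequence is
lost, which is the kind of trade-off the program evaluates.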
Usage instructions
1. Specify the input sequences
All sequence headers/names MUST be different.
All the input sequences must be in one-letter amino acid or nucleotide code. The suggested alphabet
(not case sensitive) is as follows: A C D E F G H I K L M N P Q R S T U V W Y -
Gaps should be represented only by "-". Other symbols, e.g. B, J, X, will be treated as
nucleotides / amino acids.
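The input rules above can be checked before submitting an alignment. The following Python sketch
is one possible pre-check, not part of MaxAlign: the function name and the (name, sequence) pair
representation are assumptions. Symbols outside the suggested alphabet are only reported, since
the program treats them as residues rather than rejecting them.

```python
# Suggested alphabet from the instructions above (not case sensitive).
SUGGESTED_ALPHABET = set("ACDEFGHIKLMNPQRSTUVWY-")


def check_input(records):
    """Return a list of notes for a list of (name, sequence) pairs."""
    notes = []
    seen = set()
    for name, seq in records:
        # All sequence names must be different.
        if name in seen:
            notes.append(f"duplicate name: {name}")
        seen.add(name)
        # Anything else (e.g. '.' used as a gap) would be counted as a residue.
        unusual = sorted(set(seq.upper()) - SUGGESTED_ALPHABET)
        if unusual:
            notes.append(
                f"{name}: symbols outside the suggested alphabet: {''.join(unusual)}"
            )
    return notes


for note in check_input([("seq_a", "ACGT-"), ("seq_a", "ACXT.")]):
    print(note)
```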
Preserving selected sequences
You might want to keep some sequences in your alignment, even at the cost of excluding some sites.
You can do that by marking those sequences with a plus sign, "+", before their name, as in the example
below:
>+Sequence_1
>Sequence_2
Sequence_1 above will always be included in the output of MaxAlign, while the inclusion of
Sequence_2 will be evaluated. Make sure your sequence names do not start with a plus sign, "+",
unless you want them to be marked.
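The marking convention can be illustrated with a short Python sketch that separates preserved
sequences from those open to exclusion. The function name is hypothetical, and stripping the "+"
from the returned names is an assumption made here for readability; how MaxAlign writes the names
in its output is not stated above.

```python
def split_protected(headers):
    """Split FASTA header lines into (always kept, evaluated) name lists."""
    protected, candidates = [], []
    for header in headers:
        # Drop the leading '>' of a FASTA header line, if present.
        name = header[1:] if header.startswith(">") else header
        if name.startswith("+"):
            protected.append(name[1:])  # marked with '+': always kept
        else:
            candidates.append(name)  # unmarked: may be excluded
    return protected, candidates


protected, candidates = split_protected([">+Sequence_1", ">Sequence_2"])
print(protected, candidates)
```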
The MaxAlign web-server is freely available at http://www.cbs.dtu.dk/services/MaxAlign where
supplementary information can also be found. The program is also freely available as a Perl stand-alone
package.
WEB SERVICE OPERATION
This Web Service is fully synchronous. There is one operation:
1. maxalign
Input: The following parameters and data:
* 'alignment' [containing multiple 'sequence' elements]
  * 'sequence'
    * 'id' Unique identifier for the sequence
    * 'comment' Optional comment
    * 'seq' Protein or nucleotide sequences, with unique identifiers
      (mandatory). The sequences must be written using the one-letter
      amino acid/nucleotide code: 'acdefghiklmnpqrstvwy' or
      'ACDEFGHIKLMNPQRSTVWY'. Gaps are denoted with '-' (dash).
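As an illustration of the input structure, the Python sketch below converts FASTA text into a
nested dictionary with the element names listed above. Several things are assumptions: the
function name, the use of a plain dictionary, and the rule that a header is split into 'id' and
'comment' at the first space. How the structure is serialized and sent to the service is not
covered here.

```python
def fasta_to_alignment(text):
    """Build {'alignment': {'sequence': [...]}} from FASTA-formatted text."""
    sequences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumed convention: first word is the id, the rest is the comment.
            ident, _, comment = line[1:].partition(" ")
            sequences.append({"id": ident, "comment": comment, "seq": ""})
        elif sequences:
            # Sequence data may be wrapped over several lines.
            sequences[-1]["seq"] += line
    ids = [entry["id"] for entry in sequences]
    if len(ids) != len(set(ids)):
        raise ValueError("sequence ids must be unique")
    return {"alignment": {"sequence": sequences}}


example = fasta_to_alignment(">s1 first\nAC-G\nTT\n>s2\nACGG\nT-\n")
print(example)
```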
Output: The following parameters and data:
* 'resultalignment' [containing multiple 'sequence' elements]
  * 'sequence'
    * 'id' Unique identifier for the sequence
    * 'comment' Optional comment
    * 'seq' Protein or nucleotide sequences, with unique identifiers (mandatory)
* 'originalsequencenumber' Number of sequences in the input alignment
* 'originalcolumnnumber' Number of columns in the input alignment
* 'originalungapcolumnnumber' Number of columns with no gaps in the input alignment
* 'originalalignmentarea' Alignment area of the input alignment
* 'resultsequencenumber' Number of sequences in the output alignment;
appears only if the alignment can be improved
* 'resultungapcolumnnumber' Number of columns with no gaps in the output alignment;
appears only if the alignment can be improved
* 'resultalignmentarea' Alignment area of the output alignment;
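Because some of the result fields appear only if the alignment can be improved, client code
should not assume they are present. The Python sketch below shows one way to handle this. It
assumes the response has already been decoded into a dictionary keyed by the field names above;
the function name and that representation are hypothetical.

```python
def describe_result(result):
    """Summarize a decoded response, tolerating the optional result fields."""
    # 'resultsequencenumber' appears only if the alignment can be improved.
    if "resultsequencenumber" not in result:
        return "alignment could not be improved"
    removed = result["originalsequencenumber"] - result["resultsequencenumber"]
    summary = f"excluded {removed} sequence(s)"
    new_area = result.get("resultalignmentarea")
    if new_area is not None:
        gained = new_area - result["originalalignmentarea"]
        summary += f"; alignment area grew by {gained}"
    return summary


# Invented values, for illustration only.
print(describe_result({
    "originalsequencenumber": 3,
    "originalalignmentarea": 12,
    "resultsequencenumber": 2,
    "resultungapcolumnnumber": 7,
    "resultalignmentarea": 14,
}))
```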
CONTACT
Technical questions concerning the Web Service should go to Karunakar Bayyapu, karun@cbs.dtu.dk
or Kristoffer Rapacki, rapacki@cbs.dtu.dk.