
MaxAlign 1.1.ws0

Maximizing alignment area

NOTE: a newer version of this service (MaxAlign 1.1.ws1) is available.

WSDL                MaxAlign/MaxAlign_1_1_ws0.wsdl
Schema definitions  ../common/ws_common_1_0b.xsd
                    ws_maxalign_1_1_ws0.xsd

We recommend that first-time users load the WSDL file above into SoapUI and explore the Web Service operations in that environment. SoapUI is a desktop application for inspecting, invoking, developing and functional/load/compliance testing of Web Services over HTTP. It can be downloaded free of charge from http://www.soapui.org/.

Other versions and implementations

Version    Last updated
1.1.ws1    2012-04-19 (most recent)
1.1.ws0    2008-09-17 (this version)

Examples of client side scripts using the service

Filename                   Type  Compatibility  Author
    Description

maxalign_ws.pl (2.5 KB)    Perl  1.1 ws0        Peter Wad Sackett
    This is a template example script. It reads an alignment in FASTA format and produces formatted output from the web service.
alignment.fsa (2.6 KB)
xml-compile.pl (3.2 KB)    Perl  NA             Peter Fischer Hallin
    Helper script used to initialize XML::Compile proxies (WSDL+XSD).
test_maxalign.pl (5.1 KB)  Perl  1.1 ws0        Edita Bartaseviciute
    This script runs the MaxAlign 1.1.ws0 Web Service. It requires no input and is intended for testing in the EMBRACE WS Registry.
example.fsa (1.3 KB)
maxalign.pl (4.0 KB)       Perl  1.1 ws0        Edita Bartaseviciute
    This is a template example script. It reads an alignment in FASTA format and produces formatted output from the web service.

Usage

# download the required scripts
wget http://www.cbs.dtu.dk/ws/MaxAlign/examples/xml-compile.pl
wget http://www.cbs.dtu.dk/ws/MaxAlign/examples/maxalign.pl

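# run the client, reading the FASTA alignment from standard input
# (example.fsa is one of the example files listed above)
perl maxalign.pl < example.fsa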

Documentation

MaxAlign is a program that optimizes a multiple sequence alignment prior to downstream analyses. Specifically, it maximizes
the number of nucleotide (or amino acid) symbols that are present in gap-free columns (the alignment area) by selecting
the optimal subset of sequences to exclude from the alignment. MaxAlign can be used prior to phylogenetic and other
bioinformatic analyses, as well as in any other situation where this form of alignment improvement is useful.
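Because every gap-free column holds exactly one symbol from each sequence, the alignment area equals the number of sequences multiplied by the number of gap-free columns. The Python sketch below computes that quantity and, for tiny alignments only, finds the best set of sequences to exclude by trying every subset. The exhaustive search and the function names are illustrative assumptions: this is not how MaxAlign itself searches, and its cost grows exponentially with the number of sequences.

```python
from itertools import combinations

def gap_free_columns(seqs):
    """Count columns in which no sequence has a gap ('-')."""
    return sum(1 for column in zip(*seqs) if "-" not in column)

def alignment_area(seqs):
    """Alignment area: number of symbols lying in gap-free columns."""
    return len(seqs) * gap_free_columns(seqs)

def best_exclusion(aln):
    """Exhaustively try every subset of sequences to exclude (tiny inputs only).

    aln maps sequence names to aligned sequences of equal length.
    At least one sequence is always kept. Returns (excluded_names, area);
    ties are resolved in favour of excluding fewer sequences.
    """
    names = list(aln)
    best = ((), alignment_area(list(aln.values())))
    for k in range(1, len(names)):
        for excluded in combinations(names, k):
            kept = [aln[n] for n in names if n not in excluded]
            area = alignment_area(kept)
            if area > best[1]:  # strictly better only, so smaller k wins ties
                best = (excluded, area)
    return best

aln = {"s1": "ACGT-A", "s2": "ACGTTA", "s3": "A--TTA"}
print(alignment_area(list(aln.values())))
print(best_exclusion(aln))
```

In this toy alignment, excluding s3 raises the area from 9 (3 sequences × 3 gap-free columns) to 10 (2 sequences × 5 gap-free columns).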

Usage instructions

1. Specify the input sequences
All sequence headers/names MUST be different.
All input sequences must be written in one-letter amino acid or nucleotide code. The suggested alphabet (not case sensitive) is as follows:
A C D E F G H I K L M N P Q R S T U V W Y -

Gaps should be represented only by "-"; any other symbol (e.g. B, J, X) is treated as a nucleotide or amino acid.
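A minimal pre-submission check of the rules above, written as a Python sketch under two assumptions: the sequence name is the first word of the FASTA header, and all sequences of an alignment have equal length. Symbols outside the suggested alphabet are reported as warnings rather than errors, since the service accepts them as residues; the main risk is a gap written as something other than "-" (for example "."), which would silently count as a residue.

```python
# Suggested alphabet from the usage instructions (case-insensitive) plus '-' for gaps.
SUGGESTED = set("ACDEFGHIKLMNPQRSTUVWY-")

def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def check_input(records):
    """Return human-readable problems found in (header, sequence) records."""
    problems, seen = [], set()
    for header, seq in records:
        # Assumption: the sequence name is the first word of the header.
        name = header.split()[0] if header.split() else ""
        if name in seen:
            problems.append(f"duplicate sequence name: {name!r}")
        seen.add(name)
        odd = sorted(set(seq.upper()) - SUGGESTED)
        if odd:
            problems.append(f"{name}: symbols {odd} are outside the suggested alphabet and count as residues")
    if len({len(seq) for _, seq in records}) > 1:
        problems.append("sequences differ in length, so the input is not an alignment")
    return problems

example = ">a\nAC-GT\n>b\nACXGT\n>a\nAC.GT\n"
for problem in check_input(parse_fasta(example)):
    print(problem)
```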

Preserving selected sequences
You may want to keep certain sequences in your alignment, even at the cost of fewer gap-free columns (a smaller alignment area).
You can do that by marking those sequences with a plus sign ("+") before their name, as in the example below:

>+Sequence_1
>Sequence_2

Sequence_1 above will always be included in the MaxAlign output, whereas Sequence_2 is evaluated for exclusion like any other unmarked sequence.
Make sure that the names of sequences you do not want to mark do not start with a plus sign ("+").
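Marking sequences with "+" shrinks the search space: only unmarked sequences are candidates for exclusion, so n unmarked sequences leave at most 2^n possible exclusion sets. The Python sketch below (hypothetical helper names; headers are given without the leading ">") shows that bookkeeping. It keeps the "+" as part of the name, because this page does not say whether MaxAlign strips the marker in its output.

```python
from itertools import combinations

def preserved_names(headers):
    """Names starting with '+' are always kept in the output."""
    return {h.split()[0] for h in headers if h.startswith("+")}

def candidate_exclusions(headers):
    """Yield every subset of unmarked sequence names that could be excluded."""
    names = [h.split()[0] for h in headers]
    keep = preserved_names(headers)
    optional = [n for n in names if n not in keep]
    # Without any preserved sequence, keep at least one unmarked sequence.
    max_k = len(optional) if keep else len(optional) - 1
    for k in range(max_k + 1):
        yield from combinations(optional, k)

headers = ["+Sequence_1", "Sequence_2", "Sequence_3 some comment"]
print(preserved_names(headers))
print(list(candidate_exclusions(headers)))
```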

The MaxAlign web server is freely available at http://www.cbs.dtu.dk/services/MaxAlign, where supplementary information
can also be found. The program is also freely available as a stand-alone Perl package.

WEB SERVICE OPERATION

This Web Service is fully synchronous and provides a single operation:

1. maxalign

Input:  The following parameters and data:

        * 'alignment'   [containing multiple 'sequence' elements]
          * 'sequence'
            * 'id'         Unique identifier for the sequence
            * 'comment'    Optional comment
            * 'seq'        Protein or nucleotide sequence (mandatory).
                           The sequence must be written in the one-letter amino acid/nucleotide
                           code, either lowercase 'acdefghiklmnpqrstvwy' or uppercase 'ACDEFGHIKLMNPQRSTVWY'.
                           Gaps are denoted with '-' (dash).
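The input structure above maps onto nested XML elements. The Python sketch below builds the 'alignment' element with the standard library. The element names come from the parameter list; the XML namespace, the SOAP envelope and any ordering constraints from ws_maxalign_1_1_ws0.xsd are deliberately left out and are assumptions to be resolved against the WSDL/XSD (or generated by a SOAP toolkit such as SoapUI or XML::Compile).

```python
import xml.etree.ElementTree as ET

def build_alignment(records):
    """Build the 'alignment' input element from (id, comment, seq) tuples.

    Element names follow the Input list; namespace and SOAP envelope are
    omitted (assumption: take them from the WSDL/XSD).
    """
    alignment = ET.Element("alignment")
    seen = set()
    for seq_id, comment, seq in records:
        if seq_id in seen:  # 'id' must be unique
            raise ValueError(f"identifier {seq_id!r} is not unique")
        seen.add(seq_id)
        sequence = ET.SubElement(alignment, "sequence")
        ET.SubElement(sequence, "id").text = seq_id
        if comment:  # 'comment' is optional, so omit it when empty
            ET.SubElement(sequence, "comment").text = comment
        ET.SubElement(sequence, "seq").text = seq
    return alignment

xml_text = ET.tostring(
    build_alignment([("seq1", "first", "AC-GT"), ("seq2", "", "ACTGT")]),
    encoding="unicode",
)
print(xml_text)
```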

Output: The following parameters and data:

        * 'resultalignment'   [containing multiple 'sequence' elements]
          * 'sequence'
            * 'id'         Unique identifier for the sequence
            * 'comment'    Optional comment
            * 'seq'        Protein or nucleotide sequence (mandatory)

        * 'originalsequencenumber'     Number of sequences in the input alignment
        * 'originalcolumnnumber'       Number of columns in the input alignment
        * 'originalungapcolumnnumber'  Number of columns with no gaps in the input alignment
        * 'originalalignmentarea'      Alignment area of the input alignment
        * 'resultsequencenumber'       Number of sequences in the output alignment; 
                                        appears only if the alignment can be improved
        * 'resultungapcolumnnumber'    Number of columns with no gaps in the output alignment; 
                                        appears only if the alignment can be improved
        * 'resultalignmentarea'        Alignment area of the output alignment; 
                                        appears only if the alignment can be improved
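Because the 'result*' fields appear only when the alignment can be improved, a client should treat them as optional. The Python sketch below reads the numeric summary fields by local element name (ignoring namespaces, whose actual values are defined in the WSDL/XSD) and checks each area against the definition in the Documentation section, area = sequences × gap-free columns. The response document in the example is invented for illustration and is not real service output.

```python
import xml.etree.ElementTree as ET

# Summary fields listed in the Output section; the 'result*' ones are optional.
SUMMARY_FIELDS = (
    "originalsequencenumber", "originalcolumnnumber",
    "originalungapcolumnnumber", "originalalignmentarea",
    "resultsequencenumber", "resultungapcolumnnumber", "resultalignmentarea",
)

def local_name(tag):
    """Drop an ElementTree '{namespace}' prefix, keeping the local element name."""
    return tag.rsplit("}", 1)[-1]

def read_summary(xml_text):
    """Return a dict of summary fields; fields absent from the response map to None."""
    found = {}
    for elem in ET.fromstring(xml_text).iter():
        name = local_name(elem.tag)
        if name in SUMMARY_FIELDS and elem.text and elem.text.strip():
            found[name] = int(elem.text.strip())
    summary = {name: found.get(name) for name in SUMMARY_FIELDS}
    # The result fields appear only if the alignment could be improved.
    summary["improved"] = summary["resultalignmentarea"] is not None
    return summary

def area_consistent(summary, prefix):
    """Check area == sequences x gap-free columns for 'original' or 'result'; None if unknown."""
    values = [summary[prefix + key] for key in
              ("sequencenumber", "ungapcolumnnumber", "alignmentarea")]
    if None in values:
        return None
    n_seqs, n_cols, area = values
    return n_seqs * n_cols == area

# Hypothetical response fragment, invented for illustration only.
response = """<response xmlns="urn:example:maxalign">
  <originalsequencenumber>3</originalsequencenumber>
  <originalcolumnnumber>6</originalcolumnnumber>
  <originalungapcolumnnumber>3</originalungapcolumnnumber>
  <originalalignmentarea>9</originalalignmentarea>
  <resultsequencenumber>2</resultsequencenumber>
  <resultungapcolumnnumber>5</resultungapcolumnnumber>
  <resultalignmentarea>10</resultalignmentarea>
</response>"""

summary = read_summary(response)
print(summary["improved"], area_consistent(summary, "original"), area_consistent(summary, "result"))
```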