MatrixPlot 1.2: Inform manual
Inform
- NAME
- Inform - computes the regular and mutual information content of a
sequence alignment.
- SYNOPSIS
- Inform [options] [alignment_file]
- DESCRIPTION
- Inform computes the information content at each position in a sequence
alignment as well as the mutual information shared between any two
positions in the alignment. Allowed data formats are a simple "align"
format, the fasta format, and the msf format. The program is
written in gawk, and options are given as name=value pairs using the
full variable name, e.g., Inform matrix=test.mtr alignfile. Data can
also be piped to Inform.
The output goes to stdout and is given in the "information"
version of the mp format. The sequences in the alignment file can
use any alphabet, which the user must supply to the program (see
the alfile option).
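The exact formulas used by Inform are discussed in the introduction
rather than on this page. As a rough illustration only, the Python
sketch below assumes the textbook definitions: position-wise
information content as relative entropy against a background
distribution, and mutual information computed from pair frequencies.
The function names are hypothetical, and the sketch ignores symbol
grouping and any special treatment of gaps.

```python
import math
from collections import Counter


def column(alignment, i):
    """Return the symbols found at position i of every sequence."""
    return [seq[i] for seq in alignment]


def information_content(alignment, i, background):
    """Relative entropy (bits) of column i against `background`.

    `background` maps each symbol to its background probability.
    Assumption: textbook formula, not necessarily Inform's exact one.
    """
    col = column(alignment, i)
    n = len(col)
    total = 0.0
    for symbol, count in Counter(col).items():
        q = count / n
        total += q * math.log2(q / background[symbol])
    return total


def mutual_information(alignment, i, j):
    """Standard mutual information (bits) between columns i and j."""
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    count_i = Counter(column(alignment, i))
    count_j = Counter(column(alignment, j))
    total = 0.0
    for (a, b), count in pair_counts.items():
        q_ab = count / n
        # q_ab / (q_a * q_b) simplifies to count * n / (count_a * count_b)
        total += q_ab * math.log2(count * n / (count_i[a] * count_j[b]))
    return total
```

For example, with the four sequences AU, CG, GC and UA and a uniform
background, each column on its own carries no information, while the
two columns together share two bits of mutual information.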
- OPTIONS
-
- alfile=<alfile>
- Alphabet file listing the alphabet from which the sequences are
composed. The alphabet should be given on one line, with the
symbols separated by spaces. Letters that can substitute for one
another (e.g., T and U in nucleotide sequences) can be grouped
together by writing them without a separating space. The typical
alphabet for nucleotide sequences is "A C G UT -", where U and T
substitute arbitrarily for one another. Note that the gap symbol (-)
must be included as well. The alphabet file is required.
- backdist=<backdistfile>
- File containing the background probability distribution of the
considered alphabet. This is used in the calculation of the
position-wise information content. The first line should contain
a listing of the alphabet in the same order as in the alphabet
file. The following line(s) should have the same number of fields
as the first line and give the background probabilities
for each position in the alignment (as many lines as
positions in the alignment). If only one line of probabilities
is listed, all positions in the alignment are assumed to
have that background distribution. The nucleotide example with a
uniform background distribution looks like:
A C G UT -
0.25 0.25 0.25 0.25 1
Note that the gap background probability is set to one. The
choice of the gap "background probability" is discussed in the
introduction.
- matrix=<matrixfile>
- File containing a matrix that defines which symbols in the
alphabet are complementary. It is used only when mtype=2.
For a discussion of the "complementarity matrix", see the
introduction. With the nucleotide alphabet listed above, the
complementarity matrix has the form:
A C G UT -
A 0 0 0 1 0
C 0 0 1 0 0
G 0 1 0 1 0
UT 1 0 1 0 0
- 0 0 0 0 0
The gaps must be included. The alphabet should be listed in the
same order as in the alphabet file.
- mtype=1|2
- Compute the mutual information in the standard form (1) or
in the form introduced for the structure logo (2). For details,
see the discussion in the introduction. The default is
2 for a nucleotide alphabet and 1 otherwise.
- diagout=y|n
- Include diagonal "zeros" in the output. Default y.
- bp=y|n
- Include complementary matrix elements in the output,
rather than just the upper triangle of the matrix. Default n.
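The alphabet, background and complementarity files described above
must agree on the order of the alphabet, which is easy to get wrong
by hand. The Python sketch below shows one way such files could be
checked before running Inform. It is not part of the program: the
function names are hypothetical, and it assumes that every symbol is
a single character, so that a field such as UT denotes the group of
U and T.

```python
def parse_alphabet(line):
    """Split a line such as 'A C G UT -' into groups and a symbol map."""
    groups = line.split()
    if "-" not in groups:
        raise ValueError("the gap symbol (-) must be included")
    symbol_to_group = {}
    for group in groups:
        for symbol in group:  # assumption: one character per symbol
            if symbol in symbol_to_group:
                raise ValueError(f"symbol {symbol!r} is listed twice")
            symbol_to_group[symbol] = group
    return groups, symbol_to_group


def _fields(text):
    """Whitespace-split every non-empty line of `text`."""
    return [ln.split() for ln in text.splitlines() if ln.strip()]


def parse_backdist(text, groups):
    """Return one {group: probability} dict per probability line."""
    lines = _fields(text)
    if not lines or lines[0] != groups:
        raise ValueError("first line must repeat the alphabet in order")
    rows = []
    for fields in lines[1:]:
        if len(fields) != len(groups):
            raise ValueError("wrong number of probability fields")
        rows.append(dict(zip(groups, (float(f) for f in fields))))
    if not rows:
        raise ValueError("at least one line of probabilities is needed")
    return rows


def parse_matrix(text, groups):
    """Return {(row_group, column_group): 0 or 1} for the matrix file."""
    lines = _fields(text)
    if not lines or lines[0] != groups:
        raise ValueError("header must repeat the alphabet in order")
    if [fields[0] for fields in lines[1:]] != groups:
        raise ValueError("row labels must follow the alphabet order")
    matrix = {}
    for fields in lines[1:]:
        values = fields[1:]
        if len(values) != len(groups):
            raise ValueError("wrong number of matrix fields")
        for col, value in zip(groups, values):
            matrix[(fields[0], col)] = int(value)
    return matrix
```

Feeding the nucleotide examples from this page through these
functions maps both U and T to the group UT and marks G as
complementary to both C and UT.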
- EXAMPLES
- A basic example of how to execute the program:
Inform alfile=ntalfile backdist=ntdist matrix=ntmat mtype=2 alignfile > data.mp
This generates the mp file data.mp, using the alphabet listed in
ntalfile, the background probabilities listed in ntdist, and the
complementarity matrix read from ntmat. The mutual information
is computed as type 2. The alignment is read from alignfile.
Data can also be piped into Inform:
cat alignfile | Inform alfile=ntalfile backdist=ntdist mtype=1 > data.mp
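When Inform is driven from a script, the name=value convention used
in the examples above can be assembled programmatically. The Python
sketch below is only an illustration: inform_argv is a hypothetical
helper, not part of MatrixPlot, and it checks just the constraints
stated on this page (the alphabet file is required, and the matrix
option applies only when mtype=2).

```python
KNOWN_OPTIONS = ("alfile", "backdist", "matrix", "mtype", "diagout", "bp")


def inform_argv(alignfile=None, **options):
    """Build an argument list for Inform from keyword options.

    Hypothetical helper; leave `alignfile` as None when the alignment
    is piped to Inform on stdin.
    """
    unknown = sorted(set(options) - set(KNOWN_OPTIONS))
    if unknown:
        raise ValueError(f"unknown option(s): {', '.join(unknown)}")
    if "alfile" not in options:
        raise ValueError("the alphabet file (alfile) must be given")
    if "matrix" in options and str(options.get("mtype", "")) == "1":
        raise ValueError("matrix is only used when mtype=2")
    argv = ["Inform"]
    # Emit options in a fixed, documented order as name=value pairs.
    argv += [f"{name}={options[name]}" for name in KNOWN_OPTIONS
             if name in options]
    if alignfile is not None:
        argv.append(alignfile)
    return argv
```

The resulting list can be handed to subprocess.run with stdout
redirected to a file, assuming Inform is installed and on the PATH.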
- AUTHORS
- Original version by Jan Gorodkin,
gorodkin@cbs.dtu.dk.
Program optimized by Hans Henrik Stærfeldt,
hhs@cbs.dtu.dk.
Man pages by Jan Gorodkin, April 1999.
- REFERENCE
-
MatrixPlot: visualizing sequence constraints.
J. Gorodkin, H. H. Stærfeldt, O. Lund, and S. Brunak.
Bioinformatics. 15:769-770, 1999. (http://www.cbs.dtu.dk/services/MatrixPlot/)