
hoWWWlite instructions


DESCRIPTION

The howlite program is an artificial neural network simulator especially designed to handle symbol sequences. The input to the network takes the form of single-letter symbol strings - typically representing amino acids or nucleotides in biological sequences - while the symbols in the output represent categories assigned to each input symbol. The output categories will most often symbolize structural or functional aspects of the monomers in the linear input sequence.

The network architecture is limited to networks of the feed-forward type, with at most one hidden layer separating the input layer and the output layer. The training principle is standard backpropagation.
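
As a rough illustration of this architecture, the sketch below passes one encoded input window through an optional hidden layer. This is not howlite source code; the one-hot window encoding and the sigmoid activation function are assumptions made for the example.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(window_vec, W_ih, b_h, W_ho, b_o):
        # window_vec : encoded input window (length depends on NIALPH and NWSIZE)
        # W_ih, b_h  : hidden-layer weights/bias, or None when N2HID = 0
        # W_ho, b_o  : output-layer weights/bias, one unit per output category (NOALPH)
        if W_ih is None:                              # no hidden layer
            return sigmoid(W_ho @ window_vec + b_o)
        hidden = sigmoid(W_ih @ window_vec + b_h)     # single hidden layer
        return sigmoid(W_ho @ hidden + b_o)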

The WWW-based version of the program reads a number of run-time parameters from an input form. These parameters control the architecture of the neural network, many aspects of the training process, the number of sequences used for training and testing, the file names for additional input and output produced by the program, and the complexity and detail of the output delivered by the program.

When started, the howlite program will first produce a listing of all relevant parameters used, followed by statistics showing details of the classification performance during the training process. The output of a training run may subsequently be visualized with plots of various performance measures. A trained network may also be used - in a static mode without further training - to produce predicted classification results for novel sequences not previously shown to the network.

The essential data for the program are:

(a) the set of input parameter values,
(b) the sequence files holding training and test data,
(c) the weight file holding the adjustable neural network parameters.

The sources from which these data are taken can be controlled by the user via a WWW browser.


ESSENTIAL RUN-TIME PARAMETERS

All parameters are read by the program from an input form. Reasonable default settings will be used if the user takes no action.

NIALPH
NUMBER OF LETTERS IN THE INPUT ALPHABET
Typically 21 or 20 for standard amino acid networks, 5 or 4 for nucleotide networks. Must be integer.


NOALPH
NUMBER OF LETTERS IN THE OUTPUT ALPHABET
Equals the number of output categories. The minimum is 2. Must be integer.


NWSIZE
WINDOW SIZE IN LETTERS
The size of the symmetrical window surrounding the central symbol. Must be odd positive integer.


N2HID
NUMBER OF UNITS IN HIDDEN LAYER
Networks without hidden units: 0, networks with hidden units: 2, 3, ... Must be non-negative integer.


LETIN
INPUT ALPHABET
The one-letter symbols for the amino acid or nucleotide alphabet. Symbol 21 (or 5) is interpreted as blank if NIALPH=20 or NIALPH=4, respectively.
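
As a hedged sketch of how the window and the blank letter fit together, the snippet below encodes a symmetric NWSIZE window around a central position and uses the extra letter to pad positions that fall outside the sequence. The example alphabet string, the one-hot encoding and the padding behaviour are assumptions for illustration, not statements from this manual.

    LETIN  = "ACDEFGHIKLMNPQRSTVWY."   # 20 amino acids plus '.' as the assumed blank symbol
    NIALPH = 20
    NWSIZE = 13                        # odd, so the window is symmetric around the centre

    def encode_window(seq, center):
        half  = NWSIZE // 2
        blank = LETIN[NIALPH]          # symbol 21 interpreted as blank when NIALPH=20
        vec = []
        for pos in range(center - half, center + half + 1):
            letter  = seq[pos] if 0 <= pos < len(seq) else blank
            one_hot = [0] * (NIALPH + 1)
            one_hot[LETIN.index(letter)] = 1
            vec.extend(one_hot)
        return vec                     # length (NIALPH + 1) * NWSIZE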


LETOUT
OUTPUT ALPHABET
This alphabet represents the output categories, for example H for helix, E for extended sheet, and so on. All output category assignments in the sequence data file that are not mentioned in the output alphabet are converted into the last category. For an output alphabet comprising 3 letters, for example HEC, amino acids assigned to the turn category, T, will be converted into coil, C.
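
The conversion rule can be restated compactly; the snippet below is merely an illustration of the rule described above, not howlite code.

    LETOUT = "HEC"

    def map_category(label):
        # any assignment not mentioned in the output alphabet becomes the last category
        return label if label in LETOUT else LETOUT[-1]

    assert map_category("T") == "C"    # turn is converted into coil
    assert map_category("H") == "H"    # helix is kept as is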


ETA
LEARNING RATE
The learning rate used for backpropagation online learning. Typical values from 0.01 to 0.5.
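
For illustration, one online gradient-descent step with learning rate ETA might look as follows. The squared-error cost and the single sigmoid output unit are assumptions made for this sketch; the manual does not specify the error function used by howlite.

    import numpy as np

    ETA = 0.05

    def online_step(w, b, x, target):
        y = 1.0 / (1.0 + np.exp(-(w @ x + b)))    # unit output for one window
        delta = (y - target) * y * (1.0 - y)      # gradient of squared error w.r.t. net input
        w -= ETA * delta * x                      # online learning: update after every example
        b -= ETA * delta
        return w, b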


ICETA
SEPARATE LEARNING RATE FOR EACH CATEGORY
Flag enabling category-specific learning rates when positive. Must be non-zero integer.


LSTOP
MAXIMAL NUMBER OF TRAINING SWEEPS
The number of times the training set is repeated (and presented in random order) to the network.


ITEST
TEST FREQUENCY
The number of full training sweeps between evaluations of the training and test set performance values. If set to 1, the network will be evaluated after every single training sweep. Must be non-zero integer.


IVIRGN
INITIALIZATION OF NETWORK WEIGHTS
If positive, weights are generated using a random number generator prior to the start of training. If negative, weights are read from the file specified by the SYNFIL parameter. Must be non-zero integer.


ISSEED
SYNAPSE INITIALISATION SEED
Seed for the random number generator producing the initial weights when IVIRGN is positive. Use large odd integer.
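
A hedged sketch of the weight set-up controlled by IVIRGN and ISSEED: when IVIRGN is positive, the weights are drawn from a random number generator seeded with ISSEED. The weight range and matrix shapes below are illustrative assumptions only.

    import numpy as np

    NIALPH, NWSIZE, N2HID, NOALPH = 20, 13, 5, 3     # example parameter values
    ISSEED = 1234567                                 # a large odd integer, as recommended

    rng  = np.random.default_rng(ISSEED)
    W_ih = rng.uniform(-0.1, 0.1, size=(N2HID, (NIALPH + 1) * NWSIZE))   # input -> hidden
    W_ho = rng.uniform(-0.1, 0.1, size=(NOALPH, N2HID))                  # hidden -> output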


LEARNC
NUMBER OF SEQUENCES IN TRAINING FILE
The number of sequences to be included in the training set. Must be non-negative integer.


LSKIP
NUMBER OF SEQUENCES TO SKIP IN TRAINING FILE
If non-zero, the first LSKIP sequences in the training data are skipped. Normally used when the training and test data are kept in the same file (see the sketch after the ITSKIP entry below). Must be non-negative integer.


TESTC
NUMBER OF SEQUENCES IN TEST FILE
Must be non-negative integer.


ITSKIP
NUMBER OF SEQUENCES TO SKIP IN TEST FILE
Must be non-negative integer.
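
When the training and test sequences are kept in the same file, the four counts LEARNC, LSKIP, TESTC and ITSKIP combine to define a simple split. The sketch below shows the "skip, then take" reading of the parameter descriptions above; it is an illustration, not part of howlite.

    def split_sequences(all_seqs, LEARNC, LSKIP, TESTC, ITSKIP):
        train = all_seqs[LSKIP : LSKIP + LEARNC]      # skip LSKIP sequences, take LEARNC
        test  = all_seqs[ITSKIP : ITSKIP + TESTC]     # skip ITSKIP sequences, take TESTC
        return train, test

    # e.g. with 126 sequences in one file: train on the first 100, test on the last 26
    # train, test = split_sequences(seqs, LEARNC=100, LSKIP=0, TESTC=26, ITSKIP=100)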


ICPER
PERCENTAGE OUTPUT FOR EVERY EXAMPLE
If positive, the classification performance on the training and test sets will be computed and shown for each single sequence. If negative, only performance values for the whole sample will be shown. Must be non-zero integer.


ICSEQ
SEQUENCE OUTPUT FOR EVERY EXAMPLE
If positive, the network output will be shown for each single sequence in a format displaying the input sequence, the target output, and the actual prediction output using the symbols from the input and output alphabets. If negative, performance statistics will be produced in numeric form only. Must be non-zero integer.


IACTIV
SINGLE WINDOW OUTPUT ACTIVITIES, TEST ONLY
If positive, the actual real-numbered activities of the output units will be shown for each input symbol (or window) in the test part of the data. Must be non-zero integer.


AUTHOR

Søren Brunak, brunak@cbs.dtu.dk