|
hoWWWlite - Instructions
DESCRIPTION
The howlite program is an artificial neural network simulator
especially designed to handle symbol sequences. The input to the network takes
the form of single-letter symbol strings - typically representing amino acids
or nucleotides in biological sequences - while symbols in the output represent
categories assigned to each single input symbol. The output categories will
most often symbolize structural or functional aspects of the monomers in the
linear input sequence.
The network architecture is limited to networks of the feed-forward type, with
at most one hidden layer separating the input layer and the output layer. The
training principle is standard backpropagation.
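As an illustration only (the howlite implementation itself is not shown here), a feed-forward network with one hidden layer trained by standard online backpropagation can be sketched in Python; all names below are hypothetical:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class Net:
    """Feed-forward net (n_in -> n_hid -> n_out) with sigmoid units."""
    def __init__(self, n_in, n_hid, n_out, seed=12345):
        rng = random.Random(seed)        # cf. the ISSEED weight seed below
        r = lambda: rng.uniform(-0.5, 0.5)
        self.w1 = [[r() for _ in range(n_in + 1)] for _ in range(n_hid)]  # +1 = bias
        self.w2 = [[r() for _ in range(n_hid + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]                   # append bias input
        self.h = [sigmoid(sum(w * v for w, v in zip(ws, xb)))
                  for ws in self.w1] + [1.0]
        self.o = [sigmoid(sum(w * v for w, v in zip(ws, self.h)))
                  for ws in self.w2]
        return self.o

    def train_one(self, x, target, eta=0.1):
        """One online backpropagation step with learning rate eta."""
        o = self.forward(x)
        d_out = [(t - y) * y * (1.0 - y) for t, y in zip(target, o)]
        d_hid = [h * (1.0 - h) * sum(d * ws[j] for d, ws in zip(d_out, self.w2))
                 for j, h in enumerate(self.h[:-1])]
        xb = x + [1.0]
        for ws, d in zip(self.w2, d_out):    # output-layer update
            for j in range(len(ws)):
                ws[j] += eta * d * self.h[j]
        for ws, d in zip(self.w1, d_hid):    # hidden-layer update
            for j in range(len(ws)):
                ws[j] += eta * d * xb[j]
```

Setting n_hid to 0 would correspond to a network without hidden units; the sketch above covers only the one-hidden-layer case.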
The WWW based version of the program reads a number of
run-time parameters from an input form.
These parameters control the architecture of the neural
network, many aspects of the training process, the number of sequences used for
training and testing, file names for additional input and output produced by
the program, and the complexity and detail of the output delivered by the
program.
When started, the howlite program will first produce a listing of
all relevant parameters used, followed by statistics showing details of the
classification performance during the training process. The output of a
training run may subsequently be visualized with plots of various performance
measures. A trained network may also be used - in a static mode without further
training - to produce predicted classification results for novel sequences not
previously shown to the network.
The essential data for the program is:
(a) the set of input parameter values,
(b) the sequence files holding training and test data,
(c) the weight file holding the adjustable neural network parameters.
The sources from which these data are taken can be controlled by the user
via a WWW browser.
ESSENTIAL RUN-TIME PARAMETERS
All parameters are read by the program from an input form. Reasonable
default settings will be used if the user takes no action.
|
NIALPH
|
NUMBER OF LETTERS IN THE INPUT ALPHABET
Typically 21 or 20 for standard amino acid networks, 5
or 4 for nucleotide networks. Must be
integer.
|
NOALPH
|
NUMBER OF LETTERS IN THE OUTPUT ALPHABET
Equals the number of output categories. The minimum is
2. Must be integer.
|
NWSIZE
|
WINDOW SIZE IN LETTERS
The size of the symmetrical window surrounding the
central symbol. Must be an odd positive integer.
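As a hypothetical sketch of how such a symmetric window may be cut from a sequence (the blank-padding convention beyond the sequence ends is an assumption, not taken from the howlite sources):

```python
def windows(seq, nwsize, blank=" "):
    """Return the symmetric window of nwsize letters centred on each
    position of seq, padding with the blank symbol beyond the ends."""
    assert nwsize % 2 == 1, "window size must be odd"
    half = nwsize // 2
    padded = blank * half + seq + blank * half
    return [padded[i:i + nwsize] for i in range(len(seq))]
```

For example, windows("ACDE", 3) yields " AC", "ACD", "CDE" and "DE ", one window per input symbol.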
|
N2HID
|
NUMBER OF UNITS IN HIDDEN LAYER
Networks without hidden units: 0, networks with hidden
units: 2, 3, ... Must be non-negative integer.
|
LETIN
|
INPUT ALPHABET
The one letter symbols for the amino acid or nucleotide
alphabet. Symbol 21 (or 5) is interpreted as blank if
NIALPH=20 or NIALPH=4, respectively.
|
LETOUT
|
OUTPUT ALPHABET
This alphabet represents the output categories, for
example H for helix, E for extended sheet and so on.
All output category assignments in the sequence data
file not mentioned in the output alphabet are converted
into the last category. For an output alphabet
comprising 3 letters, for example HEC, amino acids
assigned to the turn category, T, will be converted
into coil, C.
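The conversion rule can be sketched as follows (a hypothetical illustration, not the howlite code itself):

```python
def remap_targets(target, letout="HEC"):
    """Convert every category assignment in the target string that is
    not found in the output alphabet letout into its last category."""
    return "".join(c if c in letout else letout[-1] for c in target)
```

With letout="HEC", the turn category T in a target string is thus converted into coil, C.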
|
ETA
|
LEARNING RATE
The learning rate used for backpropagation online
learning. Typical values from 0.01 to 0.5.
|
ICETA
|
SEPARATE LEARNING RATE FOR EACH CATEGORY
Flag enabling category-specific learning rates when
positive. Must be non-zero integer.
|
LSTOP
|
MAXIMAL NUMBER OF TRAINING SWEEPS
The number of times the training set is repeated (and
presented in random order) to the network.
|
ITEST
|
TEST FREQUENCY
The number of full training sweeps between the
evaluation of the training and test set performance
values. If unity, the network will be evaluated after
each single training cycle. Must be non-zero integer.
|
IVIRGN
|
INITIALIZATION OF NETWORK WEIGHTS
If positive, weights are generated using a random
number generator prior to the start of training. If
negative, weights are read from a file using the SYNFIL
parameter. Must be non-zero integer.
|
ISSEED
|
SYNAPSE INITIALISATION SEED
Seed for the random number generator producing weights
when IVIRGN is positive.
Use a large odd integer.
|
LEARNC
|
NUMBER OF SEQUENCES IN TRAINING FILE
The number of sequences to be included in the training
set. Must be non-negative integer.
|
LSKIP
|
NUMBER OF SEQUENCES TO SKIP IN TRAINING FILE
If non-zero, the first LSKIP sequences are skipped from
the training data. Normally used when the training and
test data is kept in the same file. Must be non-
negative integer.
|
TESTC
|
NUMBER OF SEQUENCES IN TEST FILE
Must be non-negative integer.
|
ITSKIP
|
NUMBER OF SEQUENCES TO SKIP IN TEST FILE
Must be non-negative integer.
|
ICPER
|
PERCENTAGE OUTPUT FOR EVERY EXAMPLE
If positive, the test performance on the training and
test sets will be computed and shown for each single
sequence. If negative, sample performance values will
be shown only. Must be non-zero integer.
|
ICSEQ
|
SEQUENCE OUTPUT FOR EVERY EXAMPLE
If positive, the network output will be shown for each
single sequence in a format displaying the input
sequence, the target output, and the actual prediction
output using the symbols from the input and output
alphabets. If negative, performance statistics will be
produced in numeric form only. Must be non-zero
integer.
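The alignment of input sequence, target output and prediction described above might be rendered like this (a hypothetical sketch; howlite's exact layout may differ):

```python
def show_alignment(seq, target, pred, width=60):
    """Format the input sequence, target output and predicted output
    as aligned blocks of at most width letters per line."""
    lines = []
    for i in range(0, len(seq), width):
        lines.append("SEQ " + seq[i:i + width])
        lines.append("TGT " + target[i:i + width])
        lines.append("PRD " + pred[i:i + width])
        lines.append("")
    return "\n".join(lines)
```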
|
IACTIV
|
SINGLE WINDOW OUTPUT ACTIVITIES, TEST ONLY
If positive, the actual real-numbered activities of the
output units will be shown for each input symbol (or
window) in the test part of the data. Must be non-zero
integer.
AUTHORS
Søren Brunak,
brunak@cbs.dtu.dk
Henrik Nielsen,
hnielsen@cbs.dtu.dk
|