DATA-DRIVEN NEURAL NETWORK PREDICTION
The exercise in data-driven neural networks concerns the prediction of
protein secondary structure from the linear sequence of amino acids.
The exercise is carried out using a simple neural network simulator,
howlite, and ready-made sequence data constructed by converting
Brookhaven coordinate data into secondary structure assignments with
the Kabsch and Sander DSSP program.
- Take a look at the sequence data linking the linear sequence of
amino acids and the secondary structure assignments. Click or use the
jot editor (or another editor) to see the sequences in the file
Type for example:
In the exercise the aim is to produce a two-category helix/non-helix
network, using the first 75 sequences for training and the next 25
sequences for testing.
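The two-category target and the 75/25 split described above can be
sketched in Python as follows. This is only an illustration: the record
layout, the toy sequences, the function names, and the choice to count
only the DSSP code H as helix are assumptions, not the format that
howlite actually reads.

```python
# Hypothetical records: (amino acid sequence, DSSP assignment string).
# The toy data below is made up purely to exercise the functions.

def to_helix_labels(dssp, helix_codes=("H",)):
    """Map a DSSP string to 1 (helix) or 0 (non-helix) per residue.

    Assumption: only the code 'H' counts as helix. Other conventions
    also include 'G' and 'I'.
    """
    return [1 if code in helix_codes else 0 for code in dssp]


def split_train_test(records, n_train=75, n_test=25):
    """Use the first n_train records for training, the next n_test for test."""
    return records[:n_train], records[n_train:n_train + n_test]


records = [("ACDEFGHIKL", "  HHHHEE T")] * 100
train, test = split_train_test(records)
labels = to_helix_labels(records[0][1])
print(len(train), len(test), labels)
```

Keeping the split positional (first 75, next 25) mirrors the exercise
text; no shuffling is applied in this sketch.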
- Familiarize yourself with the neural network simulator, and
especially with the run-time parameter file how.dat, which is placed
in your directory as ~/25thu/how.dat. First read the man page for the
howlite program briefly by typing:
When you have checked the different run-time parameters, make a trial
training run by typing:
howlite < how.dat
The program will run for 50 epochs and produce a trained neural
network. At the end of the training run the best training and test set
performances will be reported.
- Change the value of the ISSEED parameter in the how.dat file. This
parameter changes the initialization of the random number generator
and so produces a distinct network. Choose a large odd integer.
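Why a different seed gives a distinct network can be seen in a small
Python sketch. The weight range, the number of weights, and the use of
Python's random module are assumptions for illustration; howlite's own
generator and initialization scheme are not described here.

```python
import random


def init_weights(seed, n_weights=5, scale=0.1):
    """Draw small initial weights from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(n_weights)]


w_a = init_weights(1234567)
w_b = init_weights(7654321)
print(w_a != w_b)                    # compare two different seeds
print(w_a == init_weights(1234567))  # compare two runs with the same seed
```

Training started from different initial weights generally ends in a
different network, which is what makes the later ensemble step useful.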
- Produce a learning curve by redirecting the output from a training
run to a file, for example:
howlite < how.dat > how.run
View the temporal evolution of the performance by the howplot script
howplot < how.run
and observe the training and test set performance.
If you like, print the plots.
- Evaluate how the performance varies when the window size parameter
NWSIZE in the how.dat file is changed. Try for example windows of
5, 9, 13, and 17 amino acids, and record the maximal test set
performance for each network architecture.
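To see what the window size changes, the following Python sketch
encodes a window of residues around a central position and prints the
resulting number of input units. The one-hot encoding with a 21st
padding unit, the function name, and the toy sequence are assumptions
made for illustration; howlite's actual input encoding may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def encode_window(sequence, center, window_size):
    """One-hot encode the residues in an odd-sized window around `center`.

    Positions that fall outside the sequence activate an extra 21st
    'padding' unit instead of one of the 20 residue units.
    """
    half = window_size // 2
    units = []
    for pos in range(center - half, center + half + 1):
        one_hot = [0] * (len(AMINO_ACIDS) + 1)
        if 0 <= pos < len(sequence):
            one_hot[AMINO_ACIDS.index(sequence[pos])] = 1
        else:
            one_hot[-1] = 1  # window runs past the end of the chain
        units.extend(one_hot)
    return units


for window_size in (5, 9, 13, 17):
    inputs = encode_window("ACDEFGHIKL", center=0, window_size=window_size)
    print(window_size, len(inputs))
```

Under this assumed encoding the input layer grows linearly with the
window size, so larger windows give the network more context but also
more weights to fit from the same 75 training sequences.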
- Make your contribution to a network ensemble by training a network
with NWSIZE 13 and N2HID 15:
howlite < how.dat
When the network has been trained, change the value of the training
parameter LEARNC to 0, the value of IVIRGN to -1, and the value of
IACTIV to 1. This makes the network produce output for the test
sequences only, and show the actual network output from the two output
units. Dump the output into a file:
howlite < how.dat > how.ensemble
- Gather in plenum, where each group should report the performance it
obtained. The performance of the network ensemble based on the 15
versions of the how.ensemble files should then be evaluated.
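One common way to evaluate such an ensemble is to average the two
output units over all networks and pick the larger average for each
residue. The Python sketch below shows that idea under stated
assumptions: the in-memory layout, the order of the output units (helix
first), and the numbers are all made up, and no claim is made about the
actual layout of the how.ensemble files.

```python
def ensemble_predict(outputs_per_network):
    """Average the two output units over networks, then pick the larger.

    outputs_per_network[k][i] is the (helix, non_helix) output pair that
    network k produced for residue i (assumed unit order).
    """
    n_networks = len(outputs_per_network)
    predictions = []
    for per_residue in zip(*outputs_per_network):
        helix = sum(pair[0] for pair in per_residue) / n_networks
        non_helix = sum(pair[1] for pair in per_residue) / n_networks
        predictions.append(1 if helix > non_helix else 0)
    return predictions


def accuracy(predictions, targets):
    """Fraction of residues predicted correctly."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Made-up outputs from three networks for four residues.
outputs = [
    [(0.9, 0.1), (0.4, 0.6), (0.2, 0.8), (0.6, 0.4)],
    [(0.8, 0.2), (0.7, 0.3), (0.3, 0.7), (0.3, 0.7)],
    [(0.7, 0.3), (0.6, 0.4), (0.1, 0.9), (0.4, 0.6)],
]
targets = [1, 1, 0, 0]
predictions = ensemble_predict(outputs)
print(predictions, accuracy(predictions, targets))
```

In the toy data the first network gets residues 2 and 4 wrong on its
own, while the averaged outputs classify all four residues correctly,
which is the effect the ensemble is meant to demonstrate.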