
DATA-DRIVEN NEURAL NETWORK PREDICTION


The exercise in data-driven neural networks concerns the prediction of protein secondary structure from the linear sequence of amino acids. The exercise is carried out using a simple neural network simulator, howlite, and ready-made sequence data constructed by converting Brookhaven coordinate data into secondary structure assignments with the Kabsch and Sander DSSP program.

  1. Take a look at the sequence data linking the linear sequence of amino acids to the secondary structure assignments. Use the jot editor (or another editor) to view the sequences in the file

    /usr/cbs/phdcourse/neuralnets/aa2dssp.seq

    Type for example:

    jot /usr/cbs/phdcourse/neuralnets/aa2dssp.seq

    The aim of the exercise is to produce a two-category helix/non-helix network, using the first 75 sequences for training and the next 25 sequences for testing.
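
    The exact layout of aa2dssp.seq is easiest to check by opening the file, but the reduction from DSSP assignments to the two categories can be sketched independently. Below is a minimal Python sketch; counting the DSSP states H, G and I as helix is one common convention and is an assumption here, as is the 75/25 split over a list of (sequence, assignment) pairs.

    def to_two_category(dssp_string):
        # Reduce an 8-state DSSP assignment string to helix ('H') / non-helix ('-').
        # Assumption: the DSSP states H (alpha), G (3-10) and I (pi) count as helix.
        helix_states = set("HGI")
        return "".join("H" if s in helix_states else "-" for s in dssp_string)

    # Assumption: 'entries' is a list of (amino_acid_sequence, dssp_assignment) pairs
    # read from aa2dssp.seq; the first 75 are used for training, the next 25 for test.
    def split_train_test(entries):
        labelled = [(seq, to_two_category(dssp)) for seq, dssp in entries]
        return labelled[:75], labelled[75:100]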

  2. Make yourself familiar with the neural network simulator and, in particular, with the run-time parameter file how.dat, which is placed in your directory as ~/25thu/how.dat. First read briefly through the man page for the howlite program by typing:

    man howlite

    When you have checked the different run-time parameters, make a trial training run by typing:

    howlite < how.dat

    The program will run for 50 epochs and produce a trained neural network. At the end of the training run, the best training and test set performances are reported.
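
    A common performance measure for a two-category prediction is simply the fraction of residues assigned to the correct category. How howlite defines the number it reports is not specified here, so treat the Python sketch below as an illustration of the idea rather than a description of the program.

    def two_category_accuracy(predicted, observed):
        # Fraction of residues whose predicted category ('H' or '-') matches the
        # observed one; both arguments are strings of equal length.
        assert len(predicted) == len(observed)
        correct = sum(p == o for p, o in zip(predicted, observed))
        return correct / len(observed)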

  3. Change the value of the ISSEED parameter in the how.dat file. This parameter controls the initialization of the random number generator, so a new value produces a distinct network. Choose a large odd integer.
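
    The reason a new seed gives a distinct network is that the seed only affects the random starting weights; the data and the architecture stay the same. A small numpy sketch of the idea follows (the actual initialization scheme used by howlite is not documented here, and the uniform range below is an assumption).

    import numpy as np

    def initial_weights(isseed, n_inputs, n_hidden):
        # Different seeds give different starting weights, and hence trained
        # networks that end up in different minima of the error surface.
        rng = np.random.default_rng(isseed)      # e.g. a large odd integer
        return rng.uniform(-0.1, 0.1, size=(n_inputs, n_hidden))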

  4. Produce a learning curve by redirecting the output from a training run to a file, for example:

    howlite < how.dat > how.run

    View the temporal evolution of the performance with the howplot script by typing

    howplot < how.run

    and observe the training and test performance. If you like, print the plots.
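
    If howplot is unavailable, the learning curve can also be drawn directly from the redirected output. The Python sketch below assumes, purely for illustration, that each epoch produces a line containing the epoch number followed by the training and test performance; check the actual format of how.run and adjust the parsing accordingly.

    import matplotlib.pyplot as plt

    epochs, train_perf, test_perf = [], [], []
    with open("how.run") as fh:
        for line in fh:
            fields = line.split()
            # Assumption: per-epoch lines look like "<epoch> <train perf> <test perf>";
            # lines that do not parse as three numbers are skipped.
            if len(fields) >= 3:
                try:
                    e, tr, te = int(fields[0]), float(fields[1]), float(fields[2])
                except ValueError:
                    continue
                epochs.append(e); train_perf.append(tr); test_perf.append(te)

    plt.plot(epochs, train_perf, label="training")
    plt.plot(epochs, test_perf, label="test")
    plt.xlabel("epoch"); plt.ylabel("performance"); plt.legend()
    plt.show()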

  5. Evaluate how the performance varies when the window size parameter NWSIZE in the how.dat file is changed. Try, for example, windows of 5, 9, 13 and 17 amino acids, and collect the maximal test set performance for each network architecture.
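
    NWSIZE sets how many consecutive residues the network sees when predicting the category of the central residue. A common input representation is a sliding window with a sparse, one-unit-per-amino-acid coding, so the number of input units grows linearly with the window size. Whether howlite uses exactly this coding, and whether it adds an extra symbol for window positions beyond the sequence ends, are assumptions in the Python sketch below.

    ALPHABET = "ACDEFGHIKLMNPQRSTVWY" + "X"   # 20 amino acids plus an assumed padding symbol

    def encode_window(sequence, centre, nwsize):
        # Sparse (one-hot) coding of a window of nwsize residues centred on
        # position 'centre'; positions outside the sequence use the padding symbol.
        half = nwsize // 2
        coding = []
        for pos in range(centre - half, centre + half + 1):
            residue = sequence[pos] if 0 <= pos < len(sequence) else "X"
            coding.extend(1 if residue == a else 0 for a in ALPHABET)
        return coding   # length = nwsize * len(ALPHABET) input units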

  6. Make your contribution to a network ensemble by training a network with NWSIZE 13 and N2HID 15:

    howlite < how.dat

    When the network has been trained, change the number of training cycles LEARNC to 0, the value of IVIRGN to -1, and the value of IACTIV to 1. This makes the network produce output for the test sequences only and show the actual network output from the two output units. Dump the output into a file:

    howlite < how.dat > how.ensemble

  7. Gather in plenum, where the performance obtained by the different groups is reported. The performance of the network ensemble, based on the 15 versions of the how.ensemble files, is then evaluated.
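
    How the ensemble is combined is up to the groups, but a simple and common scheme is to average, residue by residue, the two output activations across the 15 networks and assign the category with the larger average. The Python sketch below assumes, for illustration only, that each how.ensemble file has been reduced to one (helix output, non-helix output) pair per test residue, in the same order in every file.

    def ensemble_predict(per_network_outputs):
        # per_network_outputs: list of lists, one per network, each containing
        # (helix_output, nonhelix_output) pairs for every test residue in order.
        n_networks = len(per_network_outputs)
        n_residues = len(per_network_outputs[0])
        predictions = []
        for i in range(n_residues):
            helix = sum(net[i][0] for net in per_network_outputs) / n_networks
            nonhelix = sum(net[i][1] for net in per_network_outputs) / n_networks
            predictions.append("H" if helix > nonhelix else "-")
        return "".join(predictions)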