Artificial Neural Network
Description
Implement a simple artificial neural network (ANN) algorithm with backpropagation
in Perl.
ANNs are of great interest in bioinformatics. CBS has grown on the strength of
our prediction servers, which use ANNs.
This is part of a project at CBS on predicting whether certain variants of a
SNP will lead to disease or not.
A lot of work has already gone into preparing a data set for network training. The resulting data sets can be seen below.
Building an ANN is very different from training an ANN. This project is about building (implementing) the ANN.
Training an ANN involves many considerations and tricks, but the student should not focus
on these, as they have nothing to do with Perl.
It is probably a good idea to create two programs: one that trains on the data set(s) and then saves the synapses
(weights) to a file, and another that reads the synapses and makes a prediction for the given input (data set).
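As a minimal sketch of how the two programs could share the synapses, the routines below write and read a plain-text weight file. The file layout (one neuron's weights per line, a blank line between layers) and the names `save_weights` and `load_weights` are assumptions made for illustration; the project does not prescribe a format.

```perl
use strict;
use warnings;

# Assumed in-memory layout: $layers is a reference to a list of layers,
# each layer a list of neurons, each neuron a list of weights.

# Write the weights: one neuron per line, blank line after each layer.
sub save_weights {
    my ($file, $layers) = @_;
    open my $fh, '>', $file or die "Cannot write $file: $!\n";
    for my $layer (@$layers) {
        for my $neuron (@$layer) {
            # 17 significant digits keep the stored values close to the originals.
            print {$fh} join(' ', map { sprintf '%.17g', $_ } @$neuron), "\n";
        }
        print {$fh} "\n";
    }
    close $fh or die "Cannot close $file: $!\n";
    return;
}

# Read the weights back into the same nested structure.
sub load_weights {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot read $file: $!\n";
    my (@layers, @current);
    while (my $line = <$fh>) {
        my @row = split ' ', $line;
        if (@row) {
            push @current, \@row;            # another neuron in this layer
        }
        elsif (@current) {
            push @layers, [@current];        # blank line closes the layer
            @current = ();
        }
    }
    push @layers, [@current] if @current;    # file without trailing blank line
    close $fh;
    return \@layers;
}
```

The training program would call `save_weights('synapses.txt', $layers)` when it stops, and the prediction program would start with `my $layers = load_weights('synapses.txt')`; the file name here is only a placeholder.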
Training continues until the error falls below a predetermined threshold, or stops when the maximum number of
training rounds has been reached.
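The stopping rule can be sketched together with a small backpropagation loop. Everything specific here is an assumption for illustration, not the project's required design: a single hidden layer, sigmoid units, per-example weight updates, mean squared error as the error measure, and the names `new_network`, `forward`, `backprop` and `train`. The toy data is the logical OR function, used only so the sketch runs on its own.

```perl
use strict;
use warnings;

sub sigmoid      { return 1 / (1 + exp(-$_[0])) }
sub small_random { return rand(0.2) - 0.1 }

# hidden: one weight list per hidden neuron (last entry is the bias).
# output: weights of the single output neuron (last entry is the bias).
sub new_network {
    my ($n_in, $n_hidden) = @_;
    return {
        hidden => [ map { [ map { small_random() } 0 .. $n_in ] } 1 .. $n_hidden ],
        output => [ map { small_random() } 0 .. $n_hidden ],
    };
}

# Forward pass: returns the network output and the hidden activations.
sub forward {
    my ($net, $x) = @_;
    my @h;
    for my $w (@{ $net->{hidden} }) {
        my $sum = $w->[-1];
        $sum += $w->[$_] * $x->[$_] for 0 .. $#$x;
        push @h, sigmoid($sum);
    }
    my $v   = $net->{output};
    my $sum = $v->[-1];
    $sum += $v->[$_] * $h[$_] for 0 .. $#h;
    return (sigmoid($sum), \@h);
}

# One backpropagation step for one example; returns its squared error.
sub backprop {
    my ($net, $x, $target, $rate) = @_;
    my ($out, $h) = forward($net, $x);
    my $delta_out = ($out - $target) * $out * (1 - $out);
    my $v = $net->{output};
    for my $j (0 .. $#$h) {
        # The hidden delta uses the output weight before it is updated.
        my $delta_h = $delta_out * $v->[$j] * $h->[$j] * (1 - $h->[$j]);
        my $w = $net->{hidden}[$j];
        $w->[$_] -= $rate * $delta_h * $x->[$_] for 0 .. $#$x;
        $w->[-1] -= $rate * $delta_h;
        $v->[$j] -= $rate * $delta_out * $h->[$j];
    }
    $v->[-1] -= $rate * $delta_out;
    return ($out - $target) ** 2;
}

# Train until the mean squared error is below the threshold
# or the maximum number of rounds has been used.
sub train {
    my ($net, $data, %opt) = @_;
    my ($round, $error) = (0, 0);
    while ($round < $opt{max_rounds}) {
        $round++;
        $error = 0;
        $error += backprop($net, @$_, $opt{rate}) for @$data;
        $error /= scalar @$data;
        last if $error < $opt{threshold};
    }
    return ($round, $error);
}

srand(42);    # fixed seed so that repeated runs start from the same weights
my @data = ( [ [0, 0], 0 ], [ [0, 1], 1 ], [ [1, 0], 1 ], [ [1, 1], 1 ] );
my $net  = new_network(2, 2);
my ($rounds, $error) = train($net, \@data,
    rate => 0.5, threshold => 0.01, max_rounds => 10_000);
printf "stopped after %d rounds, error %.4f\n", $rounds, $error;
```

For the real data the call would be `new_network(27, $n_hidden)`; the learning rate, the threshold and the number of hidden neurons are training choices that this sketch does not try to settle.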
Input and output
The input for training is whitespace-separated numbers, one example per line: 27 input values followed by a target value (1 = disease, 0 = healthy).
0.075 0.3 0.075 0.225 -0.3 -0.075 0.15 -0.15 0 -0.3 -0.3 0.15 -0.3 -0.225 0 -0.15 -0.15 -0.3 -0.225 -0.3 -0.075 0.3 0.193746064573032135 1.011 0.7525 0.744312561819980218 0.612 0
-0.075 0.075 0.15 0.15 -0.375 -0.15 0.375 -0.15 -0.225 -0.375 -0.15 0.15 -0.3 -0.375 -0.3 -0.225 -0.225 -0.375 -0.3 -0.375 -0.225 0.15 0.173058043883721677 1.011 0.7525 0.744312561819980218 0.340 0
-0.3 -0.225 -0.15 -0.3 -0.45 -0.15 -0.225 -0.375 0.75 -0.375 -0.075 -0.225 -0.3 -0.3 -0.375 -0.3 -0.3 -0.375 -0.075 -0.375 -0.075 0.75 0.336279812283658667 1.011 0.7525 0.744312561819980218 0.100 1
-0.15 -0.3 0 -0.3 -0.225 0.075 -0.225 -0.375 0.6 -0.075 -0.15 -0.225 0 0.3 -0.225 -0.3 -0.3 -0.225 0.3 -0.3 -0.15 0.6 0.303728369615393505 1.011 0.7525 0.744312561819980218 0.212 1
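A line in this format can be split into its inputs and target as sketched below. The helper name `parse_record` and the strict checks (exactly 28 fields, target written as 0 or 1) are assumptions for illustration; the example string is the first sample line shown above.

```perl
use strict;
use warnings;

# Split one record into a reference to its 27 inputs and its target.
sub parse_record {
    my ($line) = @_;
    my @fields = split ' ', $line;    # split ' ' ignores leading whitespace
    die "expected 28 fields, got " . scalar(@fields) . "\n"
        unless @fields == 28;
    my $target = pop @fields;         # the last field is the target
    die "target must be 0 or 1, got '$target'\n"
        unless $target =~ /^[01]$/;
    return (\@fields, $target);
}

my $sample = '0.075 0.3 0.075 0.225 -0.3 -0.075 0.15 -0.15 0 -0.3 -0.3 0.15 '
           . '-0.3 -0.225 0 -0.15 -0.15 -0.3 -0.225 -0.3 -0.075 0.3 '
           . '0.193746064573032135 1.011 0.7525 0.744312561819980218 0.612 0';

my ($inputs, $target) = parse_record($sample);
printf "%d inputs, target %d\n", scalar @$inputs, $target;
```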
There are several data sets that can be combined for training and testing the network:
homology_reduced_subset_1.howlin.gz
homology_reduced_subset_2.howlin.gz
homology_reduced_subset_3.howlin.gz
homology_reduced_subset_4.howlin.gz
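One way to combine the subsets, sketched here under assumptions, is to hold one file out for testing and pool the others for training. The sketch assumes the files are gzip-compressed text in the format described above and uses the core module IO::Uncompress::Gunzip; `read_subset` and `split_subsets` are hypothetical helper names.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Read all records from one gzipped subset.
# Returns a list of [ \@inputs, $target ] pairs.
sub read_subset {
    my ($file) = @_;
    my $z = IO::Uncompress::Gunzip->new($file)
        or die "Cannot open $file: $GunzipError\n";
    my @records;
    while (defined(my $line = $z->getline)) {
        my @fields = split ' ', $line;
        next unless @fields;                 # skip blank lines
        my $target = pop @fields;
        push @records, [ \@fields, $target ];
    }
    $z->close;
    return @records;
}

# Hold out the subset at $test_index and pool the rest for training.
sub split_subsets {
    my ($test_index, @files) = @_;
    my @train = map  { read_subset($files[$_]) }
                grep { $_ != $test_index } 0 .. $#files;
    my @test  = read_subset($files[$test_index]);
    return (\@train, \@test);
}

# Usage: perl split.pl homology_reduced_subset_*.howlin.gz
# (split.pl is a placeholder script name)
if (@ARGV) {
    my ($train, $test) = split_subsets(0, @ARGV);
    printf "%d training records, %d test records\n",
        scalar @$train, scalar @$test;
}
```

Changing the first argument of `split_subsets` selects a different subset for testing, so each of the four files can take a turn as the test set.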
The input for testing is similar to the training input, and the output should be a value
reflecting the network's opinion of the input :-)
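A sketch of the testing step follows: it prints one score per record and counts how often the score falls on the same side of a cut-off as the target. The cut-off of 0.5, the name `evaluate` and the stand-in predictor `$toy` are assumptions; `$toy` is not a trained network, and the three records are made-up placeholders that only make the sketch runnable.

```perl
use strict;
use warnings;

# $predict is a code reference mapping \@inputs to a value between 0 and 1.
# Returns the list of scores and the fraction of records called correctly.
sub evaluate {
    my ($predict, $records, $cutoff) = @_;
    my $n       = @$records;
    my $correct = 0;
    my @scores;
    for my $record (@$records) {
        my ($inputs, $target) = @$record;
        my $score = $predict->($inputs);
        push @scores, $score;
        my $call = $score >= $cutoff ? 1 : 0;
        $correct++ if $call == $target;
    }
    return (\@scores, $n ? $correct / $n : 0);
}

# Stand-in predictor for illustration only: a sigmoid of the summed inputs.
my $toy = sub {
    my ($inputs) = @_;
    my $sum = 0;
    $sum += $_ for @$inputs;
    return 1 / (1 + exp(-$sum));
};

# Placeholder records, not project data.
my @records = ( [ [ 2, 1 ], 1 ], [ [ -3, 0 ], 0 ], [ [ -1, -1 ], 1 ] );
my ($scores, $accuracy) = evaluate($toy, \@records, 0.5);
printf "%.3f\n", $_ for @$scores;
printf "fraction correct: %.2f\n", $accuracy;
```

In the real prediction program, `$predict` would wrap the forward pass of the network whose synapses were read from file.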
Further information
This project is quite large and requires a student who is strong in math and theory.
http://en.wikipedia.org/wiki/Artificial_neural_network
http://en.wikipedia.org/wiki/Backpropagation
The easy reference: http://www.tek271.com/articles/neuralNet/IntoToNeuralNets.html
Additional possibilities
There are many other machine learning methods. The same project/data could be implemented with
Naive Bayes classifier
Support vector machine
Random forest
or another method from supervised learning
in the machine learning field