Support vector machines. Introduction to WEKA

Morten Nielsen (mniel@cbs.dtu.dk)
Olivier Taboureau (otab@cbs.dtu.dk)

For the first (and only) time in this course, you will not develop your own code. Instead, you will use a program package called WEKA.

Details on the WEKA package can be found in the Weka User-guide.

You can either download and install WEKA on your own local machine or run it from one of the CBS servers. If possible, run WEKA locally, since it runs many times faster there than via the CBS servers.

You can download WEKA to your local computer from the WEKA Downloads page.

If you decide to run WEKA at CBS, you need to be on the life server. To get there, first log in to CBS with your student login stud0XX@login.cbs.dtu.dk, then log in to life using the command

ssh -Y stud0XX@life.cbs.dtu.dk

Now you are ready to start. First, make a directory in which to store the data for the SVM exercise, and move into it:

mkdir SVM
cd SVM

Next, open the following data files and save them to your working directory:

herg_vol_5probes_201train.arff
herg_vol_5probes_49test.arff

Now type the following (you may need to specify a version number):

weka

This should open the WEKA GUI.

Next, go to "Applications -> Explorer" and open the file herg_vol_5probes_201train.arff.

How many classes, attributes, and instances does the data set have?
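The Explorer reports these numbers directly, but it can help to see where they come from in the ARFF file itself. The sketch below is a minimal reader written for this exercise, not part of WEKA. It assumes, as WEKA does by default, that the class is the last attribute and that it is nominal (declared as a set such as {a,b}). It does not handle sparse ARFF or quoted names containing spaces.

```python
def summarize_arff(lines):
    """Return (number of attributes, number of instances, class labels).

    Assumes a dense ARFF file whose LAST attribute is the nominal class,
    e.g. "@attribute class {active,inactive}". The attribute count
    includes the class attribute, as WEKA's Preprocess panel does.
    """
    attributes = []
    n_instances = 0
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            n_instances += 1  # each non-comment line after @data is one instance
        elif lower.startswith("@attribute"):
            attributes.append(line)
        elif lower.startswith("@data"):
            in_data = True
    if not attributes:
        raise ValueError("no @attribute declarations found")
    class_decl = attributes[-1]
    classes = []
    if "{" in class_decl and "}" in class_decl:
        inside = class_decl[class_decl.index("{") + 1 : class_decl.rindex("}")]
        classes = [c.strip() for c in inside.split(",")]
    return len(attributes), n_instances, classes


# Toy input for illustration only; it is NOT the hERG data set.
toy = """% toy example
@relation toy
@attribute x1 numeric
@attribute x2 numeric
@attribute class {active,inactive}
@data
0.1,0.2,active
0.3,0.4,inactive
0.5,0.6,active
"""
print(summarize_arff(toy.splitlines()))  # (3, 3, ['active', 'inactive'])
```

For a real file, pass the open file object instead, e.g. `with open("herg_vol_5probes_201train.arff") as f: print(summarize_arff(f))`.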

Go to Classify -> Choose -> functions -> SMO. Then click on the SMO options (the text field next to the Choose button) and press More for an explanation of the parameters.
Which kernel functions are available in this version?
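To get a feel for what the kernel choice changes, the sketch below writes out two common SVM kernels in plain Python. To our understanding these match the formulas behind WEKA's PolyKernel (with its exponent and use-lower-order options) and RBFKernel (with its gamma option), but check the More dialog in your WEKA version for the exact definitions. The function and parameter names here are our own, not WEKA's API.

```python
import math


def dot(x, y):
    """Inner product <x, y> of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, exponent=1.0, lower_order=False):
    """Polynomial kernel <x,y>^E, or (<x,y> + 1)^E when lower-order terms are used.

    With exponent=1.0 and no lower-order term this is just a linear SVM.
    Note: a negative base with a non-integer exponent gives a complex number
    in Python, so keep the exponent integral when inner products can be negative.
    """
    base = dot(x, y) + (1.0 if lower_order else 0.0)
    return base ** exponent


def rbf_kernel(x, y, gamma=0.01):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2); equals 1.0 when x == y."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


x, y = [1.0, 2.0], [3.0, 4.0]
print(poly_kernel(x, y))                                 # 11.0
print(poly_kernel(x, y, exponent=2.0))                   # 121.0
print(poly_kernel(x, y, exponent=2.0, lower_order=True)) # 144.0
print(rbf_kernel(x, x))                                  # 1.0
```

A larger gamma makes the RBF kernel decay faster with distance, so each training point influences a smaller region; a higher polynomial exponent allows more curved decision boundaries. Both make it easier to fit the training data, which is worth keeping in mind when tuning.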

Select "Use training set" as the test option. Keep the default setup, click on "More options..." and turn on "Output predictions".

Click on Start.

What are the accuracy and the MCC? Be careful: these are not named as such in WEKA, so try to figure out what they are called there. Which instances are wrongly predicted?
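If your WEKA version does not print an MCC value, you can compute both measures yourself from the confusion matrix at the end of the output. The sketch below assumes WEKA's usual layout (rows are the actual class, columns the predicted class) and treats the first class, a, as positive. The formulas are the standard ones: accuracy = (TP + TN) / N and MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The helper names are hypothetical, written for this exercise.

```python
import math


def accuracy_and_mcc(tp, fn, fp, tn):
    """Accuracy and Matthews correlation coefficient for a 2x2 confusion matrix.

    Argument order follows the rows of WEKA's confusion matrix, taking
    the first class 'a' as the positive class:
          a   b   <-- classified as
         tp  fn | a
         fp  tn | b
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC = 0 when any row or column sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return accuracy, mcc


def misclassified(actual, predicted):
    """1-based instance numbers (like WEKA's inst# column) where the prediction is wrong."""
    return [i for i, (a, p) in enumerate(zip(actual, predicted), start=1) if a != p]


# Made-up counts for illustration only; use the numbers from your own run.
acc, mcc = accuracy_and_mcc(tp=40, fn=10, fp=5, tn=45)
print(f"accuracy={acc:.3f}  MCC={mcc:.3f}")
print(misclassified(["a", "b", "a"], ["a", "a", "b"]))  # [2, 3]
```

MCC is the more informative of the two when the classes are unbalanced, because a classifier that always predicts the majority class can still reach a high accuracy but gets an MCC of 0.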

Experiment with different parameters and kernels to see whether you can improve the classification. If so, which settings work best?
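Rather than changing parameters at random, it can help to sweep them systematically and keep the best combination. The sketch below is a generic exhaustive grid search. The `evaluate` callable is hypothetical: in this exercise it would stand for "run SMO in WEKA with these settings and read off the MCC". The parameter names mirror, to our knowledge, the SMO field "c" and the RBFKernel field "gamma", and the values are examples only.

```python
import itertools


def grid_search(evaluate, grid):
    """Exhaustive grid search: try every parameter combination, keep the best.

    `evaluate` is a HYPOTHETICAL user-supplied callable that takes a dict of
    parameters and returns a score to maximize (e.g. the MCC for that setting).
    `grid` maps each parameter name to the list of values to try.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Illustrative grid for SMO with an RBF kernel; values are examples only.
rbf_grid = {"c": [0.1, 1.0, 10.0, 100.0], "gamma": [0.001, 0.01, 0.1, 1.0]}


# Stand-in scoring function so the sketch runs without WEKA.
def toy_score(p):
    return -(p["c"] - 10.0) ** 2 - (p["gamma"] - 0.1) ** 2


print(grid_search(toy_score, rbf_grid))  # ({'c': 10.0, 'gamma': 0.1}, -0.0)
```

One caveat on the design: a score measured with "Use training set" tends to reward settings that overfit. WEKA's "Cross-validation" test option gives a less optimistic basis for comparing settings.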

Now select "Supplied test set" and load the test data herg_vol_5probes_49test.arff. Use the optimal kernel function and parameters found on the training data, and press Start. What is the performance on the test data?

Now you are done.