Lesson 12: Seldom-used Functions and Objects


Required reading
Learning Perl:
Just read the whole book.

Subjects covered
chr, returns the character at a given position in the ASCII table.
ord, returns a character's position in the ASCII table.
index, finds the position of a substring in a string.
rindex, finds the position of a substring in a string, searching from the end.
pos, returns the position where the last m//g match on a string ended.
abs, returns the absolute value of a number.
cos, sin, return the cosine/sine of a number (given in radians).
exp, returns e to the power of a number.
log, returns the natural logarithm of a number.
sqrt, returns the square root of a number.
rand, returns a random fractional number.
srand, sets the seed for the random number generator.
grep, selects the elements of a list that satisfy a condition.
map, transforms each element of a list.
How to use other people's objects.
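
The functions listed above can be tried out in a few lines. The following is only a minimal sketch: the variable names and the sample string are made up for illustration and are not taken from the exercise files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# chr and ord convert between characters and their ASCII positions.
my $char = chr(65);                 # 'A'
my $code = ord('a');                # 97

# index and rindex return a zero-based offset, or -1 if nothing is found.
my $dna   = 'ATGCGATGAA';           # made-up sample string
my $first = index($dna, 'ATG');     # 0
my $last  = rindex($dna, 'ATG');    # 5

# pos tells where the most recent m//g match on the string ended.
my @ends;
while ($dna =~ /ATG/g) {
    push @ends, pos($dna);          # 3, then 8
}

# Numeric functions; log is the natural logarithm.
my $abs   = abs(-3.5);              # 3.5
my $root  = sqrt(16);               # 4
my $e     = exp(1);                 # about 2.71828
my $ln    = log($e);                # about 1
my $log10 = log(1000) / log(10);    # about 3, via change of base

# srand fixes the seed so a run can be repeated; rand(10) gives
# a fractional number that is >= 0 and < 10.
srand(42);
my $random = rand(10);
my $dice   = 1 + int(rand(6));      # whole number from 1 to 6

# grep keeps the elements for which the block is true,
# map builds a new list from the block's result for each element.
my @numbers = (1 .. 10);
my @even    = grep { $_ % 2 == 0 } @numbers;   # 2 4 6 8 10
my @squares = map  { $_ * $_ } @even;          # 4 16 36 64 100

print "@squares\n";
```

Note that index, rindex and pos all count from zero, so the first character of a string is at position 0.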

Necessary files to complete this exercise.
To download the files to your system, just press the Shift key while you left-click on the blue link. Follow the instructions.

All the following exercises have to be done in Perl.

  1. Calculate the standard deviation (1.8355) of the numbers in ex1.dat. The formula leads directly to a two-pass algorithm, where you will have to store all the numbers in memory in order to calculate the SD. The inspired programmer will find a one-pass algorithm, where you calculate the SD by looking at each number only once, thereby not using significant memory. The genius will explain why there is a difference between the two results.
  2. Now we should use some object-oriented techniques. OO programming is very often used in modules. A module is a collection of subroutines which somebody benevolent has made available for your use. You can find many Perl modules at http://www.cpan.org/. You can - when you become a good Perl programmer - contribute to CPAN (ah, yes - dreams).
    For now, start by saving the file FastaParse.pm in the directory where your program will be. This is an OO module, which I made for easy reading of fasta files. The first thing you should do is read the file. First there is a description of the module, then comes the code. You should not worry about the code, although it is good to learn from when you make your own modules. The important part is the synopsis (first in the file), which tells you how to use the module.
    You should make a small program that proves that you have downloaded and placed the module in the correct place. It could be the program in the synopsis of the module. If it runs without errors, you are set.
    Your first Perl statement in a program that uses the module should be: use FastaParse; which loads the module. After that you can use the module as described. Notice the use of '->' to refer to methods and/or data encapsulated in the module.
  3. Improve 7.3 by using this module to parse/read the fasta file dna7.fsa. I repeat the text of the exercise for convenience: Now make a program that reverse complements the sequence and writes it into the file revdna.fsa in fasta format. This time you have to keep the first identifying line, so the sequence can be identified. You must add 'ReverseComplement' at the end of that line, though, so you later know that it is the reverse complement.
  4. You do not need to use subroutines in this exercise. Study the file DNA-array.dat a bit. This is real DNA array data taken from a number of persons, some controls and some suffering from colon cancer. If you look at the second line, it contains many 0s and 1s. A '0' means that the values in that column are from a cancer patient and a '1' means the data are from a control (healthy person). The data are all log(intensity), i.e. the logarithm of the measured intensity of the relevant spot on the DNA chip. The data in this file will be used in coming exercises. Oh, yes - the data/columns are tab separated. The second item on each line is the accession number for that particular gene.
    Now make a program that extracts data from dna-array.dat. It shall ask for an accession number (unless you have given it on the command line). Make sure your program handles both situations. Then it shall search in the file for the data concerning that accession number. If it does not find it (you gave a wrong accession number), complain and stop. Otherwise it shall display the data in two tab separated columns. The first column shall be the data from the cancer patients, the second column the data from the controls. And yes, there is not the same number of sick and healthy people - be able to handle that.
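
The '->' notation mentioned in exercise 2 can be illustrated with a small stand-alone sketch. The class SeqCounter and its method count_base are hypothetical and defined inline only so the example runs on its own; they are not the interface of FastaParse, whose real usage is described in its synopsis. A real module would live in its own .pm file and be loaded with a use statement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class, defined inline for illustration only.
package SeqCounter;

# Constructor: stores the sequence in upper case inside the object.
sub new {
    my ($class, $sequence) = @_;
    my $self = { sequence => uc $sequence };
    return bless $self, $class;
}

# Method: counts how often a base occurs in the stored sequence.
sub count_base {
    my ($self, $base) = @_;
    my $count = () = $self->{sequence} =~ /\Q$base\E/g;
    return $count;
}

package main;

# Class->method creates the object, $object->method calls a method on it.
my $counter = SeqCounter->new('atgcgatgaa');
my $a_count = $counter->count_base('A');    # 4
my $g_count = $counter->count_base('G');    # 3

# '->' also reaches data stored in the object, here a hash entry.
my $stored = $counter->{sequence};          # 'ATGCGATGAA'

print "A occurs $a_count times in $stored\n";
```

Reading the data directly with $counter->{sequence} works, but it depends on how the module author stored things internally, so calling a documented method is usually the safer choice.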

This page was last updated         by Peter Wad Sackett, pws@cbs.dtu.dk