Just read the whole book.
chr, returns the character at a given position in the ASCII table.
ord, returns a character's position in the ASCII table.
index, finds the position of a substring in a string.
rindex, finds the position of a substring in a string, searching from the end.
pos, returns the offset where the last m//g match on a string left off.
abs, returns the absolute value of a number.
cos, sin, return the cosine/sine of a number (given in radians).
exp, returns e to the power of a number.
log, returns the natural logarithm of a number.
sqrt, returns the square root of a number.
rand, returns a random fractional number from 0 up to (but not including) its argument, 1 by default.
srand, sets the seed for the random number generator.
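A short Perl sketch that exercises each of the functions listed above. The sample strings, numbers and the seed 42 are arbitrary values chosen only for illustration.

```perl
use strict;
use warnings;

# chr and ord convert between a character and its ASCII code.
my $letter = chr(65);                 # 'A'
my $code   = ord('a');                # 97

# index and rindex return 0-based positions, or -1 if the substring is absent.
my $seq   = 'GATTACAGATTACA';
my $first = index($seq, 'TTA');       # 2, searching from the start
my $last  = rindex($seq, 'TTA');      # 9, searching from the end

# pos gives the offset just after the most recent m//g match on a string.
my $text = 'ACGTACGT';
$text =~ /CG/g or die "no match\n";
my $after_match = pos($text);         # 3

# Numeric functions; cos and sin expect the angle in radians.
my $absolute = abs(-3.5);             # 3.5
my $cosine   = cos(0);                # 1
my $sine     = sin(0);                # 0
my $e        = exp(1);                # 2.718281828...
my $ln       = log($e);               # about 1 (natural logarithm)
my $root     = sqrt(16);              # 4

# srand fixes the seed, so rand gives the same sequence on every run.
srand(42);
my $r    = rand();                    # 0 <= $r < 1
my $roll = int(rand(6)) + 1;          # a whole number from 1 to 6

print join(' ', $letter, $code, $first, $last, $after_match), "\n";
```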
How to use other people's objects.
Necessary files to complete this exercise.
To download a file to your system, hold down the Shift key while you
left-click on its blue link, then follow the instructions.
All the following exercises have to be done in Perl.
- Calculate the standard deviation (1.8355) of the numbers in
ex1.dat. The formula leads directly to a two-pass algorithm, where you have to
store all the numbers in memory in order to calculate the SD. The inspired programmer will find
a one-pass algorithm, where you calculate the SD by looking at each number only once, thereby not
using significant memory. The genius will explain why there is a difference between the two results.
- Now we should use some object-oriented techniques. OO programming is
very often used in modules. A module is a collection of subroutines which
somebody benevolent has made available for your use. You can find many Perl
modules on CPAN, http://www.cpan.org/. You can - when you
become a good Perl programmer - contribute to CPAN (ah, yes - dreams).
For now, start by saving the file FastaParse.pm in the directory where
your program will be. This is an OO module, which I made for easy reading of
fasta files. The first thing you
should do is read the file. It starts with a description of the
module, followed by the code. You should not worry about the code,
although it is good to learn from when you make your own modules.
The important part is the synopsis (first in the file), which tells you
how to use the module.
You should make a small program that proves that you have downloaded
and placed the module in the correct place. It could be the program in the synopsis of the module. If it runs without errors, you are set.
Your first Perl statement in a program that uses the module should be
'use FastaParse;', which loads the module.
After that you can use the module as described. Notice the use of '->'
to refer to methods and/or data encapsulated in the module.
- Improve exercise 7.3 by using this module to parse/read the fasta file
dna7.fsa. I repeat the text of the exercise for convenience: Now make a program that reverse complements the sequence
and writes it into the file revdna.fsa in fasta format. This time you have to keep the first identifying line, so the
sequence can be identified. You must add 'ReverseComplement' at the end
of that line, though, so you later know that it is the reverse complement.
- You do not need to use subroutines in this exercise.
Study the file DNA-array.dat a bit. This is real DNA array data
taken from a number of persons, some controls and some suffering from
colon cancer. If you look at the second line, there are a lot of 0s and 1s.
A '0' means that values in that column are from a cancer patient and a '1' means
data are from a control (healthy person). The data are all log(intensity), i.e. the
logarithm of the measured intensity of the relevant spot on the DNA chip.
The data in this file will be used in coming exercises.
Oh, yes - the data/columns are tab separated. The second item on each line
is the accession number for that particular gene.
Now make a program that extracts data from DNA-array.dat.
It shall ask for an accession number (unless you have given it on the command line).
Make sure your program handles both situations. Then it shall search in the file
for the data concerning that accession number. If it does not find it (you gave a
wrong accession number), complain and stop. Otherwise it shall display the data
in two tab-separated columns. The first column shall be the data from the cancer
patients, the second column the data from the controls. And yes, there are not the same number
of sick and healthy people - your program must be able to handle that.
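For the standard deviation exercise (ex1.dat), here is a minimal sketch of both approaches. It assumes the file holds whitespace-separated numbers and computes the sample standard deviation (dividing by n - 1); if your result differs slightly from 1.8355, try dividing by n instead. The one-pass version uses the identity sum((x - mean)^2) = sum(x^2) - (sum x)^2 / n, which subtracts two large, nearly equal numbers and is therefore more sensitive to rounding, one likely place to look when comparing the two results.

```perl
use strict;
use warnings;

# Two-pass algorithm: store every number, compute the mean,
# then sum the squared deviations from that mean.
sub sd_two_pass {
    my ($fh) = @_;
    my @x;
    while (my $line = <$fh>) {
        push @x, split ' ', $line;       # all numbers are kept in memory
    }
    my $n = @x;
    die "Need at least two numbers\n" if $n < 2;
    my $sum = 0;
    $sum += $_ for @x;
    my $mean = $sum / $n;
    my $ss = 0;
    $ss += ($_ - $mean) ** 2 for @x;
    return sqrt($ss / ($n - 1));
}

# One-pass algorithm: accumulate n, sum(x) and sum(x^2) while reading.
sub sd_one_pass {
    my ($fh) = @_;
    my ($n, $sum, $sumsq) = (0, 0, 0);
    while (my $line = <$fh>) {
        for my $x (split ' ', $line) {   # each number is looked at once
            $n++;
            $sum   += $x;
            $sumsq += $x * $x;
        }
    }
    die "Need at least two numbers\n" if $n < 2;
    my $var = ($sumsq - $sum * $sum / $n) / ($n - 1);
    $var = 0 if $var < 0;                # rounding can push it slightly below 0
    return sqrt($var);
}

# Usage: perl sd.pl ex1.dat
if (@ARGV) {
    my $file = $ARGV[0];
    open my $in, '<', $file or die "Cannot open $file: $!\n";
    printf "two-pass: %.4f\n", sd_two_pass($in);
    seek $in, 0, 0 or die "Cannot rewind $file: $!\n";
    printf "one-pass: %.4f\n", sd_one_pass($in);
    close $in;
}
```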
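For the OO exercise, a minimal sketch of two things: loading a module that sits next to your program, and the '->' syntax. Since Perl 5.26 the current directory is no longer in @INC by default, so a plain 'use FastaParse;' may fail to find FastaParse.pm; the standard FindBin and lib modules add the script's own directory to the search path. The Counter class is hypothetical, defined inline only to show '->' without guessing at FastaParse's real methods, which are documented in its synopsis.

```perl
use strict;
use warnings;
use FindBin;               # finds the directory this script lives in
use lib $FindBin::Bin;     # adds that directory to @INC, the module search path
# use FastaParse;          # would now find FastaParse.pm next to the script

# Hypothetical class, defined here only to illustrate '->'.
package Counter;

sub new {
    my ($class, %args) = @_;
    my $self = { count => $args{start} // 0 };   # object data in a hash
    return bless $self, $class;
}

sub increment { my $self = shift; $self->{count}++; return $self; }
sub count     { my $self = shift; return $self->{count}; }

package main;

my $counter = Counter->new(start => 5);   # class method: ClassName->method
$counter->increment;                       # object method: $object->method
$counter->increment;
print $counter->count, "\n";               # prints 7
```

Inside the class, '$self->{count}' uses the same arrow to reach data stored in the object.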
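For the reverse complement exercise (dna7.fsa to revdna.fsa), a minimal sketch of the part that does not depend on FastaParse: reversing and complementing the sequence, and writing a FASTA record with 'ReverseComplement' appended to the header line. The header and sequence below are hypothetical stand-ins for what the module would return, the 60-character line width is an assumed default, and the output goes to an in-memory string instead of revdna.fsa.

```perl
use strict;
use warnings;

# Reverse complement a DNA string: reverse it, then swap A<->T and C<->G.
# Characters other than ACGT/acgt (e.g. N) are left unchanged.
sub reverse_complement {
    my ($seq) = @_;
    my $rc = reverse $seq;               # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Write one FASTA record: the header line, then the sequence in lines of $width.
sub write_fasta {
    my ($fh, $header, $seq, $width) = @_;
    $width //= 60;
    print {$fh} "$header\n";
    for (my $i = 0; $i < length($seq); $i += $width) {
        print {$fh} substr($seq, $i, $width), "\n";
    }
    return;
}

# Hypothetical input: in the exercise, these would come from FastaParse.
my $header = '>hypothetical_id some description';
my $seq    = 'AACCGT';

# Writes to an in-memory string here; open 'revdna.fsa' instead for the exercise.
my $buffer = '';
open my $out, '>', \$buffer or die "Cannot open output: $!\n";
write_fasta($out, "$header ReverseComplement", reverse_complement($seq));
close $out;
print $buffer;
```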
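For the DNA array exercise, a minimal sketch that uses subroutines even though the exercise does not require them, so the pieces can be tried separately. It assumes a layout that you should check against DNA-array.dat: line 1 is a header and is skipped, line 2 holds the 0/1 flags in the same columns as the values on the gene lines, and data values start in the third column (after an ID column and the accession number). A flag column that is neither 0 nor 1 is ignored.

```perl
use strict;
use warnings;

# Take the accession number from the command line, or else ask for it.
sub get_accession {
    my ($args, $in) = @_;
    return $args->[0] if @$args;
    print 'Accession number: ';
    my $acc = <$in>;
    die "No accession number given\n" unless defined $acc;
    chomp $acc;
    return $acc;
}

# Return (\@cancer, \@control) for $acc, or an empty list if it is not found.
sub find_gene {
    my ($fh, $acc) = @_;
    my $header_line = <$fh>;                # line 1: assumed header, skipped
    my $flag_line   = <$fh>;                # line 2: 0 = cancer, 1 = control
    die "File has fewer than two lines\n" unless defined $flag_line;
    chomp $flag_line;
    my @flags = split /\t/, $flag_line;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /\t/, $line;
        next unless defined $fields[1] && $fields[1] eq $acc;
        my (@cancer, @control);
        for my $i (2 .. $#fields) {         # assumed: data start in column 3
            next unless defined $flags[$i];
            if    ($flags[$i] eq '0') { push @cancer,  $fields[$i] }
            elsif ($flags[$i] eq '1') { push @control, $fields[$i] }
        }
        return (\@cancer, \@control);
    }
    return;
}

# Two tab-separated columns; the shorter column is padded with empty cells.
sub format_columns {
    my ($left, $right) = @_;
    my $rows = @$left > @$right ? scalar @$left : scalar @$right;
    my $out = '';
    for my $i (0 .. $rows - 1) {
        $out .= join("\t", $left->[$i] // '', $right->[$i] // '') . "\n";
    }
    return $out;
}

# Main program: runs only if the data file is present.
my $file = 'DNA-array.dat';
if (-e $file) {
    my $acc = get_accession(\@ARGV, \*STDIN);
    open my $in, '<', $file or die "Cannot open $file: $!\n";
    my ($cancer, $control) = find_gene($in, $acc);
    die "Accession number '$acc' not found in $file\n" unless $cancer;
    print format_columns($cancer, $control);
}
```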