Required reading
Learning Perl, ed. 4:
Chapter 4; p. 54-67
or
Learning Perl, ed. 5:
Chapter 4; p. 55-67
or
Learning Perl, ed. 6:
Chapter 4; p. 63-71
Subjects covered
- Subroutines: a subroutine is both a way of hiding complexity and a way
of reusing code.
- Scope of variables.
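As a minimal sketch of both subjects, the example below wraps a small calculation in a subroutine and uses `my` to keep its variables local. The subroutine name `gc_content` and the sequences are made up for illustration and are not part of the exercises.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of G and C bases in a DNA string.
sub gc_content {
    my ($dna) = @_;                  # this $dna is lexical: visible only inside the sub
    return 0 if length($dna) == 0;   # avoid division by zero on empty input
    my $gc = ($dna =~ tr/GCgc//);    # tr/// without a replacement list only counts
    return $gc / length($dna);
}

my $dna = 'ATGCGC';                  # file-level variable, separate from the one in the sub
printf "GC content: %.2f\n", gc_content($dna);
printf "GC content: %.2f\n", gc_content('AATT');   # the same code reused on other input
```

The caller only needs to know the subroutine's name and what it returns. The counting details stay hidden inside it, and `$gc` cannot be seen or changed from outside.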
Necessary files to complete this exercise
To download the files to your system, just press the Shift key while
you left click on the blue link. Follow the instructions.
dna7.fsa
ex5.acc
dna-array.dat
All the following exercises have to be done in Perl
- Make a subroutine that takes a DNA sequence (string) as a
parameter and returns the complement strand (reverse complement).
Use it to improve 7.3 which works on dna7.fsa.
- Improve on 2.10 by making a subroutine that calculates the factorial.
Add some input control (make sure you get positive integers when you
ask for a number).
- Make a subroutine that returns the one-letter
designation for the corresponding amino acid when you give it a codon (3 bases).
You can find a list here.
If something invalid is given as input to the subroutine, return an error code (make one up).
You can reuse the hash from last lesson for most of the exercise,
if you want.
- Make a subroutine that (only) removes duplicates from a list and returns the
clean list. Use it to improve 6.3.
- You do not need to use subroutines in this exercise.
Study the file dna-array.dat a bit. This is real DNA array data
taken from a number of persons, some controls and some suffering from
colon cancer. If you look at the second line, there are a lot of 0s and 1s.
A '0' means that the values in that column are from a cancer patient and a '1' means
that the data are from a control (healthy person). The data are all log(intensity), i.e. the
logarithm of the measured intensity of the relevant spot on the DNA chip.
The data in this file will be used in coming exercises.
Oh, yes - the data/columns are tab separated. The second item on each line
is the accession number for that particular gene.
Now make a program that extracts data from dna-array.dat.
It shall ask for an accession number (unless you have given it on the command line).
Make sure your program handles both situations. Then it shall search in the file
for the data concerning that accession number. If it does not find it (you gave a
wrong accession number), complain and stop. Otherwise it shall display the data
in two tab separated columns. The first column shall be the data from the cancer
patients, the second column the data from the controls. And yes, the numbers of sick
and healthy people differ, so be able to handle that.
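One way to handle the unequal column lengths can be sketched as below. It works on in-memory lists rather than on the file. The helper names `split_by_class` and `two_column_rows` are hypothetical, the labels and values are made up, and the sketch assumes that the label at a given position describes the value at the same position. Check how the columns in dna-array.dat actually line up before relying on that.

```perl
use strict;
use warnings;

# Hypothetical helper: sort values into cancer (label 0) and control (label 1) lists.
# Assumes $labels->[$i] describes $values->[$i]; other labels are skipped.
sub split_by_class {
    my ($labels, $values) = @_;
    my (@cancer, @control);
    for my $i (0 .. $#{$values}) {
        my $label = $labels->[$i];
        next unless defined $label;
        if    ($label eq '0') { push @cancer,  $values->[$i] }
        elsif ($label eq '1') { push @control, $values->[$i] }
    }
    return (\@cancer, \@control);
}

# Hypothetical helper: build tab separated rows; the shorter list gets empty cells.
sub two_column_rows {
    my ($left, $right) = @_;
    my $rows = @$left > @$right ? @$left : @$right;
    my @lines;
    for my $i (0 .. $rows - 1) {
        my $l = $i < @$left  ? $left->[$i]  : '';
        my $r = $i < @$right ? $right->[$i] : '';
        push @lines, "$l\t$r";
    }
    return @lines;
}

# Made-up labels and values, not taken from dna-array.dat.
my @labels = qw(0 1 0 0 1);
my @values = qw(7.1 6.4 8.0 7.7 6.9);
my ($cancer, $control) = split_by_class(\@labels, \@values);
print "$_\n" for two_column_rows($cancer, $control);
```

Reading the file, finding the line with the requested accession number and complaining when it is missing are left out here on purpose.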