Exercises: Day 4


A program can only be executed directly when it has execute permission: chmod 755 <filename>
Remember to write #!/usr/bin/perl on the first line of your programs.
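
As a minimal sketch of the two reminders above, a program file could start like this. The file name hello.pl and the strict/warnings lines are assumptions added for illustration, not requirements of the exercises.

```perl
#!/usr/bin/perl
# hello.pl (hypothetical file name): make it executable with  chmod 755 hello.pl
# and then run it as  ./hello.pl
use strict;      # assumption: not required by the exercises, but catches many typos
use warnings;

my $greeting = "Hello from day 4";
print "$greeting\n";
```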

Necessary files to complete these exercises
To download the files to your system, hold down the Shift key while you left-click on the blue link, then follow the instructions.
ex5.acc
matrix.dat
mat1.dat
mat2.dat
test1.dat
test2.dat
test3.dat
dna7.fsa
FastaParse.pm


  1. Make a subroutine that removes duplicates from a list. The list (array) has to be passed to the subroutine as a reference and the array must be cleaned "in place", thus using a minimum of memory. The subroutine should NOT return a value. Use it to improve/change exercise 2 from day 2.
  2. Time to combine the subroutines that remove duplicates: make a subroutine that removes duplicates from a list. If the list is passed as an array, then the behaviour should be like day 3, ex 7, i.e. return a clean list. If the list is passed as a reference to an array, then the behaviour should be like day 4, ex 1, i.e. clean the given array in place.
  3. Create a program that reads a tab separated file with numbers, matrix.dat (to be understood as a matrix), and stores the numbers in a matrix-like hash (keys are indices, i.e. "$i,$j"). The program should be able to figure out how many rows and columns the matrix has. Having read the matrix from the file, it should then transpose it (rows to columns and columns to rows) using a subroutine like &transpose($rows, $columns, \%matrix). You have to make the subroutine, too. In the end print out the resulting matrix.
  4. Make a program that calculates the product of two matrices and prints it on STDOUT (the screen). The matrices are in the files mat1.dat and mat2.dat. Numbers in the files are tab separated.
    Advice: The program should have a subroutine that reads a matrix from a given file (to be used twice), a subroutine that calculates the product, and a sub that prints a matrix. This ensures that your program is easy to change to other forms of matrix calculation. Here are two links to the definition of matrix multiplication.
    http://www.mai.liu.se/~halun/matrix/matrix.html
    http://mathworld.wolfram.com/MatrixMultiplication.html

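The matrix-like hash used in exercises 3 and 4 can be sketched as follows. This is only an illustration under assumptions: the sub name read_matrix_lines and the in-memory sample lines are hypothetical, and a real solution would read the lines from matrix.dat instead. Transposing and multiplying are left out on purpose.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: store tab separated lines in a hash keyed by "$i,$j".
# Returns the number of rows, the number of columns and a reference to the hash.
sub read_matrix_lines {
    my @lines = @_;
    my %matrix;
    my ($rows, $columns) = (0, 0);
    foreach my $line (@lines) {
        chomp $line;
        next if $line eq '';              # skip empty lines
        my @numbers = split /\t/, $line;
        for my $j (0 .. $#numbers) {
            $matrix{"$rows,$j"} = $numbers[$j];
        }
        $columns = scalar @numbers if @numbers > $columns;
        $rows++;
    }
    return ($rows, $columns, \%matrix);
}

# Sample data standing in for the file content (assumption, not from matrix.dat)
my @sample = ("1\t2\t3\n", "4\t5\t6\n");
my ($rows, $columns, $matrix) = read_matrix_lines(@sample);

print "Rows: $rows, columns: $columns\n";
for my $i (0 .. $rows - 1) {
    print join("\t", map { $matrix->{"$i,$_"} } 0 .. $columns - 1), "\n";
}
```

Returning the dimensions together with the hash reference means that later subroutines do not have to work them out again from the keys.
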
  5. The file test1.dat contains results from an experiment in the form
    AccessionNumber   Number Number Number ....
    .
    .
    The files test2.dat and test3.dat contain results from similar experiments but with a slightly different gene set. You want to average the numbers from all experiments for each accession number. The output is therefore:
    AccessionNumber SingleAverageNumberOfAll3Experiments
    .
    .
  6. Now we should use some object-oriented techniques. OO programming is very often used in modules. A module is a collection of subroutines which somebody benevolent has made available for your use. You can find many Perl modules at http://www.cpan.org/.
    For now, start by saving the file FastaParse.pm in the directory where your program will be. This is an OO module, which I made for easy reading of fasta files. The first thing you should do is read the file. There is first a description of the module, then comes the code. You should not worry about the code, although it is good to learn from when you make your own modules. The important part is the synopsis (first in the file), which tells you how to use the module.
    First you should make a small program that proves that you have downloaded and placed the module in the correct place. It could be the program in the synopsis of the module. If it runs without errors, you are set.
    Your first Perl statement in a program that uses the module should be: use FastaParse; which loads the module. After that you can use the module as described. Notice the use of '->' to refer to methods and/or data encapsulated in the module.
  7. Use this module to parse/read the fasta file dna7.fsa and solve ex. 6 from day 2. I repeat the text of the exercise for convenience: Now make a program that reverse complements the sequence and writes it into the file revdna.fsa in fasta format. This time you have to keep the first identifying line, so the sequence can be identified. You must add 'ReverseComplement' at the end of that line, though, so you later know that it is the reverse complement.
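
The reverse complement step itself can be sketched without the module. The sub name reverse_complement and the sample header and sequence are made up for illustration; how the header and sequence are obtained from FastaParse is described in the module's synopsis and is not shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reverse the sequence, then swap each base with its complement.
sub reverse_complement {
    my ($sequence) = @_;
    my $revcomp = reverse $sequence;      # scalar context: reverses the string
    $revcomp =~ tr/ACGTacgt/TGCAtgca/;    # A<->T and C<->G, other characters are kept
    return $revcomp;
}

# Made-up example entry (assumption, not taken from dna7.fsa)
my $header   = ">example1 test sequence";
my $sequence = "ATGCCA";

print "$header ReverseComplement\n";
print reverse_complement($sequence), "\n";
```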

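For exercise 2, one way a subroutine can tell the two calling styles apart is the built-in ref function. The sketch below only reports which style was used and leaves the duplicate removal to you; the sub name calling_style is hypothetical, and it assumes that a single array reference argument always means "clean in place".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report how the list was passed.
# Assumption: exactly one argument that is an array reference means "by reference".
sub calling_style {
    if (@_ == 1 and ref($_[0]) eq 'ARRAY') {
        return 'reference';   # here a real solution would clean @{$_[0]} in place
    }
    return 'list';            # here a real solution would return the cleaned list
}

my @data = (3, 1, 3, 2, 1);
print calling_style(@data), "\n";    # the array is flattened into @_
print calling_style(\@data), "\n";   # a single reference is passed
```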

This page was last updated         by Peter Wad Sackett, pws@cbs.dtu.dk