Required reading
Learning Perl, ed. 4:
Chapter 3; p. 38-48, middle
Chapter 9; p. 124-126
or
Learning Perl, ed. 5:
Chapter 3; p. 39-53
Chapter 9; p. 138-139
or
Learning Perl, ed. 6:
Chapter 3; p. 43-55
Chapter 9; p. 159-160
Notes on sort, reverse, push, pop, shift,
unshift, scalar, split, join and splice.
Subjects covered
Arrays (lists of data) and some functions that operate on arrays:
- sort, which unsurprisingly sorts the array according to some ordering (string order by default),
- reverse, which reverses an array (and, in scalar context, a string),
- push, pop, shift, unshift, which all add or remove elements at one end of the array or the other,
- scalar, which returns the length of an array.
- split, which splits a string into an array.
- join, which concatenates an array into a string.
- splice, which takes out parts of an array.
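The functions listed above can be tried out in a few lines. This is only a minimal sketch on made-up sample data (the words and variable names are assumptions, not part of the course files):

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample data, not taken from the course files.
my @words = ('pear', 'apple', 'fig');

my @sorted   = sort @words;            # string order: apple fig pear
my @reversed = reverse @sorted;        # pear fig apple

push @words, 'kiwi';                   # add at the end
my $last  = pop @words;                # remove from the end: 'kiwi'
unshift @words, 'plum';                # add at the front
my $first = shift @words;              # remove from the front: 'plum'

my $count = scalar(@words);            # number of elements: 3

my @fields = split /\t/, "a\tb\tc";    # string -> array ('a', 'b', 'c')
my $joined = join ',', @fields;        # array -> string 'a,b,c'

my @removed = splice @sorted, 1, 1;    # take 1 element out at index 1
print "@sorted\n";                     # apple pear
print "@removed\n";                    # fig
```

Note that splice changes the array it works on, while sort and reverse return a new list and leave the original untouched.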
The new Perl loops are foreach,
which loops over each element in an array, and for, which loops
a certain number of times and is usually used as a counting loop.
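The two loops side by side, as a small sketch; the array @bases and its contents are made up for illustration:

```perl
#!/usr/bin/perl -w
use strict;

my @bases = ('A', 'C', 'G', 'T');   # hypothetical sample data

# foreach visits every element in turn.
my $seen = '';
foreach my $base (@bases) {
    print "$base\n";
    $seen .= $base;
}

# for counts: here from index 0 up to the last index of the array.
my $steps = 0;
for (my $i = 0; $i < scalar(@bases); $i++) {
    print $i, ": ", $bases[$i], "\n";
    $steps++;
}
```

Use foreach when you only need the elements, and for when you also need the position of each element.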
@ARGV is the argument vector, that is the array where command line
parameters are stored.
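A minimal sketch of looking at @ARGV; the script name args.pl in the comment is hypothetical:

```perl
#!/usr/bin/perl -w
use strict;

# Run as, for example:  perl args.pl words.txt 3
print "Number of arguments: ", scalar(@ARGV), "\n";

foreach my $arg (@ARGV) {
    print "Argument: $arg\n";
}

# The first argument, if there is one, is $ARGV[0].
# The program name itself is in $0, not in @ARGV.
my $first = @ARGV ? $ARGV[0] : '(none)';
print "First argument: $first\n";
```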
From now on 2 points will be subtracted for each solution that does not "use strict;" or
does not use proper, consistent indentation (to a maximum of 4 points per exercise).
Necessary files to complete this exercise
To download the files to your system, just press the Shift key while
you left click on the blue link. Follow the instructions.
ex5.acc
You can play around with these files as much as you like. If you change or
destroy them, just download them again.
Remember to write #!/usr/bin/perl -w on the first line of
your programs.
All the following exercises have to be done in Perl
- Make a program that asks for words and saves them in a file called
words.txt (one word per line) until you write STOP.
- Make a program that reads all the words in words.txt into an array.
First the words must be sorted alphabetically, then the list should be reversed
(the first line shall be the last and vice versa), finally the resulting list
should be written back in words.txt.
- In the file ex5.acc there are 6461 unique GenBank accession numbers
(taken from the HU6800 DNA array chip).
An inexperienced bioinformatician unfortunately fouled up the list, so many
of the accession numbers appear more than once. It is your job to clean
the list, so that each accession number appears only once, and in alphabetical order.
Save the new list in clean.acc. Hint: After sorting a list, duplicates
are "next" to each other, thereby making them easy to find and
eliminate. You are NOT to use splice in this exercise.
- Improve/change the previous exercise by using splice to eliminate duplicates.
HINT: Keep one list and splice duplicates out of it instead of pushing them into a new list.
- Searching for accession numbers. Make a program that first reads your
file clean.acc, and then asks for accession numbers and checks if they
are in the list. The program should tell you when an accession number is in the list,
and it should also tell you when it is not. The program should continue to
ask until you write STOP. The search method you should employ is linear search; this is
simply searching the list from one end to the other, one accession number at a time.
Linear search is the method to use when nothing is known about the order of the list,
so the element you are searching for could be anywhere.
- After having looked at the list in clean.acc, you discover that the accession numbers
are sorted. This means that you can use the much more powerful binary search method.
Repeat the previous exercise, but this time use binary search.
See what Wikipedia has to say about
binary search.
- It is time to improve on some of your old programs by adding
a command line interface (not replacing the interactive interface you already have).
This simply means that you can write ProgramName <FileName>
(or something like it) on the command line and your program should then use the argument
that you have supplied on the command line (in this case:
<FileName>). It should only ask for a filename (or whatever) if no argument was specified.
Improve exercises 4.2 and 4.3.
You are expected to provide such an
interface when relevant in future exercises.
- Make a Perl program that works a bit like Unix cut. It should cut out the columns
that you specify in the order you specify on the command line from a tab-separated file.
Some examples could be perlcut.pl 2 ex1.dat which cuts out column 2 from ex1.dat or
perlcut.pl 3 1 ex1.dat which cuts out columns 3 and 1 from ex1.dat and displays them
in that order (which is different from the original).
- Improve exercise 3.4. Calculate the three sums of the three columns in
one reading of the file ex1.dat using split to separate the columns.
- Improve on the previous exercise by making the program work on any number
of columns and sum each column individually. You can assume that each row
(line) has the same number of columns in a file.
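The tab-separated exercises above all start from the same step: turning one line into an array of columns. A minimal sketch of that step only, on a single made-up line (the data and variable names are assumptions; reading the file and handling the command line are left to you):

```perl
#!/usr/bin/perl -w
use strict;

# One hypothetical tab-separated line; real input would come from a file.
my $line = "gene1\t12.5\t7\n";
chomp $line;                         # remove the trailing newline first

my @columns = split /\t/, $line;     # ('gene1', '12.5', '7')
print "Columns: ", scalar(@columns), "\n";

# An array slice picks columns in any order. Indices start at 0,
# so column 3 followed by column 1 is indices 2 and 0.
my @picked = @columns[2, 0];
print join("\t", @picked), "\n";     # 7, a tab, then gene1
```

Without the chomp, the newline would stay attached to the last column.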
Extreme programming. This exercise is done in class in groups of two.
Improve the bullseye from lesson 3 in such a way that the program will ask
how large a diameter/radius and how many rings the bullseye should be
generated with. There must be NO LIMITS in the program on diameter and/or ring number.