Lesson 6: Perl Arrays


Required reading
Learning Perl, ed. 4:
Chapter 3; p. 38-48, middle
Chapter 9; p. 124-126 or
Learning Perl, ed. 5:
Chapter 3; p. 39-53
Chapter 9; p. 138-139 or
Learning Perl, ed. 6:
Chapter 3; p. 43-55
Chapter 9; p. 159-160

Notes on sort, reverse, push, pop, shift, unshift, scalar, split, join and splice.

Subjects covered
Arrays (lists of data) and some functions that operate on arrays:

  • sort, which, not surprisingly, sorts the array according to some principle (by default, as strings),
  • reverse, which reverses an array (or, in scalar context, a string),
  • push, pop, shift, unshift, which all add or remove elements at one end of the array or the other,
  • scalar, which returns the length (number of elements) of an array,
  • split, which splits a string into an array,
  • join, which concatenates an array into a string,
  • splice, which takes out parts of an array.
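
As a minimal sketch of the functions listed above, the following program applies each of them once. The sample words and the variable names are hypothetical and are not taken from the course files.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample data, not taken from the course files.
my @words = ('pear', 'apple', 'fig');

my @sorted   = sort @words;           # default sort compares as strings
my @reversed = reverse @sorted;       # last element becomes the first

push @words, 'kiwi';                  # add an element at the end
my $last = pop @words;                # remove the element at the end
unshift @words, 'plum';               # add an element at the front
my $first = shift @words;             # remove the element at the front

my $count = scalar @words;            # number of elements in the array

my @fields = split /\t/, "a\tb\tc";   # string -> array, split on tabs
my $line   = join ',', @fields;       # array -> string, comma between elements

my @removed = splice @words, 1, 1;    # take 1 element out, starting at index 1

print "sorted:   @sorted\n";
print "reversed: @reversed\n";
print "count:    $count\n";
print "joined:   $line\n";
print "removed:  @removed, left: @words\n";
```

Note that splice changes the array itself and returns the elements it removed, whereas sort and reverse leave the original array untouched and return a new list.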
New Perl loops are foreach, which loops over each element in an array, and for, which loops a certain number of times and is usually used as a counting loop. @ARGV is the argument vector, that is, the array where the command line parameters are stored.
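
The two loops and @ARGV can be sketched as follows. The helper first_arg_or, the default value and the sample list are assumptions made up for this illustration; they are not part of Perl or of the course material.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return the first argument after the default
# if there is one, otherwise return the default.
sub first_arg_or {
    my ($default, @args) = @_;
    return @args ? $args[0] : $default;
}

# @ARGV holds whatever was written after the program name.
my $name = first_arg_or('world', @ARGV);

# Hypothetical sample data.
my @greetings = ('hello', 'goodbye');

# foreach visits each element of the array in turn.
foreach my $greeting (@greetings) {
    print "$greeting, $name\n";
}

# for as a counting loop: $i runs over the indices 0, 1, ...
for (my $i = 0; $i < scalar(@greetings); $i++) {
    print "$i: $greetings[$i]\n";
}
```

Run without arguments, the sketch uses the default name; run with an argument, it uses the first element of @ARGV instead.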

From now on, 2 points will be subtracted for each solution that does not "use strict;" or does not use proper, consistent indentation (to a maximum of 4 points per exercise).

Necessary files to complete this exercise
To download the files to your system, just press the Shift key while you left click on the blue link. Follow the instructions.
ex5.acc
You can play around with these files as much as you like. If you change or destroy them, just download them again.

Remember to write #!/usr/bin/perl -w on the first line of your programs.


All the following exercises have to be done in Perl

  1. Make a program that asks for words and saves them in a file called words.txt (one word per line) until you write STOP.
  2. Make a program that reads all the words in words.txt into an array. First the words must be sorted alphabetically, then the list should be reversed (the first line shall be the last and vice versa), and finally the resulting list should be written back to words.txt.
  3. In the file ex5.acc there are 6461 unique GenBank accession numbers (taken from the HU6800 DNA array chip). An inexperienced bioinformatician unfortunately fouled up the list, so many of the accession numbers appear more than once. It is your job to clean the list, so that each accession number appears only once, and in alphabetical order. Save the new list in clean.acc. Hint: After sorting a list, duplicates are "next" to each other, thereby making them easy to find and eliminate. You are NOT to use splice in this exercise.
  4. Improve/change the previous exercise by using splice to eliminate duplicates. HINT: Keep one list and splice duplicates out of it instead of pushing them into a new list.
  5. Searching for accession numbers. Make a program that first reads your file clean.acc, and then asks for accession numbers and checks if they are in the list. The program should tell you both when an accession number is in the list and when it is not. The program should continue to ask until you write STOP. The search method you should employ is linear search; this is simply searching the list from one end to the other, one accession number at a time. Linear search is the method to use when you don't know where in the list the element you are searching for is placed, for example when the list is not sorted.
  6. After having looked at the list in clean.acc, you discover that the accession numbers are sorted. This means that you can use the much more powerful binary search method. Repeat the previous exercise, but this time use binary search. See what Wikipedia has to say about binary search.
  7. It is time to improve on some of your old programs by adding a command line interface (not replacing the interactive interface you already have). This simply means that you can write ProgramName <FileName> (or something like it) on the command line and your program should then use the argument that you have supplied on the command line (in this case: <FileName>). It should only ask for a filename (or whatever) if no argument was specified. Improve exercises 4.2 and 4.3.
    You are expected to provide such an interface when relevant in future exercises.
  8. Make a Perl program that works a bit like Unix cut. Test cut before you start so you know how the output looks. Your program should cut out the columns that you specify, in the order you specify them on the command line, from a tab-separated file. Some examples could be perlcut.pl 2 ex1.dat, which cuts out column 2 from ex1.dat, or perlcut.pl 3 1 ex1.dat, which cuts out columns 3 and 1 from ex1.dat and displays them in that order (which is different from the original).
  9. Improve exercise 3.4. Calculate the three sums of the three columns in one reading of the file ex1.dat using split to separate the columns.
  10. Improve on the previous exercise by making a program that calculates the sum of all columns in the file, no matter how many columns there are. Each column should be summed individually. You can assume that each row (line) has the same number of columns in the file.
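
The binary search idea from exercise 6 can be sketched on a small list as shown below. This is one possible shape of the technique, not a model solution: the subroutine name binary_search, the sample accession numbers and the choice to pass the list as an array reference (so that the list is not copied on every call) are all assumptions made for this illustration. The list must be in the same order that cmp uses, which is the order a plain sort produces.

```perl
#!/usr/bin/perl -w
use strict;

# Return the index of $target in the sorted array referenced by $list,
# or -1 if $target is not in the list.
sub binary_search {
    my ($list, $target) = @_;
    my ($low, $high) = (0, $#{$list});   # first and last index still in play

    while ($low <= $high) {
        my $mid = int(($low + $high) / 2);       # middle of the remaining range
        my $cmp = $target cmp $list->[$mid];     # string comparison: -1, 0 or 1

        if ($cmp == 0) {
            return $mid;                         # found
        } elsif ($cmp < 0) {
            $high = $mid - 1;                    # target is in the lower half
        } else {
            $low = $mid + 1;                     # target is in the upper half
        }
    }
    return -1;                                   # range is empty: not found
}

# Hypothetical accession numbers, put in string order by sort.
my @accessions = sort qw(M10098 AB000114 Z11793 D00001 X00351);

foreach my $query ('M10098', 'Q99999') {
    my $index = binary_search(\@accessions, $query);
    if ($index >= 0) {
        print "$query is in the list (index $index)\n";
    } else {
        print "$query is not in the list\n";
    }
}
```

Each pass through the loop halves the range that is still in play, so a list of 6461 elements needs at most 13 comparisons, where linear search may need all 6461.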

Extreme programming. This exercise is done in class in groups of two. It is not to be handed in.
Improve the bullseye from lesson 3 in such a way that the program will ask how large a diameter/radius and how many rings the bullseye should be generated with. There must be NO LIMITS in the program on diameter and/or ring number.

This page was last updated         by Peter Wad Sackett, pws@cbs.dtu.dk