Perl - Do's and Don'ts


File Reading

When you read a file, you are usually faced with two choices:
1) Read the whole file into memory and then start your computations.
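# The usual way of doing this (lexical filehandle, three-argument open, $! reports why open failed):
open(my $in, '<', 'somefile') or die "Can't read somefile: $!\n";
my @filearray = <$in>;    # list context: every line goes into the array
close $in;
# Compute here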
2) Read the file line by line, keeping only the most recently read line in memory and doing the computations as you go.
# The usual way of doing this:
open(my $in, '<', 'somefile') or die "Can't read somefile: $!\n";
while (defined(my $line = <$in>)) {    # scalar context: one line per iteration
    # Compute here
}
close $in;
Both methods are valid under the right circumstances. However, method 2 is preferred, because it lets you read files larger than the computer's memory (RAM): each line can be discarded once you are done with it, instead of keeping the whole file. This matters today, when files containing bioinformatic data easily reach the gigabyte range.
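As a sketch of method 2, the subroutine below counts lines and characters while only ever holding one line in memory, so its memory use does not grow with the file size. The name `count_lines_and_chars` is hypothetical, and the demo reads from an in-memory string (Perl's `open` accepts a reference to a scalar) so it runs without a real file; with a real file you would use an ordinary `open` instead.

```perl
use strict;
use warnings;

# Method 2: process the file line by line. Only the current line is held
# in memory, so the file can be far larger than the available RAM.
sub count_lines_and_chars {
    my ($fh) = @_;
    my ($lines, $chars) = (0, 0);
    while (defined(my $line = <$fh>)) {
        chomp $line;                 # do not count the newline
        $lines++;
        $chars += length $line;
    }
    return ($lines, $chars);
}

# Demo on an in-memory "file". For a real file:
#   open(my $in, '<', 'somefile') or die "Can't read somefile: $!\n";
my $data = "ACGT\nGGCC\nTTA\n";
open(my $in, '<', \$data) or die "Can't open in-memory file: $!\n";
my ($n, $c) = count_lines_and_chars($in);
close $in;
print "$n lines, $c characters\n";   # prints "3 lines, 11 characters"
```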

When to use method 1: Read the whole file at once
When the problem requires 'random' access to lines anywhere in the file. Example: exercise 6.4.
When the alternative to reading the whole file into memory is reading it more than once.
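A minimal sketch of method 1, assuming a hypothetical helper `read_all_lines`: once all lines are in an array, any line can be reached by its index, in any order, which is what 'random' access means here. The demo reads from an in-memory string instead of a real file.

```perl
use strict;
use warnings;

# Method 1: read every line into an array, then access lines in any order.
sub read_all_lines {
    my ($fh) = @_;
    my @lines = <$fh>;    # list context reads the whole file at once
    chomp @lines;         # strip the newline from every element
    return @lines;
}

my $data = "line one\nline two\nline three\nline four\n";
open(my $in, '<', \$data) or die "Can't open in-memory file: $!\n";
my @lines = read_all_lines($in);
close $in;

# Random access: visit lines in an arbitrary order (indices are 0-based).
for my $i (3, 0, 2) {
    print "$lines[$i]\n";
}

# Walking backwards through the file is just as easy once it is in memory.
print "$_\n" for reverse @lines;
```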

When to use method 2: Read the file line by line
Whenever you don't absolutely have to use method 1 :-)
Most file parsing (reading a file while looking for specific information) can be done line by line. Sometimes the file consists of records (entries); these can still be read one record at a time on a line-by-line basis.
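For example, in the FASTA format (used here as an assumed illustration) each record is a header line starting with `>` followed by one or more sequence lines. The sketch below reads such records line by line; a record is finished when the next header or the end of the file is reached. The subroutine `fasta_lengths` is hypothetical and keeps only each record's header and sequence length, never the sequence itself.

```perl
use strict;
use warnings;

# Read FASTA-style records ('>' header line followed by sequence lines)
# one record at a time, line by line. The sequence is never stored;
# only its length is accumulated. Returns [header, length] pairs.
sub fasta_lengths {
    my ($fh) = @_;
    my @results;
    my ($header, $length);
    while (defined(my $line = <$fh>)) {
        chomp $line;
        if ($line =~ /^>(.*)/) {
            # A new header ends the previous record, if there is one.
            push @results, [$header, $length] if defined $header;
            ($header, $length) = ($1, 0);
        }
        elsif (defined $header) {
            $line =~ s/\s+//g;       # ignore whitespace inside sequence lines
            $length += length $line;
        }
    }
    # The last record is ended by end-of-file, not by a new header.
    push @results, [$header, $length] if defined $header;
    return @results;
}

my $data = ">seq1 first\nACGT\nAC\n>seq2\nGGGTTT\n";
open(my $in, '<', \$data) or die "Can't open in-memory file: $!\n";
for my $rec (fasta_lengths($in)) {
    printf "%-12s %d\n", $rec->[0], $rec->[1];
}
close $in;
```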


File Parsing

To parse a file means to read it while looking for specific information.
A file often consists of a number of logical records: structures holding information that are repeated throughout the file. The structure is identical (or similar) from record to record, but the information in each record is unique.
When parsing a file (with one or more records), you should FIRST read the file/record, extracting the wanted information into variables, and THEN print/save the data, freeing up the variables for the next record. If you save data as soon as you find it in the file/record, you severely limit your options for formatting the output.
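To illustrate FIRST read, THEN print, the sketch below parses a simplified flat file loosely modelled on the Swiss-Prot format (an assumption for illustration): each line starts with a two-letter code and each record ends with a `//` line. The AC line comes after the ID line in the file, so printing the accession first, in aligned columns and sorted, is only possible because both values are stored before anything is printed. The subroutine `parse_records` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Parse records terminated by a '//' line. The wanted fields (ID, AC) are
# stored FIRST and only printed once all records have been read.
sub parse_records {
    my ($fh) = @_;
    my @records;
    my %current;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        if ($line eq '//') {
            push @records, { %current };   # save a copy of the finished record
            %current = ();                 # free the variables for the next record
        }
        elsif ($line =~ /^(ID|AC)\s+([^\s;]+)/) {
            $current{$1} = $2;             # value without a trailing ';'
        }
    }
    return @records;
}

my $data = "ID   PROT_A\nAC   P00001;\nDE   something\n//\n"
         . "ID   PROT_B\nAC   O99999;\n//\n";
open(my $in, '<', \$data) or die "Can't open in-memory file: $!\n";
my @records = parse_records($in);
close $in;

# Output in a layout that differs from the file order:
# AC before ID, in aligned columns, sorted by accession.
for my $rec (sort { $a->{AC} cmp $b->{AC} } @records) {
    printf "%-10s %s\n", $rec->{AC}, $rec->{ID};
}
```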
More will be added as you inspire me.

This page was last updated by Peter Wad Sackett, pws@cbs.dtu.dk

