Perl - Do's and Don'ts

File Reading

When you read a file, you are usually faced with two choices:
1) Read the whole file into memory and then start your computations.
# The usual way of doing this:
open(my $in, '<', 'somefile') or die "Can't read file: $!\n";
my @filearray = <$in>;
close $in;
# Compute here
2) Read the file line by line, keeping only the most recently read line in memory and doing the computations along the way.
# The usual way of doing this:
open(my $in, '<', 'somefile') or die "Can't read file: $!\n";
while (defined(my $line = <$in>)) {
    # Compute here
}
close $in;
Both methods are valid under the right circumstances. However, method 2 is preferred because it lets you process files larger than the computer's memory (RAM): each line can be discarded once it has been processed. This matters today, when files of bioinformatic data easily run into the gigabyte range.
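A minimal sketch of method 2, assuming a small made-up input: only the current line, a line counter and the longest line seen so far are kept in memory, so the same loop would work unchanged on a multi-gigabyte file. To keep the sketch self-contained it reads from an in-memory filehandle (open on a reference to a string); replace \$data with a real filename such as 'somefile' to read from disk.

```perl
use strict;
use warnings;

# Made-up data standing in for a real file; open() on a string
# reference gives an ordinary read filehandle.
my $data = "ACGT\nAC\nACGTACGT\n";
open(my $in, '<', \$data) or die "Can't read file: $!\n";

# Only one line is held at a time, plus two small summary values.
my ($count, $longest) = (0, '');
while (defined(my $line = <$in>)) {
    chomp $line;
    $count++;
    $longest = $line if length($line) > length($longest);
}
close $in;

print "Lines: $count\n";
print "Longest line: $longest\n";
```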

When to use method 1: Read the whole file at once
The problem requires 'random' access to lines anywhere in the file. Example: exercise 6.4.
The only alternative to reading the whole file into memory would be to read it more than once.
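A minimal sketch of method 1 with made-up data: once all lines are in an array, any line can be reached directly by its index, for example to print the last line first or the whole file backwards, which a single line-by-line pass cannot do without storing the lines anyway. This is only an illustration of random access, not a solution to exercise 6.4; the in-memory filehandle again stands in for 'somefile'.

```perl
use strict;
use warnings;

# Made-up data standing in for a real file.
my $data = "first\nsecond\nthird\n";
open(my $in, '<', \$data) or die "Can't read file: $!\n";
my @filearray = <$in>;    # every line is now in memory
close $in;
chomp @filearray;         # remove the newline from every element

# Random access: any line is available directly by its index.
print "Last line:  $filearray[-1]\n";
print "First line: $filearray[0]\n";

# Printing the file backwards needs all lines at once.
my @reversed = reverse @filearray;
print "$_\n" foreach @reversed;
```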

When to use method 2: Read the file line by line
Whenever you don't absolutely have to use method 1 :-)
Most file parsing (reading the file, looking for specific information) can be done line by line. Sometimes the file consists of records (entries); these can still be read line by line, one record at a time.
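A sketch of reading records line by line, assuming the usual FASTA layout (a '>' header line followed by one or more sequence lines) and made-up data. A record is only known to be complete when the next header or the end of the file is reached, so the last record has to be handled after the loop. The $process subroutine is a hypothetical name; here it just prints each header with its sequence length and keeps two running totals.

```perl
use strict;
use warnings;

# Made-up FASTA data standing in for a real file.
my $data = ">seq1 first\nACGT\nAC\n>seq2 second\nGGGCCCAA\n";
open(my $in, '<', \$data) or die "Can't read file: $!\n";

my ($records, $residues) = (0, 0);

# Hypothetical handler, called once per complete record.
my $process = sub {
    my ($head, $sequence) = @_;
    $records++;
    $residues += length $sequence;
    printf "%s\t%d\n", $head, length $sequence;
};

# Only the current record is held in memory.
my ($header, $seq);
while (defined(my $line = <$in>)) {
    chomp $line;
    if ($line =~ /^>/) {
        # A new header means the previous record is complete.
        $process->($header, $seq) if defined $header;
        ($header, $seq) = ($line, '');
    }
    elsif (defined $header) {
        $seq .= $line;    # sequence lines belong to the current record
    }
}
# The last record ends at end of file, not at a header.
$process->($header, $seq) if defined $header;
close $in;
```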

File Parsing

To parse a file means to read the file looking for specific information.
A file often consists of a number of logical records: structures (holding information) that are repeated throughout the file. The structures are identical (or similar), but the information is unique to each record.
When parsing a file (with one or more records), you should FIRST read the file or record, extracting the wanted information into variables, and THEN print or save the data, freeing up the variables. If you print or save data as soon as you find it in the file or record, you severely limit how you can format the output.
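A sketch of the FIRST read, THEN print principle, using a made-up record format loosely modeled on SwissProt-style flat files: an ID line, a DE (description) line, indented sequence lines, and a // line ending each record. The sequence length is only known when the record ends, yet the output places it before the description; printing lines as they were read would make that impossible. The variables are reset after each record, so memory use stays at one record.

```perl
use strict;
use warnings;

# Made-up records standing in for a real file.
my $data = <<'END';
ID   PROT1
DE   Example protein one
     MKVLA AG
//
ID   PROT2
DE   Example protein two
     MKK
//
END
open(my $in, '<', \$data) or die "Can't read file: $!\n";

my ($id, $desc, $seq) = ('', '', '');
my $records = 0;
my $row = '';    # the most recently printed output row
while (defined(my $line = <$in>)) {
    chomp $line;
    # FIRST: extract the wanted information into variables.
    if    ($line =~ /^ID\s+(\S+)/) { $id   = $1 }
    elsif ($line =~ /^DE\s+(.*)/)  { $desc = $1 }
    elsif ($line =~ /^\s+(.*)/)    { (my $part = $1) =~ s/\s+//g; $seq .= $part }
    elsif ($line eq '//') {
        # THEN: the record is complete, so print it in our chosen
        # order and format, and free the variables for the next one.
        $row = sprintf "%-8s %5d  %s", $id, length $seq, $desc;
        print "$row\n";
        $records++;
        ($id, $desc, $seq) = ('', '', '');
    }
}
close $in;
```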
More will be added as you inspire me.

This page was last updated         by Peter Wad Sackett,
