File Reading

When you read a file, you are usually faced with two choices:

1) Read the whole file into memory and then start your computations.
# Usual way of doing this is
open(IN, "somefile") or die "Can't read file\n";
@filearray = <IN>;
close IN;
# Compute here

2) Read the file line by line, retaining only the last line read and doing the computations as you go.
# Usual way of doing this is
open(IN, "somefile") or die "Can't read file\n";
while (defined ($line = <IN>)) {
    # Compute here
}
close IN;
Both methods are valid under the right circumstances. However, method 2 is
the preferred method, because it lets you read files larger than the
computer's memory (RAM): you can discard a lot of the file as you read it.
This is important today, when files containing bioinformatics data are
easily in the gigabyte range.
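
As a minimal sketch of why method 2 scales to large files, the loop below counts lines and characters while holding only the current line in memory. The subroutine name count_lines_and_chars, the lexical filehandle $in, and the three-argument form of open are choices made for this illustration and are not part of the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize a file without ever storing more than one line.
sub count_lines_and_chars {
    my ($filename) = @_;
    open(my $in, '<', $filename) or die "Can't read file $filename: $!\n";
    my ($lines, $chars) = (0, 0);
    while (defined(my $line = <$in>)) {
        chomp $line;               # remove the trailing newline
        $lines++;
        $chars += length $line;    # $line is replaced on the next iteration
    }
    close $in;
    return ($lines, $chars);
}
```

It could be called as my ($lines, $chars) = count_lines_and_chars("somefile"); memory use stays roughly constant no matter how large the file is, because nothing but the two counters survives from one line to the next.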
When to use method 1: Read whole file at once
When to use method 2: Read file line by line

File Parsing

To parse a file means to read the file looking for specific information.

A file often consists of a number of logical records: structures (with information) that are repeated in the file. The structures are identical (or similar), but the information is unique for each record.

When parsing a file (with one or more records) you should FIRST read the file/record, extracting the wanted information into variables, THEN print/save the data, freeing up the variables. If you save data as you find it in the file/record, you severely limit yourself in formatting the output.

More will be added as you inspire me.
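
The advice on parsing records (read first, print afterwards) can be sketched as follows. The record format here is invented purely for illustration: each record has an "ID:" line and a "NAME:" line and ends with a line containing only "//". The subroutine name parse_records and the variable names are likewise hypothetical.

```perl
use strict;
use warnings;

# Hypothetical record format (an assumption for this sketch):
#   ID: seq1
#   NAME: alpha
#   //            <- a line with only "//" ends the record
sub parse_records {
    my ($in) = @_;                 # an already opened filehandle
    my @output;
    my ($id, $name);               # variables holding the current record
    while (defined(my $line = <$in>)) {
        chomp $line;
        if ($line =~ /^ID:\s*(.*)$/) {
            $id = $1;
        } elsif ($line =~ /^NAME:\s*(.*)$/) {
            $name = $1;
        } elsif ($line eq '//') {
            # Record complete: only now format it, then free the variables.
            push @output, "$name\t$id" if defined $id and defined $name;
            ($id, $name) = (undef, undef);
        }
    }
    return @output;
}
```

Because the fields are collected before anything is written, the output can put the name before the ID even though the ID comes first in the file. With a filehandle $in opened on such a file, print "$_\n" for parse_records($in); would print one formatted line per record.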