Lesson 4: Perl I/O


Required reading
Learning Perl, ed. 4:
Chapter 5; p. 78-83
Chapter 12; p. 164, 169-178
or
Learning Perl, ed. 5:
Chapter 5; p. 71-73 mid, 76-79 mid, 81 mid-89 mid
Chapter 13; p. 191, 196-198 mid, 203 mid-205 mid
or
Learning Perl, ed. 6:
Chapter 5; p. 81-83 mid, 89-90 mid, 91 mid-99 top
Chapter 13; p. 215, 221-224 mid, 229-230

Notes (from pws): the functions open, close, die, exit, substr, length, mkdir, rmdir, chdir, rename, unlink, chmod and system, and the last chapter about Perl structure/syntax, p. 48-51

Subjects covered

  • Using files.
    • open, which opens a file (makes it ready) for reading or writing,
    • close, which ends the reading/writing.
  • Functions:
    • die and exit, which terminate the program,
    • substr, which extracts a part of a string,
    • length, which tells how long a string is,
    • mkdir, which creates directories,
    • rmdir, which removes empty directories,
    • chdir, which changes to the given directory,
    • rename, which renames or moves files/directories,
    • unlink, which deletes files,
    • chmod, which changes permissions on files/directories,
    • system, which submits jobs to the operating system,
    • and finally backticks/backquotes, which do the same but capture the output from the job.
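
The functions listed above can be tried out together in one small script. The following is only a minimal sketch, not a solution to any exercise: the helpers write_lines and read_lines, the directory name demo and the file names seq.txt and old.txt are made up for illustration, and the script assumes a UNIX-like system where the echo command is available. It works in a scratch directory, so none of the course files are touched.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);   # core module, used only to get a scratch directory
use Cwd qw(getcwd);           # core module, used to remember where we started

# Hypothetical helper: write the given strings to a file, one per line.
sub write_lines {
    my ($filename, @lines) = @_;
    open(my $out, '>', $filename) or die "Cannot open $filename for writing: $!\n";
    print $out "$_\n" for @lines;
    close($out) or die "Cannot close $filename: $!\n";
}

# Hypothetical helper: read a file and return its lines without newlines.
sub read_lines {
    my ($filename) = @_;
    open(my $in, '<', $filename) or die "Cannot open $filename for reading: $!\n";
    my @lines = <$in>;
    close($in);
    chomp @lines;
    return @lines;
}

my $start   = getcwd();
my $scratch = tempdir(CLEANUP => 1);      # removed automatically at exit

chdir($scratch) or die "Cannot chdir to $scratch: $!\n";
mkdir('demo')   or die "Cannot mkdir demo: $!\n";

# open/close for writing, then for reading
write_lines('demo/seq.txt', 'ATGCATGC', 'GGCC');
for my $line (read_lines('demo/seq.txt')) {
    # length gives the number of characters, substr extracts a part
    printf "%s: %d characters, first three are %s\n",
        $line, length($line), substr($line, 0, 3);
}

rename('demo/seq.txt', 'demo/old.txt') or die "Cannot rename: $!\n";
chmod(0644, 'demo/old.txt')            or die "Cannot chmod: $!\n";
unlink('demo/old.txt')                 or die "Cannot unlink: $!\n";
rmdir('demo')                          or die "Cannot rmdir: $!\n";

# system runs a job and returns its exit status (0 means success)
system('echo hello from the shell') == 0 or die "system failed\n";

# backticks run a job and capture what it printed
my $captured = `echo captured text`;
chomp $captured;
print "Backticks returned: $captured\n";

chdir($start) or die "Cannot chdir back to $start: $!\n";
```

Every call that can fail is followed by "or die", so the program stops with a message containing $! (the system's error text) instead of silently continuing.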

    Necessary files to complete this exercise
    To download the files to your system, just press the Shift key while you left-click on the blue link. Follow the instructions.
    We have reused some of the files from earlier exercises.
    ex1.dat
    ex1.acc
    dna.dat
    dna.fsa
    orphans.sp
    You can play around with these files as much as you like. If you change or destroy them, just download them again.

    Remember to write #!/usr/bin/perl on the first line of your programs.


    All of the following exercises must be done in Perl.

    1. Write a program that reads the file ex1.acc and displays it on the screen. A bit like cat.
    2. Make the program ask for a filename, and display the file on the screen. If the file does not exist, then complain and exit. More like cat.
    3. Make a new program that asks for two filenames, one at a time. It should then display (output/print) on one line the first line from the first file joined by a tab to the first line from the second file, and so forth for the remaining lines. The output should be like that of the UNIX command paste. Try with the files ex1.acc and ex1.dat and compare with the output of the same UNIX paste command from exercise 1.14. It should be the same.
    4. The file dna.dat contains some human DNA. This and the following exercise aim to make the reverse complement string (called the "complement strand") of the DNA. Read the file and put all the DNA in one variable. Now complement the DNA in another variable. Complementing means changing all A's to T's, T's to A's, C's to G's and G's to C's. Display the result and ensure that it works.
    5. Now reverse the DNA after complementing it. Reverse means last letter (base) should be the first, next to last should be the second, and so forth. Display.
    6. Now write the DNA to the file revdna.dat. Make it look nice, just like dna.dat, i.e. 60 letters per line. This does NOT mean that you should insert newlines in the variable containing your complement strand (that would contaminate clean data that you may need later in the program). It just means that the DNA in the output file must have 60 characters per line, just as in the input file.
    7. The file dna.fsa contains the same human DNA in FASTA format. This format is VERY often used in bioinformatics. Look at it using less and get used to the format. Observe the first line, which starts with a > and identifies the sequence. The name (AB000410 in this case) MUST uniquely identify a sequence in the file. This is a DNA (actually mRNA) sequence taken from the GenBank database. Now make a program that reverse complements the sequence and writes it to the file revdna.fsa, just like you did in the previous exercises. This time you have to keep the first identifying line, so the sequence can be identified. You must add 'ComplementStrand' to the end of that line, though, so you later know that it is the complement strand.
      Summary: Keep the first line and reverse complement the sequence.
    8. Now you must analyse the AT/GC content of the DNA in the file dna.fsa. You must count all A, T, C and G, and display the result.
    9. Read the file orphans.sp and find (extract) all the accession numbers from it. Save them in another file of your choosing. Hint: Save them as you find them, but only the accession number, not the complete line. The first accession number is AB000114.CDS.1. You can choose to consider .CDS.1 as a part of the accession number (or not). Accession numbers differ in length for historical reasons.
    10. Read the file ex1.dat once and count the number of negative numbers in it. Display the result.
    11. Now for some playing. Make a program that guesses a number between 1 and 10 that you think of. It should make a guess, and you shall answer yes (if correctly guessed), higher (if the number you think of is higher than the guess) or lower (if the number you think of is lower than the guess). The program ends when the number is guessed correctly, otherwise it tries again. It is NOT considered OK to guess the same number more than once, i.e. no repeats.
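
As a hint for the formatting requirement in exercise 6, a long string can be printed in fixed-width pieces with substr and length without changing the variable that holds it. This is only a sketch under stated assumptions: the helper name wrap_pieces, the sample string and the width of 10 are made up for illustration, and other approaches work just as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the pieces of $text, each at most $width
# characters long. $text itself is not modified.
sub wrap_pieces {
    my ($text, $width) = @_;
    my @pieces;
    for (my $pos = 0; $pos < length($text); $pos += $width) {
        # substr returns whatever is left if fewer than $width characters remain
        push @pieces, substr($text, $pos, $width);
    }
    return @pieces;
}

my $text = 'abcdefghijklmnopqrstuvwxyz';   # made-up sample data, not DNA

# The newlines are added only while printing
print "$_\n" for wrap_pieces($text, 10);

# The variable still holds the data as one unbroken string
print "Unchanged: $text\n";
```

With the sample string this prints three lines of 10, 10 and 6 letters, followed by the original 26-letter string on a single line.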

This page was last updated         by Peter Wad Sackett, pws@cbs.dtu.dk