Required reading
Learning Perl, ed. 4:
Chapter 5; p. 78-83
Chapter 12; p. 164, 169-178
or
Learning Perl, ed. 5:
Chapter 5; p. 71-73 mid, 76-79 mid, 81 mid-89 mid
Chapter 13; p. 191, 196-198 mid, 203 mid-205 mid
or
Learning Perl, ed. 6:
Chapter 5; p. 81-83 mid, 89-90 mid, 91 mid-99 top
Chapter 13; p. 215, 221-224 mid, 229-230
Notes (from pws): functions open, close, die,
exit, substr, length, mkdir, rmdir,
chdir, rename, unlink, chmod, system,
and the last chapter on Perl structure/syntax, p. 48-51
Subjects covered
- Using files:
  - open, which opens a file (makes it ready) for reading or writing,
  - close, which ends the reading/writing.
- Functions:
  - die and exit, which terminate the program,
  - substr, which extracts a part of a string,
  - length, which tells how long a string is,
  - mkdir, which creates directories,
  - rmdir, which removes empty directories,
  - chdir, which changes to the given directory,
  - rename, which renames or moves files/directories,
  - unlink, which deletes files,
  - chmod, which changes permissions on files/directories,
  - system, which submits jobs to the operating system,
  - and finally backticks/backquotes, which do the same but capture the output from the job.
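The functions listed above can be tried out together in one small script. What follows is only a minimal sketch, assuming a UNIX-like system where the echo command exists. The names demo.txt, renamed.txt and work are made up for illustration and are not part of the exercise files. It uses the three-argument form of open with lexical filehandles; your edition of the book may show a slightly different style.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Work in a throwaway directory so that none of your own files are touched.
my $start = getcwd();
my $tmp   = tempdir(CLEANUP => 1);

# mkdir creates a directory, chmod sets its permissions, chdir moves into it.
mkdir "$tmp/work"       or die "Cannot create directory: $!\n";
chmod 0755, "$tmp/work" or die "Cannot change permissions: $!\n";
chdir "$tmp/work"       or die "Cannot change directory: $!\n";

# open for writing ('>'), print to the filehandle, then close it.
# demo.txt is a hypothetical file name.
open(my $out, '>', 'demo.txt') or die "Cannot write demo.txt: $!\n";
print $out "ATGCATGC\n";
close $out;

# rename gives the file a new name (or moves it).
rename 'demo.txt', 'renamed.txt' or die "Cannot rename: $!\n";

# open for reading ('<'), read one line, and close again.
open(my $in, '<', 'renamed.txt') or die "Cannot read renamed.txt: $!\n";
my $line = <$in>;
close $in;
chomp $line;

my $len  = length $line;         # number of characters in the string
my $part = substr($line, 2, 4);  # 4 characters from offset 2 (offsets start at 0)
print "length: $len, part: $part\n";   # prints "length: 8, part: GCAT"

# system runs a command and returns its exit status (0 means success).
# $^X is the perl that is running this script.
my $status = system($^X, '-e', 'exit 0');

# Backticks run a command too, but hand back what it printed.
my $echoed = `echo hello`;
chomp $echoed;
print "status: $status, output: $echoed\n";

# Clean up: unlink deletes the file, rmdir removes the now empty directory.
unlink 'renamed.txt' or die "Cannot delete file: $!\n";
chdir $start         or die "Cannot change back: $!\n";
rmdir "$tmp/work"    or die "Cannot remove directory: $!\n";

# A call to exit here would end the program with a status of your choosing;
# die has the same effect when something goes wrong, as in the lines above.
```

Every call that can fail is followed by "or die", so the script stops with a message instead of silently carrying on with a missing file or directory.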
Necessary files to complete this exercise
To download the files to your system, hold down the Shift key while
you left-click on the blue link, then follow the instructions.
We have reused some of the files from earlier exercises.
ex1.dat
ex1.acc
dna.dat
dna.fsa
orphans.sp
You can play around with these files as much as you like. If you change or
destroy them, just download them again.
Remember to write #!/usr/bin/perl on the first line of
your programs.
All the following exercises have to be done in Perl.
- Write a program that reads the file ex1.acc
and displays it on the screen. A bit like cat.
- Make the program ask for a filename, and display the file on the screen.
If the file does not exist, then complain and exit. More like cat.
- Make a new program that asks for two filenames, one at a time.
It should then display (output/print) on one line the first line from the
first file concatenated (with tab) with the first line from the second file,
and so forth. The output should be like that of the UNIX command paste.
Try with the files ex1.acc and ex1.dat and compare with
the same UNIX paste command, exercise 1.14. It should be the same.
- The file dna.dat contains some human DNA.
This and the following exercise aim to make the reverse complement string
(called the "complement strand") of the DNA.
Read the file and put all the DNA in one variable.
Now complement the DNA in another variable.
Complementing means changing all A's to T's, T's to A's, C's to G's and G's to C's.
Display the result and check that it works.
- Now reverse the DNA after complementing it. Reverse means last letter (base)
should be the first, next to last should be the second, and so forth. Display.
- Now write the DNA to the file revdna.dat. Make it look nice, just
like dna.dat, i.e. 60 letters per line.
- The file dna.fsa contains the same human DNA in FASTA format.
This format is VERY often used in bioinformatics. Look at it using less
and get used to the format. Observe the first line, which starts with a >
and identifies the sequence. The name (AB000410 in this case) MUST
uniquely identify a sequence in the file. This is a DNA (actually mRNA) sequence taken from
the GenBank database. Now make a program that reverse complements the sequence
and writes it to the file revdna.fsa, just like you did in the previous
exercises. This time you have to keep the first identifying line, so the
sequence can be identified. You must add 'ComplementStrand' at the end
of that line, though, so you later know that it is the complement strand.
Summary: Keep the first line and reverse complement the sequence.
- Now you must analyse the AT/GC content of the DNA in the file dna.fsa.
You must count all A, T, C and G, and display the result.
- Read the file orphans.sp and find (extract) all the accession
numbers from it. Save them in another file of your choosing. Hint: Save
them as you find them, but only the accession number,
not the complete line. The first accession number is AB000114.CDS.1. You can choose
to consider .CDS.1 as a part of the accession number (or not). Accession numbers
differ in length for historical reasons.
- Read the file ex1.dat once and count the number of negative
numbers in it. Display the result.
- Now for some playing. Make a program that guesses a number between 1 and 10
that you think of. It should make a guess, and you answer yes
(if correctly guessed), higher (if the number you think of is higher than the guess)
or lower (if the number you think of is lower than the guess).
The program ends when the number is guessed correctly, otherwise it tries again.
It is NOT considered OK to guess the same number more than once, i.e. no repeats.
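For the reverse complement exercises above, the two string operations involved can be tried on a short made-up sequence before you work on the real files. This is only a sketch of one possible approach, not the required solution. It assumes the sequence contains nothing but the letters A, T, C and G (upper or lower case). The helper names reverse_complement and wrap_lines are invented for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: complement every base with tr///, then reverse the string.
sub reverse_complement {
    my ($dna) = @_;
    (my $comp = $dna) =~ tr/ATCGatcg/TAGCtagc/;   # A<->T and C<->G in a copy
    return scalar reverse $comp;                   # scalar reverse flips the string
}

# Hypothetical helper: cut a long string into lines of at most $width letters.
sub wrap_lines {
    my ($seq, $width) = @_;
    my $text = '';
    for (my $pos = 0; $pos < length($seq); $pos += $width) {
        # substr returns a shorter piece when fewer than $width letters remain.
        $text .= substr($seq, $pos, $width) . "\n";
    }
    return $text;
}

# A toy sequence, not taken from dna.dat.
print reverse_complement('ATGCCG'), "\n";   # prints "CGGCAT"

# 130 letters become lines of 60, 60 and 10 letters.
print wrap_lines('A' x 130, 60);
```

Doing the complement with a single tr/// matters: replacing A with T and then T with A in two separate steps would turn every base back again.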