Required reading
Learning Perl, ed. 4:
Chapter 2; p. 18-36
Chapter 5; p. 76-77,82-84
or
Learning Perl, ed. 5:
Chapter 5; p. 79 to mid-80
or
Learning Perl, ed. 6:
Nothing required, really. Perhaps all of chapter 5, where you can see a lot of what
you can do, and a lot that you should not do if you want to write clear code.
Notes (from pws); functions: printf, sprintf, uc, ucfirst,
lc, lcfirst.
Subjects covered
How to structure your code in smaller parts.
Finding bugs in your program with use strict; and -w.
Formatting output: printf prints according to a format string; sprintf is
similar to printf, except that the result is returned as a string instead of
being printed. Changing case: uc returns a string uppercased, ucfirst uppercases
just the first letter, lc returns a string lowercased, and lcfirst lowercases
just the first letter.
From now on, 2 points will be subtracted for each solution that does not "use strict;" or
does not use proper, consistent indentation (to a maximum of 4 points per exercise).
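The functions listed under Subjects covered can be tried out in a few lines. This is only a small sketch: the variable names and the values in it are made up for illustration and are not taken from the exercise files.

```perl
#!/usr/bin/perl -w
use strict;

# Illustrative values only (assumptions, not data from the exercise files)
my $name   = 'swissProt';
my $length = 629;

# printf prints directly, following the format string:
# %-10s = left-justified string in 10 columns, %5d = integer in 5 columns
printf "%-10s has %5d residues\n", $name, $length;

# sprintf formats in the same way but returns the string instead of printing it
my $rounded = sprintf "%.2f", 3.14159;
print "$rounded\n";               # 3.14

print uc($name), "\n";            # SWISSPROT
print ucfirst($name), "\n";       # SwissProt
print lc($name), "\n";            # swissprot
print lcfirst('DNA'), "\n";       # dNA
```

With use strict; in place, every variable has to be declared with my, so a misspelled variable name stops the program at compile time instead of silently producing an empty value.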
Necessary files to complete this exercise
To download the files to your system, just press the Shift key while
you left click on the blue link. Follow the instructions.
sprot.dat
sprot2.dat
sprot3.dat
sprot4.dat
dna.fsa
orphans.sp
You can play around with these files as much as you like. If you change or
destroy them, just download them again.
Remember to write #!/usr/bin/perl -w on the first line of
your programs.
All the following exercises have to be done in Perl
- This and the following 5 exercises deal with SwissProt.
The file sprot.dat is a SwissProt database entry. Study it with
less. Locate the SwissProt ID (SP96_DICDI),
the accession number (P14328) and the amino
acid sequence (MRVLLVLVAC....TTTATTTATS). There are other entries
(sprot2.dat, sprot3.dat, sprot4.dat). Your programs should work on
those, too. Also, your programs must solve all the problems in ONE
reading of the file.
- Make a program that reads the ID and prints it.
- Add the following functionality to the program:
Read the accession number and print it.
- Add the following functionality to the program:
Read the amino acid sequence and print it.
- Add the following functionality to the program: verification of the number
of amino acids. This means: extract the number from the SQ line (example:
SQ SEQUENCE 629 AA;) and check that the amino acid sequence has that number
of residues.
- Now that you have the ID, accession number and AA sequence, save them to a
file sprot.fsa in FASTA format. Look in the file dna.fsa
for an example of FASTA. Notice that the first line starts with > and that
a unique identifier, like an accession number or a SwissProt ID, comes
immediately after it. Any other data must be on the header line only, but in free format.
The sequence data is on the following lines.
Notice that this exercise incorporates the previous 5.
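The one-pass idea behind the SwissProt exercises can be sketched on a toy record. Everything here is an assumption made for illustration: the record is held in an array instead of being read from sprot.dat, its sequence is made up and far shorter than the real one, the line layout (two-letter line codes, a // terminator) is the usual SwissProt flat-file layout as far as this sketch assumes, and the 60 residues per FASTA line is just a common convention. The result is printed rather than saved to sprot.fsa.

```perl
#!/usr/bin/perl -w
use strict;

# Toy record in SwissProt-like layout; the sequence is made up (assumption).
my @lines = (
    "ID   SP96_DICDI     STANDARD;      PRT;    25 AA.",
    "AC   P14328;",
    "SQ   SEQUENCE   25 AA;",
    "     MRVLLVLVAC TTTATTTATS MRVLL",
    "//",
);

my ($id, $acc, $expected, $seq, $in_seq) = ('', '', 0, '', 0);

# One pass over the lines: each line is looked at exactly once
foreach my $line (@lines) {
    if ($line =~ /^ID\s+(\S+)/) {
        $id = $1;
    }
    elsif ($line =~ /^AC\s+(\w+);/) {
        $acc = $1;
    }
    elsif ($line =~ /^SQ\s+SEQUENCE\s+(\d+)\s+AA;/) {
        $expected = $1;      # residue count claimed by the SQ line
        $in_seq   = 1;       # the sequence starts on the next line
    }
    elsif ($line =~ m{^//}) {
        $in_seq = 0;         # end of the entry
    }
    elsif ($in_seq) {
        (my $residues = $line) =~ s/\s+//g;   # drop spaces between blocks
        $seq .= $residues;
    }
}

my $status = length($seq) == $expected ? 'ok' : 'MISMATCH';
print "Length check: $status\n";

# FASTA: '>' plus identifier on the header line, sequence on the lines after it
my $fasta = ">$id $acc\n";
for (my $pos = 0; $pos < length($seq); $pos += 60) {
    $fasta .= substr($seq, $pos, 60) . "\n";
}
print $fasta;
```

Keeping a flag such as $in_seq is one way to remember where in the entry the program is, which is what makes a single reading of the file sufficient.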
- In the file dna.fsa is some DNA. Construct a program that finds
possible translation starts :-)
All proteins start with the amino acid methionine (at least when translated;
Met might be removed in later processing steps). Methionine is coded by ATG.
The exercise is therefore: find the positions of all ATGs in the sequence.
The first position is 83, as humans count (starting from 1).
In some organisms different start codons are possible. If you really
want to, you can make the program handle those cases too.
- Assuming that the first Met at position 83 is translation start, find
the corresponding translation stop (which is the first one in frame).
A stop codon is coded by TAA, TAG, or TGA. Remember that
the stop codon has to be in the same reading frame as the ATG.
See here for explanation
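The difference between "any position" and "in frame" may be easier to see on a short string. The sketch below uses a made-up toy sequence (an assumption, not the content of dna.fsa) that contains an out-of-frame TAA before the first in-frame stop. It relies on Perl's built-in index and substr, which count from 0, so 1 is added to get positions as humans count.

```perl
#!/usr/bin/perl -w
use strict;

# Made-up toy sequence (assumption), not the DNA from dna.fsa
my $dna = 'CCATGGTAACCTAGCCATG';

# Collect every ATG position, counted from 1
my @starts;
my $pos = index($dna, 'ATG');
while ($pos != -1) {
    push @starts, $pos + 1;
    $pos = index($dna, 'ATG', $pos + 1);   # continue just after the last hit
}
print "ATG at: @starts\n";                 # ATG at: 3 17

# Walk codon by codon from the first ATG, so only in-frame triplets are seen
my $start = $starts[0] - 1;                # back to Perl's 0-based offset
my $stop  = -1;
for (my $i = $start + 3; $i + 3 <= length($dna); $i += 3) {
    my $codon = substr($dna, $i, 3);
    if ($codon eq 'TAA' or $codon eq 'TAG' or $codon eq 'TGA') {
        $stop = $i + 1;
        last;
    }
}
print "First in-frame stop at: $stop\n";   # First in-frame stop at: 12
```

The TAA at position 7 is skipped because stepping by 3 from the ATG never lands on it; a plain index search for stop codons would have reported it.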
- Make a program that asks for an organism, like 'HUMAN' or 'RAT'.
The program should then count the number of lines (times) on which a SwissProt
identifier with said organism occurs in the file orphans.sp, i.e. PARG_HUMAN and
LUM_HUMAN are the first two (but not the last) for HUMAN.
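As a rough sketch of the counting step: the lines below stand in for the content of orphans.sp, whose exact layout is not shown here, so the assumption is simply that an identifier of the form NAME_ORGANISM appears somewhere on a line. The last two identifiers are made up. The organism is fixed in the code, where the real program would read it from the user.

```perl
#!/usr/bin/perl -w
use strict;

# Stand-in for the lines of orphans.sp; ABC1_RAT and XYZ_MOUSE are made up
my @lines = ('PARG_HUMAN', 'LUM_HUMAN', 'ABC1_RAT', 'XYZ_MOUSE');

# uc makes the match work even if the user types 'human'
my $organism = uc 'human';

my $count = 0;
foreach my $line (@lines) {
    # \Q...\E protects any special characters in the user's input
    $count++ if $line =~ /\b\w+_\Q$organism\E\b/;
}
print "$organism: $count\n";    # HUMAN: 2
```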
- Playing time again. Make the guessing program from last week
count how many attempts it needed to guess the
number and print the count when it is done guessing. It must be able to detect if you lie
(and say so, of course). Also, if
you haven't done it before, make the program guess in the fewest
possible guesses (binary search for you experts out there).
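For those who have not met binary search before, here is one possible sketch of the idea. Last week's program is not shown here, so several things are assumptions: the range 1 to 100, the reply words 'higher', 'lower' and 'correct', and the hypothetical helper guess_number, which takes the answers from a code reference instead of the keyboard so that the logic can be tried without typing.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: guesses a number in [$low, $high].
# $answer is a code reference that receives the guess and returns
# 'higher', 'lower' or 'correct'.
# Returns the number of attempts, or -1 if the answers contradict each other.
sub guess_number {
    my ($low, $high, $answer) = @_;
    my $attempts = 0;
    while ($low <= $high) {
        my $guess = int(($low + $high) / 2);   # middle of what is still possible
        $attempts++;
        my $reply = $answer->($guess);
        return $attempts if $reply eq 'correct';
        if ($reply eq 'higher') {
            $low = $guess + 1;
        }
        else {
            $high = $guess - 1;
        }
    }
    return -1;   # nothing left to guess: some answer must have been a lie
}

# An honest player thinking of 37
my $secret = 37;
my $honest = sub {
    my ($guess) = @_;
    return 'correct' if $guess == $secret;
    return $guess < $secret ? 'higher' : 'lower';
};
my $attempts = guess_number(1, 100, $honest);
print "Guessed in $attempts attempts\n";       # Guessed in 3 attempts

# A player who always says 'higher' is eventually caught
my $liar = sub { return 'higher' };
print "You lied!\n" if guess_number(1, 100, $liar) == -1;
```

Each guess halves the remaining range, so for 1 to 100 no more than 7 guesses are needed. A lie shows up when the range becomes empty. In an interactive version the code reference would print the guess and read the reply from STDIN.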