Required reading
Learning Perl, ed. 4:
Chapter 6; p. 88-99
or
Learning Perl, ed. 5:
Chapter 6; p. 93-104 mid
or
Learning Perl, ed. 6:
Chapter 6; p. 107-119 mid
and
Notes about the functions keys, values, exists, delete, and each
Subjects covered
- Hashes, which are unordered collections of key/value pairs.
- Functions relevant to hashes:
- keys, which returns a list of the keys in the hash,
- values, which returns a list of the values in the hash,
- exists, a predicate that tests whether a key exists in the hash,
- delete, which deletes an element,
- each, which iterates over all key/value pairs in the hash.
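As a small illustration of the five functions listed above, the sketch below applies each of them to a tiny hash. The hash name %aa and its three entries are made up for this example only.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical example data: codon => one-letter amino acid code.
my %aa = (ATG => 'M', TGG => 'W', GGA => 'G');

# keys and values return lists; their order is not defined,
# so sort them when a stable order is needed.
my @codons  = sort keys %aa;
my @letters = sort values %aa;
print "codons:  @codons\n";
print "letters: @letters\n";

# exists tests for the key, regardless of the value stored under it.
print "TGG is known\n"   if exists $aa{TGG};
print "TTT is unknown\n" unless exists $aa{TTT};

# delete removes the key/value pair and returns the deleted value.
my $removed = delete $aa{GGA};
print "removed $removed\n";

# each returns the next key/value pair, and an empty list at the end.
while (my ($codon, $letter) = each %aa) {
    print "$codon => $letter\n";
}
```

Note that the pairs printed by the each loop can come out in any order, and the order may differ from run to run.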
Necessary files to complete this exercise.
To download the files to your system, just press the Shift key while
you left click on the blue link. Follow the instructions.
start10.dat
res10.dat
ex5.acc
data1.gb
data2.gb
data3.gb
data4.gb
You can play around with these files as much as you like. If you change or
destroy them, just download them again.
Remember to write #!/usr/bin/perl -w on the first line of
your programs.
All the following exercises have to be done in Perl.
- Create a hash where the keys are codons and the values are the
one-letter codes for the amino acids.
The hash will function as a look-up table.
You can find a list here.
- Use the hash from the previous exercise in a program that translates
all the nucleotide fasta entries in dna7.fsa to amino acid sequences.
Save the results in a file aa7.fsa in fasta format. Remember to keep
the 'headlines' for each entry and add 'Amino Acid Sequence' to each of them.
The STOP codon is NOT a part of the amino acid sequence.
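The core of the translation step might be sketched as follows. This is not a complete solution: the table %codon2aa holds only a handful of codons, the helper name translate is hypothetical, and marking STOP codons with '*' and unknown codons with 'X' are assumptions made for this sketch. Reading dna7.fsa and writing aa7.fsa are left out.

```perl
#!/usr/bin/perl -w
use strict;

# Deliberately incomplete table, for illustration only;
# the real look-up table needs all 64 codons.
my %codon2aa = (
    ATG => 'M', GCC => 'A', AAA => 'K', TGG => 'W',
    TAA => '*', TAG => '*', TGA => '*',   # '*' marks a STOP codon (assumed convention)
);

# translate() is a hypothetical helper name.
sub translate {
    my ($dna) = @_;
    $dna = uc $dna;
    my $protein = '';
    # Step through the sequence three bases at a time;
    # a trailing partial codon is ignored.
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $codon = substr($dna, $i, 3);
        # Unknown codons become 'X' (an assumption, not part of the exercise text).
        my $aa = exists $codon2aa{$codon} ? $codon2aa{$codon} : 'X';
        last if $aa eq '*';    # the STOP codon is not part of the protein
        $protein .= $aa;
    }
    return $protein;
}

print translate('atggccaaatggtaaatg'), "\n";    # prints MAKW
```

Translation stops at the first STOP codon, so the final atg in the example sequence is never reached.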
- You have made a program (let's call it the X-program),
which takes a file of accession numbers, start10.dat, as input
and produces output, which is in res10.dat.
Now you count the lines in your input file and your
output file and you discover that the line counts do not match. Horror -
your program does not produce output for some of the input. Now the assignment is
to discover which accession numbers did not produce output. This can be done
in various ways, but here you have to use a hash (as a look-up table). Print the
results.
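The look-up idea in the exercise above can be sketched with two small lists. The accession numbers here are invented and stand in for the contents of start10.dat and res10.dat. How the accession numbers are pulled out of each line of the real files is not shown, since that depends on the file layout.

```perl
#!/usr/bin/perl -w
use strict;

# Invented accession numbers standing in for the two files.
my @input  = qw(AB000001 AB000002 AB000003 AB000004);
my @output = qw(AB000001 AB000003);

# Build the look-up table: every accession that produced output is a key.
my %seen;
$seen{$_} = 1 foreach @output;

# An input accession that is not a key in the table produced no output.
my @missing = grep { !exists $seen{$_} } @input;
print "$_\n" foreach @missing;
```

Each test against the hash takes the same time no matter how many keys it holds, which is why a look-up table scales better than comparing every input line with every output line.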
- In the file ex5.acc are a lot of accession numbers,
where some are duplicates. Earlier we just removed the duplicates, now we should count them.
Make a program that reads the file once, and prints a list (or writes a file)
with the unique accession numbers and the number of occurrences in
the file. A line should look like this: AC24677 2, if this accession
occurs twice in ex5.acc.
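Counting with a hash might look like the sketch below. It assumes one accession number per line, and it reads invented data through an in-memory filehandle so that it runs without ex5.acc; in a real program the open call would name the file instead.

```perl
#!/usr/bin/perl -w
use strict;

# Invented data; an in-memory filehandle stands in for ex5.acc.
my $data = "AC24677\nAB12345\nAC24677\nXY99999\n";
open(my $in, '<', \$data) or die "Cannot open in-memory data: $!";

my %count;
while (my $line = <$in>) {
    chomp $line;
    next if $line eq '';      # skip empty lines
    $count{$line}++;          # a new key starts undefined, which ++ treats as 0
}
close $in;

# Print each unique accession number with its number of occurrences.
foreach my $acc (sort keys %count) {
    print "$acc $count{$acc}\n";
}
```

The data is read only once; the hash does both the de-duplication (each accession is a key exactly once) and the counting.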
- Building upon the previous exercise, now make the program print the list ordered by occurrences
of accession numbers. That means the accession numbers with most duplicates should be first, and
accession numbers which only occur once should be last in the list.
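Ordering by occurrences means sorting the keys by their values rather than by the keys themselves. A minimal sketch, using invented counts and an alphabetical tie-break that the exercise does not ask for:

```perl
#!/usr/bin/perl -w
use strict;

# Invented counts, as the counting step might produce them.
my %count = (AC24677 => 2, AB12345 => 1, XY99999 => 3, CD00001 => 1);

# Compare the values in descending order ($b before $a);
# break ties alphabetically so the output is reproducible.
my @ordered = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

print "$_ $count{$_}\n" foreach @ordered;
```

Without the tie-break, accession numbers with the same count would appear in an arbitrary order.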
- In the GenBank files data?.gb you should extract the coding DNA
sequence as you already did in 7.9. Next you have to display a list of
codons USED in the coding sequence and the number of times they are used.
- In the data1.gb file there are 6 references (to articles). Make
a program that extracts all authors from the references, eliminates those
that are duplicates, and prints the list of persons who had anything to
do with this GenBank entry. This should also work for the other GenBank entries.
Beware: there are traps in this exercise, so check your output properly. You are free
to use hashes or not in this exercise.