Comparative Microbial Genomics - Exercise 4
Wednesday, 28. September 2005
This exercise introduces simple command-line techniques for answering specific questions about information stored in data files. These files are the outputs of prediction programs (written in C, Perl, and various shell scripts) that are run through the GNU 'make' system.
In order to answer the following questions, you will need to enter commands via a terminal. These commands will be in *nix syntax, using either gawk or standard command-line tools (found on any *nix system).
Step 1
Log in to the "Genome" server (replace 'XX' with your given number) and enter today's exercise directory:
% ssh -X micXX@genome.cbs.dtu.dk
% cd 28Sep05
You should have two files in this directory (these are GenBank files):
1. Ecoli_K-12_MG1655_Main.gbk
2. Ecoli_O157_EDL93_Main.gbk
Step 2
Run the Travers prediction scripts on the Escherichia coli K-12_MG1655 GenBank file (take note of the 'average' and 'stdev' values):
% gmake Ecoli_K-12_MG1655_Main.top.travers
Step 3
Run the CAI prediction scripts on the Escherichia coli K-12_MG1655 GenBank file (take note of the 'AVG' and 'STDDEV' values):
% gmake Ecoli_K-12_MG1655_Main.top.cai
Question 1
How many genes are annotated in the Escherichia coli K-12_MG1655 genome?
% gmake Ecoli_K-12_MG1655_Main.ngenes
% cat Ecoli_K-12_MG1655_Main.ngenes
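As a rough cross-check on the number reported by the make target, gene features can also be counted directly in a GenBank file, where each feature key starts in column 6 of the feature table. The sketch below runs on a tiny made-up excerpt (the contents are hypothetical), and it is an assumption that the '.ngenes' target counts 'gene' features in this way; it may count CDS features or filter entries instead.

```shell
# Hypothetical excerpt in GenBank feature-table layout (not real data).
toy=$(mktemp)
printf '%s\n' \
  'FEATURES             Location/Qualifiers' \
  '     source          1..1000' \
  '     gene            190..255' \
  '                     /gene="thrL"' \
  '     CDS             190..255' \
  '                     /gene="thrL"' \
  '     gene            complement(337..500)' \
  '                     /gene="thrA"' > "$toy"

# Count lines whose feature key is exactly "gene": five spaces, the key,
# then whitespace. Qualifier lines such as /gene="..." are indented further
# and start with a slash, so they do not match.
ngenes=$(grep -cE '^ {5}gene[[:space:]]' "$toy")
echo "gene features: $ngenes"
rm -f "$toy"
```

On a real file the same grep could be pointed at Ecoli_K-12_MG1655_Main.gbk; a difference from the '.ngenes' value would simply indicate that the make target counts something else.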
Question 2
How many Travers predictions are <= 1.5 stdev from the average (replace 'x.xx' with the actual cutoff value, computed from the 'average' and 'stdev' values noted in Step 2)?
% sort -rn Ecoli_K-12_MG1655_Main.top.travers | gawk '{if($1 <= x.xx) {print $1}}' | wc -l
Question 3
How many CAI predictions are >= 1.5 stdev from the average (replace 'x.xx' with the actual cutoff value, computed from the 'AVG' and 'STDDEV' values noted in Step 3)?
% sort -rn Ecoli_K-12_MG1655_Main.top.cai | gawk '{if($1 >= x.xx) {print $1}}' | wc -l
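One way to obtain the 'x.xx' cutoffs is to compute them from the scores themselves. The sketch below is only an illustration on made-up numbers. It assumes the score sits in column 1 (as the gawk commands above imply), that the lower cutoff is average - 1.5 * stdev and the upper cutoff is average + 1.5 * stdev, and it uses the population standard deviation, which may differ slightly from what the make scripts report. Plain awk is used for portability; gawk accepts the same program.

```shell
# Hypothetical scores (column 1) and gene names (column 2).
scores=$(mktemp)
printf '%s\n' '1 g1' '5 g2' '5 g3' '5 g4' '5 g5' '5 g6' '5 g7' '9 g8' > "$scores"

# Mean and population standard deviation of column 1, then both cutoffs
# printed as "lower upper" with two decimals.
cutoffs=$(awk '{ s += $1; ss += $1 * $1 }
               END { m = s / NR; sd = sqrt(ss / NR - m * m)
                     printf "%.2f %.2f\n", m - 1.5 * sd, m + 1.5 * sd }' "$scores")
lower=${cutoffs% *}   # text before the space
upper=${cutoffs#* }   # text after the space
echo "lower cutoff: $lower, upper cutoff: $upper"

# Count rows at or below the lower cutoff and at or above the upper cutoff.
# tr strips the padding that some wc implementations add.
n_low=$(awk -v t="$lower" '$1 <= t' "$scores" | wc -l | tr -d ' ')
n_high=$(awk -v t="$upper" '$1 >= t' "$scores" | wc -l | tr -d ' ')
echo "at or below lower: $n_low, at or above upper: $n_high"
rm -f "$scores"
```

For the exercise itself, the values printed by the gmake runs in Steps 2 and 3 remain the reference; this sketch only shows the arithmetic behind the cutoff.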
Question 4
How many genes overlap in both methods (i.e. how many genes are found in both Travers and CAI)?
% gawk '{print $2}' Ecoli_K-12_MG1655_Main.top.travers | sort > Ecoli_K-12_MG1655_Main.top.travers.col2
% gawk '{print $2}' Ecoli_K-12_MG1655_Main.top.cai | sort > Ecoli_K-12_MG1655_Main.top.cai.col2
% comm -12 Ecoli_K-12_MG1655_Main.top.travers.col2 Ecoli_K-12_MG1655_Main.top.cai.col2 | wc -l
Question 5
How many genes are unique to Travers?
% comm -23 Ecoli_K-12_MG1655_Main.top.travers.col2 Ecoli_K-12_MG1655_Main.top.cai.col2 | wc -l
Question 6
How many genes are unique to CAI?
% comm -13 Ecoli_K-12_MG1655_Main.top.travers.col2 Ecoli_K-12_MG1655_Main.top.cai.col2 | wc -l
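The three comm invocations used in Questions 4 to 6 differ only in which output columns they suppress. The sketch below shows this on two small made-up gene lists (the names and file names are hypothetical stand-ins for the two '.col2' files). Note that comm needs both inputs sorted in the same collation order, which is why the locale is fixed first.

```shell
# Make sort and comm agree on ordering.
export LC_ALL=C
work=$(mktemp -d)

# Hypothetical gene-name lists standing in for the two *.col2 files.
printf '%s\n' thrA rpsA tufA ompA lpp | sort > "$work/travers.col2"
printf '%s\n' tufA rpsA gapA | sort > "$work/cai.col2"

# -12 suppresses columns 1 and 2, leaving lines common to both files;
# -23 leaves lines found only in the first file;
# -13 leaves lines found only in the second file.
both=$(comm -12 "$work/travers.col2" "$work/cai.col2" | wc -l | tr -d ' ')
only_travers=$(comm -23 "$work/travers.col2" "$work/cai.col2" | wc -l | tr -d ' ')
only_cai=$(comm -13 "$work/travers.col2" "$work/cai.col2" | wc -l | tr -d ' ')

echo "both: $both, Travers only: $only_travers, CAI only: $only_cai"
rm -rf "$work"
```

A quick sanity check on real data: 'both' plus 'Travers only' should equal the number of lines in the Travers list, and likewise for CAI, provided each list contains no duplicate names.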
Question 7
Now, repeat the same process as in questions 1-6 for the Escherichia coli O157_EDL93 genome and answer the same questions.
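Since the file names differ only in the genome part, the commands for the second genome follow by substituting the file stem. The sketch below only prints the gmake commands rather than running them. It assumes the Makefile provides the same three targets for the O157 file, which the exercise implies but does not show.

```shell
# File stem of the second genome, taken from the file list in Step 1.
stem=Ecoli_O157_EDL93_Main

# Build the list of gmake commands for the same three targets used above.
cmds=$(for ext in top.travers top.cai ngenes; do
         echo "gmake $stem.$ext"
       done)
echo "$cmds"
```

The same substitution applies to the gawk, sort, and comm commands of Questions 2 to 6.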
Question 8
How many genes for each prediction method are unique to E. coli K-12_MG1655 (report both values)?
Question 9
How many genes for each prediction method are unique to E. coli O157_EDL93 (report both values)?
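Questions 8 and 9 can be approached with the same comm technique, this time comparing the two genomes within one method. The sketch below uses made-up gene lists and hypothetical short file names. It assumes that the identifiers in column 2 are directly comparable between the two genomes (for example shared gene names), which should be checked on the real files before trusting the counts.

```shell
export LC_ALL=C   # keep sort and comm consistent
work=$(mktemp -d)

# Hypothetical per-genome, per-method gene lists (stand-ins for *.col2 files).
printf '%s\n' lpp ompA rpsA thrA  | sort > "$work/K12.travers.col2"
printf '%s\n' ompA rpsA stx1A     | sort > "$work/O157.travers.col2"
printf '%s\n' gapA rpsA tufA      | sort > "$work/K12.cai.col2"
printf '%s\n' gapA tufA eae stx2A | sort > "$work/O157.cai.col2"

# For each method: -23 keeps names only in the K-12 list,
# -13 keeps names only in the O157 list.
report=$(for method in travers cai; do
  k12_only=$(comm -23 "$work/K12.$method.col2" "$work/O157.$method.col2" | wc -l | tr -d ' ')
  o157_only=$(comm -13 "$work/K12.$method.col2" "$work/O157.$method.col2" | wc -l | tr -d ' ')
  echo "$method: K-12 only $k12_only, O157 only $o157_only"
done)
echo "$report"
rm -rf "$work"
```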
Question 10
How many genes overlap in both methods and in both genomes (i.e. how many genes are found in both Travers and CAI and in both E. coli K-12_MG1655 and E. coli O157_EDL93) (report both values)?
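One possible route to the four-way overlap is to chain intersections: first the per-genome overlap between methods, then the overlap of those two results. The sketch below uses made-up gene lists and hypothetical file names, and again assumes that gene identifiers are comparable across the two genomes.

```shell
export LC_ALL=C   # keep sort and comm consistent
work=$(mktemp -d)

# Hypothetical gene lists (stand-ins for the four *.col2 files).
printf '%s\n' lpp ompA rpsA tufA | sort > "$work/K12.travers.col2"
printf '%s\n' gapA rpsA tufA     | sort > "$work/K12.cai.col2"
printf '%s\n' ompA tufA stx1A    | sort > "$work/O157.travers.col2"
printf '%s\n' eae gapA tufA      | sort > "$work/O157.cai.col2"

# Step 1: genes found by both methods within each genome.
# comm output of sorted inputs is itself sorted, so it can be fed to comm again.
comm -12 "$work/K12.travers.col2"  "$work/K12.cai.col2"  > "$work/K12.both"
comm -12 "$work/O157.travers.col2" "$work/O157.cai.col2" > "$work/O157.both"

# Step 2: genes present in both of those per-genome overlaps.
shared=$(comm -12 "$work/K12.both" "$work/O157.both" | wc -l | tr -d ' ')
echo "found by both methods in both genomes: $shared"
rm -rf "$work"
```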