
Comparative Microbial Genomics - #27644

Computer Exercises - Prediction of Highly Expressed Genes in Bacterial Genomes

Comparative Microbial Genomics - Exercise 4

Wednesday, 28. September 2005



This exercise introduces simple command-line techniques for answering specific questions about information stored in data files. These files are the output of prediction programs (written in C, Perl, and various shell scripts) that are run through the GNU 'make' system.


In order to answer the following questions, you will need to enter commands via a terminal. These commands will be in *nix syntax, using either gawk or standard command-line tools (found on any *nix system).



Step 1

Log in to the "Genome" server (replace 'XX' with your given number) and change to today's exercise directory:

% ssh -X micXX@genome.cbs.dtu.dk

% cd 28Sep05


You should have two files in this directory (these are "GenBank" files):

1. Ecoli_K-12_MG1655_Main.gbk

2. Ecoli_O157_EDL93_Main.gbk


Step 2

Run the Travers prediction scripts on the Escherichia coli K-12_MG1655 GenBank file (take note of the 'average' and 'stdev' values):

% gmake Ecoli_K-12_MG1655_Main.top.travers


Step 3

Run the CAI prediction scripts on the Escherichia coli K-12_MG1655 GenBank file (take note of the 'AVG' and 'STDDEV' values):

% gmake Ecoli_K-12_MG1655_Main.top.cai


Question 1

How many genes are annotated in the Escherichia coli K-12_MG1655 genome?

% gmake Ecoli_K-12_MG1655_Main.ngenes

% cat Ecoli_K-12_MG1655_Main.ngenes


Question 2

How many Travers predictions are <= 1.5 stdev from the average (replace 'x.xx' in the command with the actual cutoff value)?

% sort -rn Ecoli_K-12_MG1655_Main.top.travers | gawk '{if($1 <= x.xx) {print $1}}' | wc -l


Question 3

How many CAI predictions are >= 1.5 stdev from the average (replace 'x.xx' in the command with the actual cutoff value)?

% sort -rn Ecoli_K-12_MG1655_Main.top.cai | gawk '{if($1 >= x.xx) {print $1}}' | wc -l
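A minimal sketch of how a cutoff could be derived and applied, using standard awk rather than gawk for portability. It assumes the cutoff is the average minus (or plus) 1.5 standard deviations, and that the score sits in column 1 as in the commands above. The numbers and the toy score file are hypothetical; substitute the 'average' and 'stdev' values reported by gmake and the real *.top.* file.

```shell
# Use a decimal point (not a comma) regardless of the login locale.
export LC_ALL=C

# Hypothetical values; replace with the ones printed by gmake.
avg=0.50
sd=0.10

# Cutoffs 1.5 standard deviations below and above the average (assumed meaning).
low=$(awk -v a="$avg" -v s="$sd" 'BEGIN { printf "%.2f", a - 1.5 * s }')
high=$(awk -v a="$avg" -v s="$sd" 'BEGIN { printf "%.2f", a + 1.5 * s }')

# Toy stand-in for a *.top.* file: score in column 1, gene name in column 2.
tmp=$(mktemp)
printf '0.70 geneA\n0.66 geneB\n0.50 geneC\n0.30 geneD\n' > "$tmp"

# -v passes the shell value into awk; "+ 0" forces a numeric comparison.
n_low=$(awk -v t="$low" '$1 + 0 <= t + 0 { n++ } END { print n + 0 }' "$tmp")
n_high=$(awk -v t="$high" '$1 + 0 >= t + 0 { n++ } END { print n + 0 }' "$tmp")

echo "low=$low high=$high n_low=$n_low n_high=$n_high"
rm -f "$tmp"
```

Counting inside awk makes the `sort` and `wc -l` stages unnecessary, although the pipeline form gives the same count.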


Question 4

How many genes overlap in both methods (i.e. how many genes are found in both Travers and CAI)?


% gawk '{print $2}' Ecoli_K-12_MG1655_Main.top.travers | sort > Ecoli_K-12_MG1655_Main.top.travers.col2


% gawk '{print $2}' Ecoli_K-12_MG1655_Main.top.cai | sort > Ecoli_K-12_MG1655_Main.top.cai.col2


% comm -12 Ecoli_K-12_MG1655_Main.top.travers.col2 Ecoli_K-12_MG1655_Main.top.cai.col2 | wc -l


Question 5

How many genes are unique to Travers?


% comm -23 Ecoli_K-12_MG1655_Main.top.travers.col2 Ecoli_K-12_MG1655_Main.top.cai.col2 | wc -l


Question 6

How many genes are unique to CAI?


% comm -13 Ecoli_K-12_MG1655_Main.top.travers.col2 Ecoli_K-12_MG1655_Main.top.cai.col2 | wc -l
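The three comm invocations in questions 4-6 differ only in which output columns they suppress: column 1 holds lines found only in the first file, column 2 lines found only in the second, and column 3 lines found in both. The sketch below shows this on two small, made-up gene lists (the names and file names are illustrative only). Both inputs must be sorted, which is why the col2 files are produced through sort.

```shell
# Sort and compare in the same byte-wise collation order.
export LC_ALL=C

dir=$(mktemp -d)

# Toy gene lists standing in for the *.col2 files (hypothetical contents).
printf 'thrA\nrpoB\nlacZ\n' | sort > "$dir/travers.col2"
printf 'rpoB\ntufA\nthrA\nompA\n' | sort > "$dir/cai.col2"

# -12 hides columns 1 and 2, leaving the genes present in both lists.
shared=$(comm -12 "$dir/travers.col2" "$dir/cai.col2" | wc -l | tr -d ' ')

# -23 hides columns 2 and 3, leaving the genes only in the first list.
only_travers=$(comm -23 "$dir/travers.col2" "$dir/cai.col2" | wc -l | tr -d ' ')

# -13 hides columns 1 and 3, leaving the genes only in the second list.
only_cai=$(comm -13 "$dir/travers.col2" "$dir/cai.col2" | wc -l | tr -d ' ')

echo "shared=$shared only_travers=$only_travers only_cai=$only_cai"
rm -rf "$dir"
```

As a sanity check, the shared count plus the count unique to one list should equal the number of lines in that list.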


Question 7

Now, repeat the same process as in questions 1-6 for the Escherichia coli O157_EDL93 genome and answer the same questions.
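To avoid retyping the long file names, the gmake targets for both genomes can be generated with a loop. This is only a dry-run sketch: it prints the commands instead of running them, and it assumes the O157 targets follow the same naming pattern as the K-12 targets shown in steps 2-3 and question 1.

```shell
# Build the list of gmake commands for both genomes without executing them.
cmds=$(
  for genome in Ecoli_K-12_MG1655_Main Ecoli_O157_EDL93_Main; do
    for target in top.travers top.cai ngenes; do
      echo "gmake $genome.$target"
    done
  done
)

# Print the commands for inspection; run them one at a time so the
# average and stdev values of each prediction can be noted.
printf '%s\n' "$cmds"
```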


Question 8

How many genes for each prediction method are unique to E. coli K-12_MG1655 (report both values)?


Question 9

How many genes for each prediction method are unique to E. coli O157_EDL93 (report both values)?


Question 10

How many genes overlap in both methods and in both genomes (i.e. how many genes are found in both Travers and CAI and in both E. coli K-12_MG1655 and E. coli O157_EDL93)?
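One possible approach to questions 8-10 is to chain comm over the sorted col2 lists of both genomes. The sketch below uses made-up gene lists and hypothetical file names, and it assumes that the identifiers in column 2 are directly comparable between the two genomes, which should be checked against the real files.

```shell
# Sort and compare in the same byte-wise collation order.
export LC_ALL=C

dir=$(mktemp -d)

# Toy sorted gene lists, one per genome and method (hypothetical contents).
printf 'geneA\ngeneB\ngeneC\ngeneD\n' > "$dir/k12.travers"
printf 'geneB\ngeneC\ngeneD\ngeneE\n' > "$dir/k12.cai"
printf 'geneB\ngeneC\ngeneF\n' > "$dir/o157.travers"
printf 'geneC\ngeneD\ngeneF\n' > "$dir/o157.cai"

# Genes unique to K-12 for each method (swap the file order for O157).
k12_only_travers=$(comm -23 "$dir/k12.travers" "$dir/o157.travers" | wc -l | tr -d ' ')
k12_only_cai=$(comm -23 "$dir/k12.cai" "$dir/o157.cai" | wc -l | tr -d ' ')

# Genes found by both methods within each genome; comm output stays sorted.
comm -12 "$dir/k12.travers" "$dir/k12.cai" > "$dir/k12.both"
comm -12 "$dir/o157.travers" "$dir/o157.cai" > "$dir/o157.both"

# Genes found by both methods in both genomes.
all_four=$(comm -12 "$dir/k12.both" "$dir/o157.both" | wc -l | tr -d ' ')

echo "k12_only_travers=$k12_only_travers k12_only_cai=$k12_only_cai all_four=$all_four"
rm -rf "$dir"
```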






Course Organiser: David W. Ussery  Software questions: Christoph Champ