Comparative Microbial Genomics - Exercise 5
Wednesday, 26. October 2005
In computer exercise 4 we introduced some basic techniques for working in the Unix
environment, and you should now be familiar with navigating a directory structure and
viewing/modifying files.
Today your task will be to explore conservation in bacterial ribosomal RNA by generating multiple alignments.
Based on sequence alignment you will be able to derive sequence similarity and later to draw phylogenetic trees
to compare bacterial genomes.
A current research project in the Comparative Microbial Genomics group deals with accurate and fast prediction of rRNA genes. This exercise will
briefly demonstrate the function of the preliminary version of our prediction tool - RNAmmer.
BACKGROUND
Most other methods for predicting rRNA genes use BLAST. The disadvantage of BLAST is that it is a linear method that weights each position in the alignment equally. However, not every position of the rRNA molecule is equally conserved, and RNAmmer
therefore uses profile hidden Markov models, which are built from structural alignments of thousands of rRNA sequences. These models recognize conserved
regions and weight each conserved locus differently. To learn more about profile HMMs, visit hmmer.wustl.edu
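The difference between equal weighting and position-specific weighting can be illustrated with a small Python sketch. This is not RNAmmer or HMMER code: it builds a simple log-odds profile from a made-up, gapless toy alignment (a real profile HMM also models insertions and deletions), and the function names, pseudocount, and uniform background frequency are assumptions chosen for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def build_profile(alignment, pseudocount=1.0, background=0.25):
    """Return one dict of log-odds scores per alignment column."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return profile

def profile_score(profile, seq):
    """Sum the position-specific scores of seq against the profile."""
    return sum(column[base] for column, base in zip(profile, seq))

def identity_score(reference, seq):
    """Count matching positions; every position has the same weight."""
    return sum(1 for a, b in zip(reference, seq) if a == b)

# Made-up toy alignment: columns 1-3 are fully conserved, column 4 is variable.
toy_alignment = ["GCAA", "GCAC", "GCAG", "GCAT"]
profile = build_profile(toy_alignment)

reference = "GCAA"
mismatch_conserved = "TCAA"  # differs from the reference in conserved column 1
mismatch_variable = "GCAC"   # differs from the reference in variable column 4

for name, seq in [("conserved", mismatch_conserved), ("variable", mismatch_variable)]:
    print(name, identity_score(reference, seq), round(profile_score(profile, seq), 2))
```

Both candidates get the same identity score against the reference, but the profile penalizes the mismatch in the conserved column and is indifferent to the one in the variable column.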
Step 1 - Predicting rRNA
As in the last exercise, we use the GNU make system, and all predictions using RNAmmer are incorporated into Makefiles.
We have provided you with a directory of genome sequences of the following species:
| Segmentid | Species |
| Banthracis_Ames0581_Main | Bacillus anthracis Ames0581 |
| Banthracis_Sterne_Main | Bacillus anthracis Sterne |
| Cjejuni_RM1221_Main | Campylobacter jejuni RM1221 |
| Cperfringens_13_Main | Clostridium perfringens 13 |
| Ecoli_K-12_MG1655_Main | Escherichia coli K-12_MG1655 |
| Ecoli_K-12_W3110_Main | Escherichia coli K-12_W3110 |
| Psyringae_B728a_Main | Pseudomonas syringae B728a |
| Psyringae_DC3000_Main | Pseudomonas syringae DC3000 |
| Saureus_COL_Main | Staphylococcus aureus COL |
| Saureus_MRSA252_Main | Staphylococcus aureus MRSA252 |
| Saureus_MW2_Main | Staphylococcus aureus MW2 |
| Spneumoniae_R6_Main | Streptococcus pneumoniae R6 |
| Sthermophilus_LMG18311_Main | Streptococcus thermophilus LMG18311 |
| A | Unknown bacterium |
| B | Unknown bacterium |
| C | Unknown bacterium |
| D | Unknown bacterium |
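The Segmentid column doubles as the file-name stem in the commands that follow: each prediction target has the form Segmentid.rnammer.ssu.fsa. Typing seventeen such names by hand is error-prone, so here is a minimal Python sketch that assembles the gmake command line from the table. The helper name rnammer_targets is hypothetical. The script only prints the command and does not run it.

```python
# Segment IDs copied from the table above.
SEGMENT_IDS = [
    "Banthracis_Ames0581_Main", "Banthracis_Sterne_Main",
    "Cjejuni_RM1221_Main", "Cperfringens_13_Main",
    "Ecoli_K-12_MG1655_Main", "Ecoli_K-12_W3110_Main",
    "Psyringae_B728a_Main", "Psyringae_DC3000_Main",
    "Saureus_COL_Main", "Saureus_MRSA252_Main", "Saureus_MW2_Main",
    "Spneumoniae_R6_Main", "Sthermophilus_LMG18311_Main",
    "A", "B", "C", "D",
]

def rnammer_targets(segment_ids, suffix=".rnammer.ssu.fsa"):
    """Map each segment ID to the make target name used in this exercise."""
    return [segment_id + suffix for segment_id in segment_ids]

# Build the command as text so it can be checked before pasting into the shell.
command = "gmake -j " + " ".join(rnammer_targets(SEGMENT_IDS))
print(command)
```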
- Log in to the "genome.cbs.dtu.dk" server via your SSH client and enter today's exercise directory:
> cd 26Oct2005
> ls -ltr
- You now have to log in to a bigger computer to run the rRNA predictions. Type yes if you are prompted with an authenticity question
> ssh -X ibiology.cbs.dtu.dk
> cd 26Oct2005
- Start the prediction of all 16S rRNA genes. It will take roughly one cup of coffee to finish
> gmake -j Banthracis_Ames0581_Main.rnammer.ssu.fsa Saureus_COL_Main.rnammer.ssu.fsa Banthracis_Sterne_Main.rnammer.ssu.fsa Saureus_MRSA252_Main.rnammer.ssu.fsa Cjejuni_RM1221_Main.rnammer.ssu.fsa Saureus_MW2_Main.rnammer.ssu.fsa Cperfringens_13_Main.rnammer.ssu.fsa Spneumoniae_R6_Main.rnammer.ssu.fsa Ecoli_K-12_MG1655_Main.rnammer.ssu.fsa Ecoli_K-12_W3110_Main.rnammer.ssu.fsa Sthermophilus_LMG18311_Main.rnammer.ssu.fsa Psyringae_B728a_Main.rnammer.ssu.fsa Psyringae_DC3000_Main.rnammer.ssu.fsa A.rnammer.ssu.fsa B.rnammer.ssu.fsa C.rnammer.ssu.fsa D.rnammer.ssu.fsa
- Since multiple rRNA operons exist in many of the genome sequences, you should keep only the first gene from each genome.
The first gene from each genome sequence is concatenated into the file all.fsa:
> foreach i ( Banthracis_Ames0581_Main Saureus_COL_Main Banthracis_Sterne_Main Saureus_MRSA252_Main Cjejuni_RM1221_Main Saureus_MW2_Main Cperfringens_13_Main Spneumoniae_R6_Main Ecoli_K-12_MG1655_Main Ecoli_K-12_W3110_Main Sthermophilus_LMG18311_Main Psyringae_B728a_Main Psyringae_DC3000_Main A B C D)
cat $i.rnammer.ssu.fsa | saco_convert -I fasta -O tab | head -1 | saco_convert -I tab -O fasta | sed "s/>.*/>$i/g" >> all.fsa
end
- Produce a multiple alignment using clustalw (command-line version of clustalx)
> clustalw all.fsa
- Exit your session at ibiology.cbs.dtu.dk and return to genome.cbs.dtu.dk
> exit
- Examine the all.fsa file and the clustal alignment. Does the alignment look acceptable?
> n all.fsa
> clustalx all.aln
- In clustalx, produce a tree by clicking Trees > Draw N-J tree (neighbor-joining tree) and press OK (the tree is saved in phylip format as all.ph). Examine the tree using njplot
> njplot all.ph
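The concatenation loop in the steps above keeps only the first FASTA record of each prediction file and replaces its header with the segment ID. For readers unfamiliar with saco_convert, the same logic can be sketched in plain Python. This assumes, judging from the pipeline, that the tab format holds one record per line, so that head -1 selects the first gene. The function names and the example record are hypothetical.

```python
def first_fasta_record(text):
    """Return (header, sequence) of the first FASTA record, or None if absent."""
    header = None
    seq_lines = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record begins; stop reading
            header = line[1:]
        elif header is not None:
            seq_lines.append(line)
    if header is None:
        return None
    return header, "".join(seq_lines)

def relabel_first_record(text, new_name, width=60):
    """Keep only the first record and rename it, like the sed step in the loop."""
    record = first_fasta_record(text)
    if record is None:
        return ""
    _, sequence = record
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + new_name + "\n" + "\n".join(wrapped) + "\n"

# Made-up input with two records; only the first should survive, renamed to "A".
example = ">rRNA_1 made-up header\nACGT\nACGT\n>rRNA_2 made-up header\nTTTT\n"
print(relabel_first_record(example, "A"), end="")
```

Note that the loop in the exercise appends to all.fsa with >>, so running it twice duplicates every sequence. Delete all.fsa before repeating that step.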
Genomes A, B, C, and D are among the organisms you see in the table below. Using the neighbor joining tree, find the identity of A, B, C, and D.
| Segmentid | Species |
| Bcereus_ATCC14579_Main | Bacillus cereus ATCC14579 |
| Cbotulinum_ATCC3502_Main | Clostridium botulinum ATCC3502 |
| Cjejuni_NCTC11168_Main | Campylobacter jejuni NCTC11168 |
| Ecoli_O157_EDL93_Main | Escherichia coli O157_EDL93 |
| Saureus_MSSA476_Main | Staphylococcus aureus MSSA476 |
| Senterica_ATCC9150_Main | Salmonella enterica ATCC9150 |
| Spneumoniae_TIGR4_Main | Streptococcus pneumoniae TIGR4 |
| Styphimurium_LT2_Main | Salmonella typhimurium LT2 |
- Comment on the branch lengths from A, B, C, and D to the nearest branch point, and explain any differences you see
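Branch lengths in a neighbor-joining tree are derived from pairwise distances between the aligned sequences. As a rough aid for interpreting them, the Python sketch below computes the fraction of differing positions (p-distance) between two aligned rows, skipping any column where either row has a gap. It is an illustration under simplifying assumptions, not a reimplementation of the clustalx calculation, which may handle gaps differently or apply corrections. The toy rows and the names known_1 and known_2 are made up.

```python
def p_distance(row_a, row_b, gap="-"):
    """Fraction of differing positions among columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared

def nearest_neighbour(query, rows):
    """Name of the aligned sequence with the smallest p-distance to query."""
    others = [name for name in rows if name != query]
    return min(others, key=lambda name: p_distance(rows[query], rows[name]))

# Made-up aligned rows; real rows would come from all.aln.
toy_rows = {
    "A": "ACGTAC-TTG",
    "known_1": "ACGTACGTTG",
    "known_2": "ACCTAGGTAG",
}
for name in ("known_1", "known_2"):
    print(name, round(p_distance(toy_rows["A"], toy_rows[name]), 3))
print(nearest_neighbour("A", toy_rows))
```

A distance of zero between two strains means their 16S sequences are identical over the compared columns, which corresponds to a branch of (near) zero length in the tree.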