Comparative Microbial Genomics - #27644

Computer Exercises - Phylogeny and prediction of ribosomal RNA

Comparative Microbial Genomics - Exercise 5

Wednesday, 26. October 2005



In computer exercise 4 we introduced some basic techniques for working in the Unix environment, and you should now be familiar with navigating through a directory structure and viewing/modifying files. Today your task is to explore conservation in bacterial ribosomal RNA by generating multiple alignments. From the sequence alignment you will be able to derive sequence similarity and, later, draw phylogenetic trees to compare bacterial genomes.


A current research project in the Comparative Microbial Genomics group deals with accurate and fast prediction of rRNA genes. This exercise will briefly demonstrate how the preliminary version of our prediction tool, RNAmmer, works.

BACKGROUND
Most other methods for predicting rRNA genes use BLAST. The disadvantage of BLAST is that it weights each position in the alignment equally. However, not every position of the rRNA molecule is equally conserved, and RNAmmer therefore uses profile hidden Markov models (HMMs), which are built from structural alignments of thousands of rRNA sequences. These models recognize conserved regions and weight each position according to how conserved it is. To learn more about profile HMMs, visit hmmer.wustl.edu
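
To make the contrast between uniform and position-specific weighting concrete, here is a minimal Python sketch. It is not RNAmmer's or HMMER's actual algorithm: the toy alignment, the function names, and the simple conservation weight (frequency of the most common residue in each column) are all assumptions made for illustration. A real profile HMM uses log-odds scores and also models insertions and deletions.

```python
from collections import Counter

# Toy alignment of four short sequences (hypothetical data, equal lengths).
ALIGNMENT = [
    "GGATCA",
    "GGATCA",
    "GGACGA",
    "GGAAAA",
]

def column_weights(alignment):
    """Return the consensus and, per column, the frequency of its most common residue."""
    consensus = []
    weights = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue)
        weights.append(count / len(column))
    return "".join(consensus), weights

def uniform_score(query, consensus):
    """Every matching position counts the same."""
    return sum(q == c for q, c in zip(query, consensus))

def weighted_score(query, consensus, weights):
    """Matches at conserved columns count more than matches at variable columns."""
    return sum(w for q, c, w in zip(query, consensus, weights) if q == c)

consensus, weights = column_weights(ALIGNMENT)
for query in ("GGAGGA", "CCATCA"):
    print(query, uniform_score(query, consensus), weighted_score(query, consensus, weights))
```

Both queries match the consensus at four of six positions, so the uniform score cannot tell them apart. The weighted score is lower for the query whose mismatches fall in the fully conserved columns, which is the behaviour the paragraph above describes.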

Step 1 - Predicting rRNA

As in the last exercise, we use the GNU make system, and all RNAmmer predictions are incorporated into Makefiles.

We have provided you with a directory of genome sequences of the following species:
Segment ID                    Species
Banthracis_Ames0581_Main      Bacillus anthracis Ames0581
Banthracis_Sterne_Main        Bacillus anthracis Sterne
Cjejuni_RM1221_Main           Campylobacter jejuni RM1221
Cperfringens_13_Main          Clostridium perfringens 13
Ecoli_K-12_MG1655_Main        Escherichia coli K-12_MG1655
Ecoli_K-12_W3110_Main         Escherichia coli K-12_W3110
Psyringae_B728a_Main          Pseudomonas syringae B728a
Psyringae_DC3000_Main         Pseudomonas syringae DC3000
Saureus_COL_Main              Staphylococcus aureus COL
Saureus_MRSA252_Main          Staphylococcus aureus MRSA252
Saureus_MW2_Main              Staphylococcus aureus MW2
Spneumoniae_R6_Main           Streptococcus pneumoniae R6
Sthermophilus_LMG18311_Main   Streptococcus thermophilus LMG18311
A                             Unknown bacteria
B                             Unknown bacteria
C                             Unknown bacteria
D                             Unknown bacteria

  • Log in to the "genome.cbs.dtu.dk" server via your SSH client and enter today's exercise directory:
    > cd 26Oct2005
    > ls -ltr


  • You now have to log in to a bigger computer to run the rRNA predictions. Type yes if you are prompted with an authenticity question:
    > ssh -X ibiology.cbs.dtu.dk
    > cd 26Oct2005


  • Start the prediction of all 16S rRNA genes. It will take roughly one cup of coffee to finish:
    > gmake -j Banthracis_Ames0581_Main.rnammer.ssu.fsa Saureus_COL_Main.rnammer.ssu.fsa Banthracis_Sterne_Main.rnammer.ssu.fsa Saureus_MRSA252_Main.rnammer.ssu.fsa Cjejuni_RM1221_Main.rnammer.ssu.fsa Saureus_MW2_Main.rnammer.ssu.fsa Cperfringens_13_Main.rnammer.ssu.fsa Spneumoniae_R6_Main.rnammer.ssu.fsa Ecoli_K-12_MG1655_Main.rnammer.ssu.fsa Ecoli_K-12_W3110_Main.rnammer.ssu.fsa Sthermophilus_LMG18311_Main.rnammer.ssu.fsa Psyringae_B728a_Main.rnammer.ssu.fsa Psyringae_DC3000_Main.rnammer.ssu.fsa A.rnammer.ssu.fsa B.rnammer.ssu.fsa C.rnammer.ssu.fsa D.rnammer.ssu.fsa


  • Since many of the genome sequences contain multiple rRNA operons, you should keep only the first predicted gene from each genome. The loop below takes that first gene from each genome, renames it after the genome, and appends it to the file all.fsa:
    > foreach i ( Banthracis_Ames0581_Main Saureus_COL_Main Banthracis_Sterne_Main Saureus_MRSA252_Main Cjejuni_RM1221_Main Saureus_MW2_Main Cperfringens_13_Main Spneumoniae_R6_Main Ecoli_K-12_MG1655_Main Ecoli_K-12_W3110_Main Sthermophilus_LMG18311_Main Psyringae_B728a_Main Psyringae_DC3000_Main A B C D)
    cat $i.rnammer.ssu.fsa | saco_convert -I fasta -O tab | head -1 | saco_convert -I tab -O fasta | sed "s/>.*/>$i/g" >> all.fsa
    end


  • Produce a multiple alignment using clustalw (command-line version of clustalx)
    > clustalw all.fsa


  • Exit your session at ibiology.cbs.dtu.dk and return to genome.cbs.dtu.dk
    > exit


  • Examine the all.fsa file and the clustal alignment. Does the alignment look acceptable?
    > n all.fsa
    > clustalx all.aln


  • In clustalx, produce a tree by clicking Trees > Draw N-J tree (neighbor-joining tree) and pressing OK (the tree is saved in PHYLIP format as all.ph). Examine the tree using njplot:
    > njplot all.ph
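
The foreach loop earlier in this list keeps only the first predicted gene of each genome and replaces its header with the genome name. For readers working without saco_convert, the same idea can be sketched in Python. The function name first_record and the two-record example input are hypothetical; this is an illustration of the step, not the tool used in the exercise.

```python
def first_record(fasta_text, new_name):
    """Return the first FASTA record in fasta_text, with its header replaced by new_name."""
    sequence_lines = []
    seen_header = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if seen_header:
                break  # a second header marks the start of the next record
            seen_header = True
        elif seen_header and line.strip():
            sequence_lines.append(line.strip())
    if not seen_header:
        return ""  # no record found
    return ">" + new_name + "\n" + "\n".join(sequence_lines) + "\n"

# Hypothetical input standing in for the contents of a *.rnammer.ssu.fsa file.
example = ">rRNA_1 16S\nACGTACGT\nTTGA\n>rRNA_2 16S\nACGTACGA\n"
print(first_record(example, "Ecoli_K-12_MG1655_Main"), end="")
```

Calling first_record once per genome and appending each result to one file would give the same kind of all.fsa as the shell loop: one sequence per genome, named after the genome.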


Genomes A, B, C, and D are among the organisms you see in the table below. Using the neighbor joining tree, find the identity of A, B, C, and D.

Segment ID                    Species
Bcereus_ATCC14579_Main        Bacillus cereus ATCC14579
Cbotulinum_ATCC3502_Main      Clostridium botulinum ATCC3502
Cjejuni_NCTC11168_Main        Campylobacter jejuni NCTC11168
Ecoli_O157_EDL93_Main         Escherichia coli O157_EDL93
Saureus_MSSA476_Main          Staphylococcus aureus MSSA476
Senterica_ATCC9150_Main       Salmonella enterica ATCC9150
Spneumoniae_TIGR4_Main        Streptococcus pneumoniae TIGR4
Styphimurium_LT2_Main         Salmonella typhimurium LT2

  • Comment on the branch lengths from A, B, C, and D to the nearest branch point. Explain any differences you see.
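
Branch lengths in a neighbor-joining tree are derived from pairwise distances between the aligned sequences. As a rough aid for thinking about them, the sketch below computes one simple distance, the fraction of differing positions between two aligned sequences. The sequences are invented, skipping columns that contain a gap is an assumption, and the distance calculation in clustalx may differ (for example, by applying a correction for multiple substitutions).

```python
def p_distance(seq_a, seq_b):
    """Fraction of differing positions between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared

# Invented aligned sequences; the names are placeholders, not real genomes.
aligned = {
    "unknown": "ACGTACGTAC",
    "close_relative": "ACGTACGTAC",
    "same_genus": "ACGTTCGTAA",
    "distant": "AC-TTGGAAA",
}
for name in ("close_relative", "same_genus", "distant"):
    print(name, round(p_distance(aligned["unknown"], aligned[name]), 3))
```

Under this measure, two genomes with identical aligned sequences have a distance of zero, while more distant relatives accumulate more differences.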




Course Organiser: David W. Ussery  Software questions: Peter Fischer Hallin