
Exercise 2: 16S rRNA trees

IMPORTANT: You must have completed last time's exercises before you start on today's work.

Some tools used today

Some of the tools that we are using today deserve a closer explanation.

  • Perl and Python: Perl and Python are programming languages which can be used to make everything from large computer programs to small utility programs. In this course we will often use Perl and Python scripts, which are small programs, to calculate things and to convert files.

  • make: gmake (GNU make) is a system which allows us to automate tasks. In many cases doing something involves several steps, such as converting files, calculating something and making a graphic file of it all. When using gmake, you specify what kind of file you want to end up with, and the computer figures out what steps need to be done to get to that point. For instance, say you have a GenBank file and you only want the FASTA sequence of the genome. If the file is named "a.gbk" and you want to use gmake to get the FASTA sequence, you type "gmake a.fsa", i.e., you replace the ending with what you want to end up with.
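
The idea of naming the file you want and letting gmake work out the steps can be sketched in Python. This is only an illustration: the rule table below is hypothetical (the real rules are in the course Makefile, which is not shown here), and the function name plan is made up for this example.

```python
import os

# Hypothetical rule table: target extension -> source extension it is built from.
# The real rules are defined in the course Makefile and may differ.
RULES = {
    ".fsa": ".gbk",  # a FASTA sequence is extracted from a GenBank file
}

def plan(target, existing_files):
    """Return the files that must be built, in order, to end up with `target`."""
    if target in existing_files:
        return []  # nothing to do, the file is already there
    stem, ext = os.path.splitext(target)
    if ext not in RULES:
        raise ValueError("no rule to make " + target)
    source = stem + RULES[ext]
    # First build whatever the source needs, then the target itself.
    return plan(source, existing_files) + [target]

# Only a.gbk exists, and we ask for a.fsa, as in "gmake a.fsa".
print(plan("a.fsa", {"a.gbk"}))
```

With more rules in the table, the same function would chain several steps together, which is roughly what happens when one gmake command triggers a series of conversions.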

X client

You will be both looking at files and editing files today. For this you need a program which can transfer images from the CBS computers to your computer, and which will let you interact with the programs running there. A how-to guide can be found here:

How to get X up and working


Once you have got X working you can start to look at files and to edit them. You will use two programs to do this.
  • nedit: this is a program that will let you edit a text file and save it. You can use the mouse to click inside the window and to interact with the menus.
  • ghostview: this is a program which will let you look at a PostScript file. PostScript is actually a language for describing graphics which was developed for getting pictures out on a printer. PostScript files are actually text files which you can look at and, strictly speaking, edit (not recommended unless you really know what you are doing). Ghostview takes the PostScript, interprets how it would look on a page and displays it on your screen.
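
Because PostScript files are plain text, you can inspect them with a few lines of Python. The sketch below relies on the convention that PostScript files start with the characters %!PS; the function name and the tiny example file are made up for illustration.

```python
import os
import tempfile

def looks_like_postscript(path):
    """Return True if the file starts with the conventional '%!PS' marker."""
    with open(path, "rb") as handle:
        return handle.read(4) == b"%!PS"

# Write a minimal PostScript file (one diagonal line) to a temporary directory.
example = "%!PS\nnewpath 100 100 moveto 200 200 lineto stroke\nshowpage\n"
ps_path = os.path.join(tempfile.mkdtemp(), "example.ps")
with open(ps_path, "w") as handle:
    handle.write(example)

print(looks_like_postscript(ps_path))
```

Opening a PostScript file in nedit shows the same drawing commands as text.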

Making a phylogenetic tree

Phylogenetic trees are often constructed from 16S genes. These genes code for 16S rRNA, an rRNA that is found within the small subunit of the ribosome. The gene is present in all bacteria and archaea, and has both conserved and variable regions, which means that it can be used to establish the relationship between organisms.

In this exercise you will first predict 16S rRNA genes, make a multiple alignment from them, and then make a bootstrapped tree of them. This tree will let you see how your organisms are related.

Predicting 16S rRNA genes

In some cases the annotations of rRNAs can be inaccurate. Ribosomal RNAs can instead be predicted using a program called RNAmmer. This program can be used either on the web or through a web services script that accesses the program via the internet.
  • Log in to the CBS computers

    Find a program which will let you log into the CBS computers. 

    Computer name:
    User name: studXXX

    You will get your password from the teachers.

    After that you need to log into the computer where we will do the exercises, which is named ibiology.

    # log into CBS
    ssh -Y ibiology
    setenv MAKEFILES /home/people/pfh/bin/Makefile
    umask 022

  • Create a directory to store things in
    # create directory for holding the rRNA predictions, and go into that directory
    mkdir rRNA
    cd ~/rRNA
  • Predict rRNAs

    The RNAmmer program is a web service, like the atlas program and the prodigal program you used before. This means that you will be put in a queue, and you will see that for each of your organisms the status goes from QUEUED to ACTIVE and then to FINISHED.
    # get just the fasta sequence for these genomes and predict the rRNAs
    foreach i (../data/fsafiles/[A-Z]*.fasta)
      perl ~karinl/scripts/rnammer/ bac ssu < $i > $i:r.rrna.fsa
      cp $i:r.rrna.fsa .
    end
  • Have a look at one of the files.
    # look at one of the files
    less <filename>.rrna.fsa 

    You should have a set of sequences in FASTA format in your file, each approximately 1500 nt long.

  • Extract one rRNA from each and put it into another file.

    You will need one 16S sequence for each organism. Many organisms contain several of them; some have up to 16 16S rRNAs. To get one from each of your organisms, you will use a script that extracts one rRNA from each file. This rRNA will have a score that is above 1700 and be between 1400 and 1700 nt long. The script will also give each rRNA a unique name.

    The script will also tell you if none of the rRNAs found for a genome meet these requirements, i.e. if the score is too low or the sequence is too long or too short.
    # extract rRNAs from your file
    sh ~karinl/scripts/rnammer/ phylogeny.fasta
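
The selection rule described above (one rRNA per file, score above 1700, length between 1400 and 1700 nt) can be sketched in Python as follows. This is not the course script, only an illustration of the logic. It assumes that the score appears in the FASTA header as a field of the form /score=NUMBER and that the length limits are inclusive; both are assumptions, and all function names are made up for this example.

```python
import re

def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def score_of(header):
    """Assumed header layout: a '/score=NUMBER' field. Returns None if absent."""
    match = re.search(r"/score=([0-9]+(?:\.[0-9]+)?)", header)
    return float(match.group(1)) if match else None

def pick_one(records, min_score=1700, min_len=1400, max_len=1700):
    """Return the first record passing the score and length limits, or None."""
    for header, seq in records:
        score = score_of(header)
        if score is not None and score > min_score and min_len <= len(seq) <= max_len:
            return header, seq
    return None

# Toy input: the first sequence has too low a score, the second one passes.
example = (">seq1 /score=1200.0\n" + "A" * 1500 + "\n"
           ">seq2 /score=1900.5\n" + "ACGT" * 375 + "\n")
chosen = pick_one(read_fasta(example))
print(chosen[0], len(chosen[1]))
```

The same read_fasta function is also a quick way to check that the sequences in a <filename>.rrna.fsa file are each approximately 1500 nt long.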

Making an alignment and a tree.

You will now use a program called ClustalX to create a multiple alignment. A multiple alignment is a way of placing sequences against each other which emphasises the nucleotides or amino acids that the sequences have in common.

Once you have made an alignment, you will make a tree too.
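
To make the idea of a multiple alignment concrete, the sketch below takes three short, already aligned toy sequences (invented for this example, with - marking a gap) and marks with * every column in which all sequences have the same nucleotide. It only illustrates how to read an alignment; it does not perform the alignment itself.

```python
def conservation_line(aligned):
    """Return a string with '*' under each column where every sequence agrees."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        # A column is conserved if it holds one distinct character that is not a gap.
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)

toy = ["ACGT-ACGA",
       "ACGTTACGA",
       "ACCT-ACGA"]
for seq in toy:
    print(seq)
print(conservation_line(toy))
```

In a 16S alignment, long runs of * correspond to the conserved regions of the gene, and the stretches without * to the variable regions.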

  • Log out and log into a different computer

    Log out of ibiology only, not out of the login machine (interaction). From interaction you then log into a new machine, called cell. Make sure that X still works after you have logged in on the new computer.

    # log in to the computers again 
    ssh -Y cell
    umask 022
    setenv MAKEFILES /home/people/pfh/bin/Makefile

  • Open the ClustalX program with the phylogeny.fasta file

    # go into the rRNA directory and start clustalx
    cd rRNA
    clustalx phylogeny.fasta
  • Make alignment

    Click Alignment > Do Complete Alignment. You will be asked for the names of the output guide tree file and the alignment file; click Align.

    You can track the process at the bottom of the screen. When it is done, it will say "Clustal-Alignment file created".

  • Make a tree.

    Click Trees > Bootstrap N-J Trees. Click OK. When it is done, it will say 'Bootstrap tree phylogeny.phb created' at the bottom.

  • Quit the program

    Go to File > Quit

View tree

You will now look at the tree.

  • Look at the tree
    # display the tree
    njplot phylogeny.phb
  • Display bootstrap values

    Click on the box marked 'Bootstrap values'.

    Question: does the tree that you get look reasonable?

    The tree you have created is a so-called 'bootstrapped' tree. In bootstrapping, the tree is tested over and over again to see if the branching of the tree would change if the data changes slightly. In this case, you have tested the tree 1000 times. If a branch moves in more than 50% of the tests, that branch structure, i.e. that way of positioning the organisms in the tree, is not very reliable. In your plot the bootstrap values appear as numbers between 1 and 1000 at the branch points; each value is the number of tests in which that branch was recovered.

    Question: is the branching structure of your tree reliable?

  • Save the tree

    Click File > Save plot > OK. You now have a PostScript file containing your tree, which you can use for your poster.

  • Quit the program
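
The bootstrap test described above can be illustrated with a short Python sketch. It assumes the usual bootstrap procedure, in which the "slightly changed data" are made by drawing alignment columns at random with replacement; ClustalX's own implementation may differ in detail. The sketch shows only the resampling and how a bootstrap value is read, not the tree building, and all names and the toy alignment are made up for this example.

```python
import random

def bootstrap_alignment(aligned, rng):
    """Resample alignment columns with replacement, keeping the alignment length."""
    columns = list(zip(*aligned))
    picked = [rng.choice(columns) for _ in columns]
    # Turn the resampled columns back into one string per sequence.
    return ["".join(chars) for chars in zip(*picked)]

def is_reliable(bootstrap_value, replicates=1000, threshold=0.5):
    """Treat a branch as reliable if it was recovered in at least half the replicates."""
    return bootstrap_value / replicates >= threshold

toy = ["ACGT-ACGA",
       "ACGTTACGA",
       "ACCT-ACGA"]
rng = random.Random(1)  # fixed seed so the example is repeatable
print(bootstrap_alignment(toy, rng))

# Reading two example bootstrap values out of 1000 replicates.
print(is_reliable(870))
print(is_reliable(320))
```

A tree is built from each resampled alignment, and the bootstrap value of a branch is the number of those trees in which the branch appears.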

Other useful programs

An alternative for making publication-quality trees is the program MEGA4. This program can be used from the alignment step onwards; it incorporates several tree-making methods and also makes bootstrap and consensus trees. However, it is a Windows program, so those of you who have Macs would need to use a program like Crossover to make it work. Crossover is something you would have to pay for, but it has a 30-day free trial period.