
Exercise 6: Phylogeny

Making a phylogenetic tree

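Phylogenetic trees are often constructed from 16S rRNA genes. These genes encode 16S ribosomal RNA, the rRNA found in the small subunit of the prokaryotic ribosome. The gene is present in all bacteria and archaea (eukaryotes carry the related 18S rRNA gene), and it has both conserved and variable regions: the conserved regions let sequences from distant organisms be aligned against each other, while the variable regions differ enough to tell organisms apart. Together, this makes the gene useful for establishing how organisms are related.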
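To make "conserved" and "variable" concrete: in an alignment of 16S sequences, a conserved column holds the same nucleotide in every organism, while a variable column differs between them. The minimal Python sketch below scores each alignment column by the fraction of sequences sharing its most common nucleotide. The toy fragments are invented for illustration, and counting gaps against conservation is an assumption of this sketch, not a rule taken from any particular tool.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """For each alignment column, return the fraction of sequences that share
    the most common non-gap character (gaps count against conservation)."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        counts = Counter(s[col] for s in alignment if s[col] != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / n_seqs)
    return scores


if __name__ == "__main__":
    # Invented toy fragments, not real 16S data
    toy = ["AGAGTTTGATC",
           "AGAGTTTGATC",
           "AGAGCTTGGTC",
           "AGAGTATGA-C"]
    for i, score in enumerate(column_conservation(toy)):
        label = "conserved" if score == 1.0 else "variable"
        print(i, f"{score:.2f}", label)
```

Columns scoring 1.0 are the kind of region that anchors an alignment across distant organisms; lower-scoring columns carry the differences a tree is built from.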

In this exercise you will first predict the 16S rRNA genes in your genomes, then make a multiple alignment of them, and finally build a bootstrapped tree from that alignment. The tree will show how your organisms are related.

Predicting 16S rRNA genes

Existing rRNA annotations are sometimes inaccurate, so instead you will predict the rRNA genes yourself with a program called RNAmmer. RNAmmer can be used either through its web page or through a web service script that runs it over the internet.
  1. Log in

    Log in to the computer using SSH or PuTTY.

    # log in to ibiology again, then set up your environment:
    ssh -Y ibiology
    umask 022
    setenv MAKEFILES /home/people/pfh/bin/Makefile
  2. Create a directory to store your results in
    # create a directory for the rRNA predictions in your home directory, and go into it
    mkdir ~/rRNA
    cd ~/rRNA
  3. Predict rRNAs

    The RNAmmer program is run as a web service, like the atlas and prodigal programs you used before. This means your jobs are put in a queue: for each of your organisms, the status will go from QUEUED to ACTIVE and then to FINISHED.
    # run RNAmmer on each genome FASTA file to predict its bacterial 16S (ssu) rRNAs,
    # then copy the predictions into the current directory
    foreach i (../data/fsafiles/[A-Z]*.fasta)
    perl ~karinl/scripts/rnammer/ bac ssu < $i > $i:r.rrna.fsa
    cp $i:r.rrna.fsa .
    end
  4. Have a look at one of the files.
    # look at one of the files
    less <filename>.rrna.fsa

    You should have a set of sequences in FASTA format in your file, each approximately 1500 nt long.

  5. Extract one rRNA from each and put it into another file.

    You will need one 16S sequence for each organism. Many organisms have several copies of the gene; some have up to 16. To get one from each of your organisms, you will use a script that extracts one rRNA from each file. The chosen rRNA must have a score above 1700 and be between 1400 and 1700 nt long. The script also gives each rRNA a unique name.

    The script will also tell you if none of the rRNAs found for a genome meets these requirements, i.e. if their scores are too low or they are too long or too short.
    # extract one 16S rRNA per genome (output: phylogeny.fasta)
    sh ~karinl/scripts/rnammer/ phylogeny.fasta
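The selection step can be sketched as follows. This is not the course script (whose name is not given here); it is a minimal Python 3.9+ sketch of the same rules: keep only rRNAs with a score above 1700 and a length of 1400 to 1700 nt, take one per genome, and name it after its file. Two details are assumptions: that RNAmmer writes the score into each FASTA header as `/score=<number>`, and that the highest-scoring passing copy is the one to keep.

```python
import re
import sys
from pathlib import Path

# ASSUMPTION: each RNAmmer FASTA header contains the score as "/score=<number>".
SCORE_RE = re.compile(r"/score=(\d+(?:\.\d+)?)")

MIN_SCORE = 1700               # score threshold stated in the exercise
MIN_LEN, MAX_LEN = 1400, 1700  # length window (nt) stated in the exercise


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pick_16s(records, min_score=MIN_SCORE, min_len=MIN_LEN, max_len=MAX_LEN):
    """Return the highest-scoring (header, seq) that passes both filters, or None."""
    best, best_score = None, float("-inf")
    for header, seq in records:
        match = SCORE_RE.search(header)
        if match is None:
            continue  # no score in the header: skip this record
        score = float(match.group(1))
        if score > min_score and min_len <= len(seq) <= max_len and score > best_score:
            best, best_score = (header, seq), score
    return best


def main(paths):
    for path in paths:
        with open(path) as handle:
            chosen = pick_16s(parse_fasta(handle))
        # Unique name taken from the file name, e.g. Ecoli.rrna.fsa -> Ecoli
        name = Path(path).name.removesuffix(".rrna.fsa")
        if chosen is None:
            # Warnings go to stderr so they never end up in the FASTA output
            print(f"WARNING: no 16S rRNA in {path} passes the score/length filters",
                  file=sys.stderr)
            continue
        seq = chosen[1]
        print(f">{name}")
        for start in range(0, len(seq), 60):
            print(seq[start:start + 60])


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a hypothetical name such as `pick_16s.py`, it would be run as `python pick_16s.py *.rrna.fsa > phylogeny.fasta`. Sending warnings to stderr keeps them on the screen and out of the alignment input.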

Making an alignment and a tree

You will now use a program called ClustalX to create a multiple alignment. A multiple alignment places sequences against each other so that corresponding positions line up in columns, which highlights the nucleotides or amino acids the sequences have in common.

Once you have made the alignment, you will build a tree from it.
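ClustalX builds its multiple alignment progressively: it aligns the sequences pairwise, uses the pairwise results to make a guide tree, and then merges the sequences along that tree. The pairwise step rests on dynamic programming. The sketch below is a plain Needleman-Wunsch global alignment of two sequences with toy scores (match +1, mismatch -1, gap -1); these scores are illustrative assumptions, not ClustalX's own scoring, which uses weight matrices and separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("AGAGTTTGATC", "AGAGTTGATCC")
    print(total)
    print(top)
    print(bottom)
```

The gaps ("-") it inserts are what let corresponding positions line up in the same column, which is exactly what ClustalX shows on screen for many sequences at once.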

  1. Log out and log in to a different computer

    Log out of ibiology only, not of the login machine (interaction). From interaction, log in to a new machine called cell. ClustalX is a graphical program, so make sure that X works after you have logged in to the new computer.

    # log in to the computers again 
    ssh -Y cell
    umask 022
    setenv MAKEFILES /home/people/pfh/bin/Makefile

  2. Open the ClustalX program with the phylogeny.fasta file

    # go into the rRNA directory and start clustalx
    cd rRNA
    clustalx phylogeny.fasta
  3. Make alignment

    Click Alignment > Do Complete Alignment. You will be asked for the names of the output guide tree file and alignment file; click Align.

    You can track the process at the bottom of the screen. When it is done, it will say "Clustal-Alignment file created".

  4. Make a tree

    Click Trees > Bootstrap N-J Trees (N-J stands for neighbour joining), then click OK. When it is done, the bottom of the screen will say 'Bootstrap tree phylogeny.phb created'.

  5. Quit the program

    Go to File > Quit
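Bootstrap N-J Trees builds its trees with the neighbour-joining (NJ) method. A minimal Python sketch of NJ on a distance matrix follows, together with a simple way to get distances from aligned sequences. The uncorrected p-distance and the choice to skip gapped columns are assumptions for illustration; ClustalX has its own distance settings (for example an optional correction for multiple substitutions), so this sketch will not reproduce its output exactly.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Uncorrected distance: fraction of differing sites, skipping columns
    where either aligned sequence has a gap (an assumption of this sketch)."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("the two sequences share no ungapped columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Build an unrooted neighbour-joining tree.

    names: at least three taxon labels; dist: {(a, b): distance} for every pair.
    Returns the tree as a Newick string.
    """
    if len(names) < 3:
        raise ValueError("neighbour joining needs at least three taxa")
    d = {}
    for a, b in combinations(names, 2):
        value = dist[(a, b)] if (a, b) in dist else dist[(b, a)]
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        # Join the pair that minimises Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"  # new internal node, as a Newick subtree
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                # Distance from the new node to every remaining node
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: join them at one central node
    a, b, c = nodes
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    lb = (d[a][b] + d[b][c] - d[a][c]) / 2
    lc = (d[a][c] + d[b][c] - d[a][b]) / 2
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


if __name__ == "__main__":
    # Invented toy alignment, not real 16S data
    seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTTC",
            "C": "ACGAACCTTG", "D": "ACGAACCTAG"}
    names = sorted(seqs)
    dist = {(x, y): p_distance(seqs[x], seqs[y]) for x, y in combinations(names, 2)}
    print(neighbor_joining(names, dist))
```

The Newick string it prints is the same kind of text that ClustalX writes to the .phb file, minus the bootstrap values.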

View tree

You will now look at the tree.

  1. Look at the tree
    # display the tree
    njplot phylogeny.phb
  2. Display bootstrap values

    Click on the box marked 'Bootstrap values'.

    Question: does the tree that you get look reasonable?

    The tree you have created is a so-called bootstrapped tree. In bootstrapping, the columns of the alignment are resampled at random (with replacement) to make many slightly different pseudo-alignments, a tree is built from each one, and the program counts how often each branch of your tree reappears. In this case, the tree was tested 1000 times. If a grouping appears in fewer than half of the replicates (a bootstrap value below 500), that way of positioning the organisms in the tree is not very reliable. In your plot, the bootstrap values appear as numbers of up to 1000 at the branch points.

    Question: is the branching structure of your tree reliable?

  3. Save the tree

    Click File > Save plot > OK. You now have a PostScript file containing your tree, which you can use for your poster.

  4. Quit the program
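Bootstrapping itself is simple to sketch. The Python example below follows the standard approach (Felsenstein's bootstrap): draw alignment columns at random with replacement to make a pseudo-alignment of the same length, rebuild the tree, and count how often each grouping comes back. To stay short, it handles exactly four sequences, where building the tree reduces to choosing one of three possible pairings (for four taxa, neighbour joining picks the pairing with the smallest d(i, j) + d(k, l)). The plain mismatch-fraction distance and the fixed random seed are assumptions for illustration, not ClustalX's settings.

```python
import random
from collections import Counter


def mismatch_fraction(s1, s2):
    """Fraction of alignment columns where two sequences differ
    (gaps are treated as a fifth character, a simplification)."""
    return sum(x != y for x, y in zip(s1, s2)) / len(s1)


def quartet_split(seqs):
    """For exactly four aligned sequences, return the split ((i, j), (k, l))
    with the smallest d(i, j) + d(k, l), i.e. the topology neighbour joining
    would choose for four taxa."""
    a, b, c, d = sorted(seqs)

    def dist(x, y):
        return mismatch_fraction(seqs[x], seqs[y])

    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    # Ties go to the first split in the list (a limitation of this sketch)
    return min(splits, key=lambda s: dist(*s[0]) + dist(*s[1]))


def bootstrap_support(seqs, replicates=1000, seed=1):
    """Count how often each four-taxon split is recovered when alignment
    columns are resampled with replacement."""
    if len(seqs) != 4:
        raise ValueError("this sketch handles exactly four sequences")
    length = len(next(iter(seqs.values())))
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    support = Counter()
    for _ in range(replicates):
        # Draw as many columns as the alignment has, with replacement
        cols = [rng.randrange(length) for _ in range(length)]
        pseudo = {name: "".join(s[i] for i in cols) for name, s in seqs.items()}
        support[quartet_split(pseudo)] += 1
    return support


if __name__ == "__main__":
    # Invented toy alignment, not real 16S data
    toy = {"A": "ACGTACGTAC", "B": "ACGTACGTTC",
           "C": "ACGAACCTTG", "D": "ACGAACCTAG"}
    for split, count in bootstrap_support(toy).most_common():
        print(split, count)
```

A count of 1000 out of 1000 means the grouping came back in every replicate; a grouping recovered in fewer than 500 replicates is the kind of branch the exercise warns is unreliable.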

Other useful programs

Another option for making publication-quality trees is MEGA4. It can be used from the alignment step onwards, offers several tree-building methods, and can also make bootstrap and consensus trees. However, MEGA4 is a Windows program, so those of you with Macs would need a program such as CrossOver to run it. CrossOver is commercial software, but it comes with a 30-day free trial.