Exercise 2: 16S rRNA trees
IMPORTANT: You must have completed last time's exercises before you
start on today's work.
Some tools used today
Some of the tools that we are using today deserve a closer
explanation.
- Perl and Python:
Perl and Python are programming languages which can be used to make
everything from large computer programs to small utility programs. In
this course we will often use Perl and Python scripts, which are small
programs, to calculate things and to convert files.
- make:
gmake (GNU make) is a system which allows us to automate tasks. In many
cases doing something involves several steps, such as converting files,
calculating something and making a graphic file of it all. When using
gmake, you specify what kind of file you want to end up with, and the
computer figures out what steps need to be done to get to that point.
For instance, say you have a GenBank file named "a.gbk" and you only
want the FASTA sequence of the genome. To get it with gmake, you type
"gmake a.fsa", i.e., you replace the file ending with the one you want
to end up with.
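To make the "a.gbk" to "a.fsa" example concrete, here is a minimal Python sketch of the kind of conversion such a step has to perform: pulling the sequence out of a GenBank record and writing it as FASTA. It is not the recipe used by the course Makefile, which is not shown here; the function name genbank_to_fasta and the toy record are made up for illustration, and the sketch assumes a file with a single record.

```python
def genbank_to_fasta(genbank_text, width=60):
    """Return the sequence of a single GenBank record as FASTA text."""
    name = "unknown"
    in_sequence = False
    bases = []
    for line in genbank_text.splitlines():
        if line.startswith("LOCUS"):
            # The second field of the LOCUS line is the record name.
            fields = line.split()
            if len(fields) > 1:
                name = fields[1]
        elif line.startswith("ORIGIN"):
            # The sequence starts on the line after ORIGIN.
            in_sequence = True
        elif line.startswith("//"):
            # "//" marks the end of the record.
            in_sequence = False
        elif in_sequence:
            # Sequence lines look like "        1 gatcctccat atacaacggt";
            # keep the letters, drop the position numbers and spaces.
            bases.extend(ch for ch in line if ch.isalpha())
    sequence = "".join(bases).upper()
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(chunks) + "\n"


# Toy record, invented for this example (not a real genome).
SAMPLE_GBK = """LOCUS       TOY0001                   24 bp    DNA     linear   BCT
ORIGIN
        1 gatcctccat atacaacggt atct
//
"""

if __name__ == "__main__":
    print(genbank_to_fasta(SAMPLE_GBK), end="")
```

With gmake you never call such a converter yourself: you only name the file you want, and gmake works out which conversion to run.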
X client
You will be both looking at files and editing files today. To do this
you need a program which can transfer images from the CBS computers to
your computer, and which will let you interact with the programs
running there. A how-to guide can be found here:
How to get X up and working
Programs
Once you have got X working you can start to look at files and to edit
them. You will use two programs to do this.
- nedit:
this is a program that will let you edit a text file and save it. You
can use the mouse to click inside the window and to interact with
the menus.
- ghostview:
this is a program which will let you look at a PostScript file.
PostScript is a language for describing graphics which was developed
for getting pictures out on a printer. PostScript files are actually
text files which you can look at and, strictly speaking, edit (not
recommended unless you really know what you are doing). Ghostview
takes the PostScript, interprets how it would look on a page, and
displays it on your screen.
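To illustrate that a PostScript file is just text, the following Python sketch builds a tiny PostScript program that draws one diagonal line. It is only an illustration: the function name minimal_postscript and the file name line.ps are made up, and the files you will view in this exercise are far larger.

```python
def minimal_postscript():
    """Return a tiny PostScript program that draws one diagonal line."""
    lines = [
        "%!PS",            # marks the file as PostScript
        "newpath",         # start a new drawing path
        "100 100 moveto",  # move the pen to x=100, y=100 (in points)
        "300 300 lineto",  # add a line segment to x=300, y=300
        "stroke",          # draw the path
        "showpage",        # output the finished page
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Writes a plain text file that can be opened in nedit or ghostview.
    with open("line.ps", "w") as handle:
        handle.write(minimal_postscript())
```

Opening line.ps in nedit shows the six lines of text; opening it in ghostview shows the drawn line.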
Making a phylogenetic tree
Phylogenetic trees are often constructed from 16S genes. These are
genes that code for 16S rRNA, an rRNA that is found within the small
subunit of the ribosome. This gene is present in all bacteria and
archaea, and has both conserved and variable regions, which means that
it can be used to establish the relationship between organisms.
In this exercise you will first predict 16S rRNA genes, make a multiple
alignment from them, and then make a bootstrapped tree of them. This
tree will let you see how your organisms are related.
Predicting 16S rRNA genes
In some cases the annotations of rRNAs can be inaccurate. Ribosomal
RNAs can instead be predicted using a program called RNAmmer. This
program can be used either on the web or through a web services script
that calls the program via the internet.
- Log in to the CBS computers
Find a program which will let you log into the CBS computers.
Computer name: login.cbs.dtu.dk
User name: studXXX
You will get your password from the teachers.
After that you need to log into the computer
where we will do the exercises, which is named ibiology.
# log into CBS
ssh -Y ibiology
setenv MAKEFILES /home/people/pfh/bin/Makefile
umask 022
- Create a directory to store things in
# create directory for holding the rRNA predictions, and go into that directory
mkdir ~/rRNA
cd ~/rRNA
- Predict rRNAs
The RNAmmer program is a web service, like the atlas program and the
prodigal program you used before. This means that you will be put in a
queue, and for each of your organisms you will see the status go from
QUEUED to ACTIVE and then to FINISHED.
# get just the fasta sequence for these genomes and predict the rRNAs
foreach i (../data/fsafiles/[A-Z]*.fasta)
  perl ~karinl/scripts/rnammer/rnammer.pl bac ssu < $i > $i:r.rrna.fsa
  cp $i:r.rrna.fsa .
end
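In the loop above, the tcsh modifier ":r" strips the file extension from $i, so each prediction is first written next to its genome file and then copied into the current directory. The Python sketch below only shows how those names are derived; the function names and the example file name Genome_A.fasta are made up for illustration.

```python
import os


def rrna_output_name(fasta_path):
    """Mimic tcsh's $i:r.rrna.fsa: drop the extension, add .rrna.fsa."""
    root, _extension = os.path.splitext(fasta_path)
    return root + ".rrna.fsa"


def local_copy_name(fasta_path):
    """Name of the copy that 'cp $i:r.rrna.fsa .' leaves in the current directory."""
    return os.path.basename(rrna_output_name(fasta_path))


if __name__ == "__main__":
    example = "../data/fsafiles/Genome_A.fasta"  # hypothetical file name
    print(rrna_output_name(example))
    print(local_copy_name(example))
```

So after the loop has finished, the rRNA directory should hold one .rrna.fsa file per genome.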
- Have a look at one of the files.
# look at one of the files
less <filename>.rrna.fsa
You should have a set of sequences in FASTA format in your file, each
approximately 1500 nt long.
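If you would rather check the lengths than count by eye, a short script can do it. This is a minimal sketch of a FASTA reader, not one of the course scripts; the function names are made up, and the toy input in the example is far shorter than a real 16S sequence.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def sequence_lengths(text):
    """Return (header, length) for every sequence in the FASTA text."""
    return [(header, len(sequence)) for header, sequence in read_fasta(text)]


if __name__ == "__main__":
    toy = ">seq1\nACGTAC\nGT\n>seq2\nGGCC\n"  # toy data, not real rRNA
    for header, length in sequence_lengths(toy):
        print(header, length)
```

To use it on one of your own files, read the file contents first, for example with open("<filename>.rrna.fsa").read(), and pass the text to sequence_lengths.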
- Extract one rRNA from each and put it into another file.
You will need one 16S sequence for each organism. Many organisms
contain several of them; some have up to 16 16S rRNAs. To get one from
each of your organisms, you will use a script that extracts one rRNA
from each file. This rRNA will have a score that is above 1700, and be
between 1400 and 1700 nt long. The script will also give each rRNA a
unique name, and it will tell you if the rRNAs found for a genome do
not meet the requirements, i.e. if the score is too low or the sequence
is too long or too short.
# extract rRNAs from your file
sh ~karinl/scripts/rnammer/extractSeqs.sh phylogeny.fasta
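The selection rule described above (score above 1700, length between 1400 and 1700 nt) can be sketched in Python as follows. This is not the extractSeqs.sh script itself, whose contents are not shown here. The sketch assumes that each FASTA header carries the score as "/score=<number>", and that the first sequence passing both tests is the one kept; both are assumptions, and the renaming step is left out.

```python
import re

MIN_SCORE = 1700.0   # score must be above this value
MIN_LENGTH = 1400    # shortest accepted sequence, in nt
MAX_LENGTH = 1700    # longest accepted sequence, in nt


def score_from_header(header):
    """Return the score from a header, assuming a '/score=<number>' field."""
    match = re.search(r"/score=([0-9]+(?:\.[0-9]+)?)", header)
    return float(match.group(1)) if match else None


def pick_one_rrna(records):
    """Return the first (header, sequence) that passes both tests, or None."""
    for header, sequence in records:
        score = score_from_header(header)
        if score is None or score <= MIN_SCORE:
            continue  # score missing or too low
        if MIN_LENGTH <= len(sequence) <= MAX_LENGTH:
            return header, sequence
    return None


if __name__ == "__main__":
    # Toy records, invented for this example.
    toy = [
        ("rRNA_1 /score=1500.0", "A" * 1500),  # score too low
        ("rRNA_2 /score=1900.2", "A" * 900),   # too short
        ("rRNA_3 /score=1850.5", "A" * 1520),  # passes both tests
    ]
    chosen = pick_one_rrna(toy)
    print(chosen[0] if chosen else "no rRNA met the requirements")
```

A result of None corresponds to the case where the script warns you that no rRNA from that genome met the requirements.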
Making an alignment and a tree
You will now use a program called ClustalX to create a multiple
alignment. A multiple alignment is a way of placing sequences
against each other which emphasises the nucleotides or amino acids that
the sequences have in common.
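To show what "having nucleotides in common" means in an alignment, the Python sketch below takes three already aligned toy sequences (gaps written as "-") and marks every column in which all sequences carry the same nucleotide, much like the row of stars under a Clustal alignment. It does not compute an alignment; that is ClustalX's job. The function name and the sequences are made up for illustration.

```python
def conservation_line(aligned):
    """Mark columns where all aligned sequences share the same residue."""
    lengths = {len(sequence) for sequence in aligned}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        first = column[0]
        # A column is conserved if it has no gap and only one kind of residue.
        if first != "-" and all(ch == first for ch in column):
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


if __name__ == "__main__":
    # Toy alignment, invented for this example.
    alignment = [
        "ACGTTGCA",
        "ACG-TGCA",
        "ACGATGCA",
    ]
    for sequence in alignment:
        print(sequence)
    print(conservation_line(alignment))
```

In a real 16S alignment, long runs of stars correspond to the conserved regions of the gene, and the columns without stars are where the organisms differ.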
Once you have made an alignment, you will make a tree too.
View tree
You will now look at the tree.
Other useful programs
An alternative for making publication quality trees is to use the
program MEGA4.
This program can be used from the alignment part and onwards. It
incorporates several tree making methods, and also makes bootstrap and
consensus trees. However, it is a Windows program, so those of you
who have Macs would need to use a program like Crossover to
make it work. Crossover is something you would have to pay
for, but it has a 30 day free trial period.