Molecular Evolution - Mini Project #2
Comparison of tree building methods and codon-based alignment - The origin of the SARS virus
The previously unknown SARS virus generated widespread panic in 2002 and 2003 when the airborne germ caused 774
deaths and more than 8000 cases of illness. But where did this mystery virus come from? Scientists immediately
suspected that it had jumped to humans from some other organism. In May 2003, attention focused on cat-like
mammals called civets. Infected civets were discovered at a live animal market in southern China (where they are
occasionally eaten). Then in the fall of 2005, two teams of researchers independently discovered large reservoirs of
a SARS-like virus in Chinese horseshoe bats. Follow the guidelines below and use everything you know about
phylogenetic reconstruction (and bioinformatics, and molecular biology) to try to solve the puzzle of where
SARS originated. Have fun!
Report: The results of your mini project should be put in a brief report that you hand
in at CampusNet. Make sure that your report includes everything requested below.
In this project the focus will be on using protein-encoding DNA sequences from SARS viruses
isolated from human, civet, or bat hosts. We suggest you use the DNA sequence for the "spike
glycoprotein", but any gene that is present in all the isolates you find will be fine. Hint: Some
isolates are whole genome sequences. In these cases you can use the hyperlinked CDS feature keys on a
GenBank page to get the DNA sequence for just your selected gene.
Use the keyword search interface at the NCBI Nucleotide
database to find about 5-10 SARS DNA sequences isolated from each of the following host species: human, civet, and bat.
Use database searching to find homologous sequences from one or more coronaviruses other than SARS, and
use them as an outgroup. Save all the sequences in a single FASTA file.
Align the sequences using
RevTrans. Prepare the alignment
for use in PAUP. Explain why RevTrans is a good way of aligning these sequences.
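The idea behind a codon-based alignment can be sketched as follows: the DNA is translated, the protein sequences are aligned, and the gaps from the protein alignment are then threaded back onto the DNA three nucleotides at a time. The snippet below is only a minimal illustration of that last step, not RevTrans's own code. The function name `thread_codons` and the toy sequences are invented for the example, and the protein alignment is assumed to have been produced already by some alignment program.

```python
def thread_codons(aligned_protein, dna, gap="-"):
    """Map a gapped protein sequence back onto its ungapped coding DNA."""
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(dna) < 3 * n_residues:
        raise ValueError("DNA is too short for the aligned protein")
    codons = []
    pos = 0  # current position in the ungapped DNA
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)          # one protein gap = three DNA gaps
        else:
            codons.append(dna[pos:pos + 3])  # copy the codon for this residue
            pos += 3
    return "".join(codons)


# Toy data (invented, not real SARS sequences).
dna = {"seqA": "ATGGCTTTTAAA", "seqB": "ATGTTCAAA"}
protein_alignment = {"seqA": "MAFK", "seqB": "M-FK"}

codon_alignment = {
    name: thread_codons(protein_alignment[name], dna[name]) for name in dna
}
for name, seq in codon_alignment.items():
    print(name, seq)
```

Note that gaps only ever appear in multiples of three, so the reading frame is preserved in every sequence.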
Construct a rooted phylogenetic tree using each of the following criteria:
- Distance-based (distances corrected by ML method using HKY + gamma model)
- Maximum Likelihood (model = HKY + gamma)
Note that both the distance-based approach and the maximum likelihood approach should be based
on the same model of evolution, namely HKY + gamma-distributed rates. In the distance-based
reconstruction, the HKY + gamma corrected distances
should be estimated using maximum likelihood on the entire data set (not pairwise correction).
Explain what the HKY + gamma model is.
Check whether the reconstructed trees differ, and quantify how different they are, using
the so-called symmetric tree-distance measure. Explain what this measure means.
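As a rough illustration of how such a distance can be computed by hand, the sketch below counts the groupings (splits of the taxa into two sets) that occur in one tree but not in the other. It makes several assumptions: the trees are given as simple Newick strings without branch lengths or spaces, they are compared as unrooted trees, and the helper names (`parse_newick`, `splits`, `symmetric_distance`) and the taxon labels A to E are invented for the example. The number reported by your phylogeny program may be defined or scaled slightly differently, so check its documentation.

```python
def parse_newick(text):
    """Parse a simple Newick string (names only) into nested tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # skip "("
            children = [node()]
            while text[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            pos += 1                      # skip ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]            # a leaf name

    return node()


def leaves(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial splits of a tree, ignoring the root position."""
    all_taxa = leaves(tree)
    reference = min(all_taxa)             # each split is stored as the side without this taxon
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = all_taxa - clade if reference in clade else clade
        if 1 < len(side) < len(all_taxa) - 1:   # both sides need at least two taxa
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def symmetric_distance(newick_a, newick_b):
    """Number of splits found in exactly one of the two trees."""
    return len(splits(parse_newick(newick_a)) ^ splits(parse_newick(newick_b)))


tree1 = "((A,B),(C,(D,E)));"
tree2 = "((A,C),(B,(D,E)));"
print(symmetric_distance(tree1, tree2))
```

A distance of zero means the two trees contain exactly the same groupings; each additional unit corresponds to one grouping that is supported by only one of the trees.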
Construct a consensus tree summarising what is in common among the three trees.
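To make the idea of a consensus concrete, the sketch below keeps only the groupings that are shared among a set of trees. It assumes each tree has already been reduced to a set of groups of taxon names; the function name `consensus_groups` and the labels A to D are hypothetical and carry no biological meaning. A strict consensus keeps groups found in every tree, while a majority-rule consensus keeps groups found in more than half of them.

```python
from collections import Counter


def consensus_groups(trees_as_groups, strict=True):
    """Return the groups shared by all trees (strict) or by more than half (majority rule)."""
    counts = Counter(g for groups in trees_as_groups for g in set(groups))
    n_trees = len(trees_as_groups)
    if strict:
        return {g for g, c in counts.items() if c == n_trees}
    return {g for g, c in counts.items() if c > n_trees / 2}


# Three toy trees, each written as the set of groups it contains (invented labels).
tree_a = {frozenset("AB"), frozenset("ABC")}
tree_b = {frozenset("AB"), frozenset("ABD")}
tree_c = {frozenset("AB"), frozenset("ABC")}

strict = consensus_groups([tree_a, tree_b, tree_c], strict=True)
majority = consensus_groups([tree_a, tree_b, tree_c], strict=False)
print(sorted("".join(sorted(g)) for g in strict))
print(sorted("".join(sorted(g)) for g in majority))
```

Groups that do not make it into the consensus are collapsed, which is why a consensus tree usually contains unresolved nodes where the input trees disagree.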
What can you conclude from the phylogenetic tree regarding the evolution of SARS in humans?
Hand in at CampusNet ("Assignments" -> "Mini project #2").