Molecular Evolution - Mini Project #2

Comparison of tree building methods and codon-based alignment - The origin of the SARS virus

The previously unknown SARS virus generated widespread panic in 2002 and 2003, when the airborne germ caused 774 deaths and more than 8000 cases of illness. But where did this mystery virus come from? Scientists immediately suspected that it had jumped to humans from some other organism. In May 2003, attention focused on civets, a group of cat-like mammals: infected civets were discovered at a live animal market in southern China (where they are occasionally eaten). Then, in the fall of 2005, two teams of researchers independently discovered large reservoirs of a SARS-like virus in Chinese horseshoe bats. Follow the guidelines below and use everything you know about phylogenetic reconstruction (and bioinformatics, and molecular biology) to try to solve the puzzle of where SARS originated. Have fun!

Report: The results of your mini project should be put in a brief report that you hand in at CampusNet. Make sure that your report includes everything requested below.

  1. In this project the focus will be on protein-encoding DNA sequences from SARS viruses isolated from human, civet, or bat hosts. We suggest you use the DNA sequence for the "spike glycoprotein", but any gene that is present in all the isolates you find will be fine. Hint: Some isolates are whole-genome sequences. In these cases you can use the hyperlinked CDS feature keys on a GenBank page to get the DNA sequence for just your selected gene.

  2. Use the keyword search interface at the NCBI Nucleotide database to find about 5-10 SARS DNA sequences isolated from each of the following species:

    • Civet
    • Human
    • Bat
  3. Use database searching to find homologous sequences from one or more coronaviruses other than SARS, and use them as an outgroup. Save the sequences as a FASTA file.
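As a minimal sketch of the saving step, the Python function below formats a list of (name, sequence) pairs as FASTA text. The record names used in the example (such as `human_1`) are hypothetical, and the check that each sequence length is a multiple of three is an assumption that you have extracted complete coding sequences; it is not a requirement of the FASTA format itself.

```python
def format_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text.

    Assumes each sequence is a complete coding sequence, so its length
    should be a multiple of three; a ValueError is raised otherwise.
    """
    lines = []
    for name, seq in records:
        # Remove whitespace and line breaks, and normalise to upper case.
        seq = "".join(seq.split()).upper()
        if len(seq) % 3 != 0:
            raise ValueError(f"{name}: length {len(seq)} is not a multiple of 3")
        lines.append(f">{name}")
        # Wrap the sequence at a fixed line width.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy records, far shorter than a real spike gene.
    print(format_fasta([("human_1", "atgaaactg"), ("bat_1", "atgaaaaccctg")]), end="")
```

The returned string can be written to a file with an ordinary `open(..., "w")` call.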

  4. Align the sequences using RevTrans. Prepare the alignment for use in PAUP. Explain why RevTrans is a good way of aligning these sequences.
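To illustrate the idea behind a codon-based alignment, the sketch below threads an unaligned coding DNA sequence onto an already aligned protein sequence, so that every gap in the protein becomes a three-base gap in the DNA. This is not RevTrans code, only a simplified illustration of the back-translation step; the function name `thread_codons` and the toy sequences are assumptions made for the example.

```python
def thread_codons(aligned_protein, dna, gap="-"):
    """Map a coding DNA sequence onto its aligned protein sequence.

    Each residue in the aligned protein consumes one codon from the DNA;
    each gap character is expanded to a gap of three bases.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(dna) < 3 * residues:
        raise ValueError("DNA is too short for the aligned protein")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(dna[pos:pos + 3])
            pos += 3
    return "".join(codons)


if __name__ == "__main__":
    # Toy protein alignment: the first sequence lacks one residue.
    print(thread_codons("MK-L", "ATGAAACTG"))
    print(thread_codons("MKTL", "ATGAAAACCCTG"))
```

In the resulting DNA alignment all gaps fall between codons, so the reading frame is preserved in every sequence.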

  5. Construct a rooted phylogenetic tree using each of the following criteria:

    • Parsimony
    • Distance-based (distances corrected by ML method using HKY + gamma model)
    • Maximum Likelihood (model = HKY + gamma)

    Note that both the distance-based approach and the maximum likelihood approach should be based on the same model of evolution, namely HKY with gamma-distributed rates. In the distance-based reconstruction, the HKY + gamma corrected distances should be estimated using maximum likelihood on the entire data set (not pairwise correction). Explain what the HKY + gamma model is.
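As a starting point for thinking about the model, the sketch below builds the unnormalised HKY instantaneous rate matrix from a set of base frequencies and a transition/transversion rate ratio (kappa). The function name and the example parameter values are assumptions chosen for illustration; in the project these parameters are estimated from your data. The gamma part of the model (rate variation among sites) is not shown here.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def hky_rate_matrix(freqs, kappa):
    """Return the unnormalised HKY rate matrix as a nested dict q[i][j].

    The rate of change from base i to base j is proportional to the
    frequency of j, multiplied by kappa if the change is a transition
    (A<->G or C<->T). Diagonal entries make each row sum to zero.
    """
    q = {}
    for i in BASES:
        q[i] = {}
        for j in BASES:
            if i == j:
                continue
            # Transition: both purines or both pyrimidines.
            is_transition = (i in PURINES) == (j in PURINES)
            q[i][j] = freqs[j] * (kappa if is_transition else 1.0)
        q[i][i] = -sum(q[i].values())
    return q


if __name__ == "__main__":
    # Hypothetical parameter values, not estimates from SARS data.
    example = hky_rate_matrix({"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}, kappa=2.0)
    for base in BASES:
        print(base, [round(example[base][other], 3) for other in BASES])
```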

  6. Check whether the reconstructed trees differ, and quantify how different they are, using the so-called symmetric tree-distance measure. Explain what this measure means.
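The symmetric distance is commonly defined as the number of bipartitions (splits of the taxa into two groups by an internal branch) that are present in one tree but not in the other. The sketch below computes this for small trees written as nested Python tuples; the tuple representation and the taxon names are assumptions for the example, and the trees are compared as unrooted trees, so the position of the root does not affect the result.

```python
def _clades(tree, out):
    """Add the leaf set below every internal node to `out`; return the leaf set of `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def splits(tree):
    """Return the non-trivial bipartitions of a tree given as nested tuples."""
    found = set()
    all_leaves = _clades(tree, found)
    ref = min(all_leaves)  # reference taxon used to give each split one canonical side
    result = set()
    for clade in found:
        side = all_leaves - clade if ref in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            result.add(side)
    return result


def symmetric_distance(tree_a, tree_b):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


if __name__ == "__main__":
    t1 = ((("human", "civet"), "bat"), "outgroup")
    t2 = ((("human", "bat"), "civet"), "outgroup")
    print(symmetric_distance(t1, t1), symmetric_distance(t1, t2))
```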

  7. Construct a consensus tree summarising what is in common among the three trees.
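One way to think about a consensus tree is as the set of groupings that occur in enough of the input trees. The sketch below takes each tree as a set of groupings (frozensets of taxon names) and keeps those found in at least a given fraction of the trees; a fraction of 1.0 corresponds to a strict consensus. The function name, the input representation, and the toy groupings are assumptions for the example, not the procedure of any particular program.

```python
from collections import Counter


def consensus_groups(trees, min_fraction=1.0):
    """Return the groupings present in at least `min_fraction` of the trees.

    Each tree is given as a collection of frozensets of taxon names.
    min_fraction=1.0 keeps only groupings shared by all trees.
    """
    counts = Counter(group for tree in trees for group in set(tree))
    needed = min_fraction * len(trees)
    return {group for group, n in counts.items() if n >= needed}


if __name__ == "__main__":
    hc = frozenset({"human", "civet"})
    hb = frozenset({"human", "bat"})
    hcb = frozenset({"human", "civet", "bat"})
    # Three hypothetical trees, e.g. one per reconstruction method.
    trees = [{hc, hcb}, {hc, hcb}, {hb, hcb}]
    print(sorted(sorted(g) for g in consensus_groups(trees)))
    print(sorted(sorted(g) for g in consensus_groups(trees, min_fraction=0.6)))
```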

  8. What can you conclude from the phylogenetic tree regarding the evolution of SARS in humans?

  9. Hand in at CampusNet ("Assignments" -> "Mini project #2").