Molecular Evolution - Mini Project #2

Repeating what I wrote for the last mini project: An important aim of the mini projects is to place the competences you learn in this course in the context of a full bioinformatics/systems biology workflow, and at the same time to get you to think about how to solve practical bioinformatics problems. The idea is that you should use everything you have learned so far (also in other courses!), so you will get only minimal instructions. If you get stuck, check previous exercises and reading material (also from other courses!). And if all else fails, Google is of course your friend (and so are the other students on the course, and the students on other courses!).

Please describe the results of your analysis (e.g., data sets, alignments, plots of trees). Also give a description of the steps taken along the way. Put everything in a small report that you should hand in via the CampusNet group.

Comparison of tree building methods, codon-based alignment

  1. Select a protein-encoding DNA sequence that you find interesting.

  2. Use TBLASTX to find about 30 sequences that are related to your starting sequence (homologs). Make sure to include an outgroup in the data set, and present evidence supporting the outgroup status of the sequence(s). Save the sequences as a FASTA file.

  3. Align the sequences using RevTrans. Prepare the alignment for use in PAUP.

  4. Construct a rooted phylogenetic tree using each of the following criteria:

    • Parsimony
    • Distance based
    • Maximum Likelihood

    For each criterion: decide on the details of how to reconstruct a tree within that paradigm (for instance: exhaustive vs. heuristic search, observed distances vs. K2P-corrected distances, JC model vs. HKY model, etc.). Briefly explain why you chose that particular approach (time constraints? exactness?).

  5. Check whether the reconstructed trees differ, and quantify how different they are, using the so-called symmetric tree-distance measure.

  6. Construct a consensus tree summarising what is in common among the three trees.

  7. Conclude very briefly what the trees show phylogenetically (this will depend on your data set). Also explain why it is advantageous to use TBLASTX (instead of plain nucleotide BLAST, i.e., BLASTN) and RevTrans (instead of, e.g., ClustalX) on this data set.

  8. Hand in your report via the CampusNet group.
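
To make steps 5 and 6 a little more concrete, here is a minimal Python sketch of the idea behind the symmetric tree distance and a strict consensus. It is only an illustration and not the method you are expected to use in the project. It assumes that trees are written as nested tuples of taxon names, and the three toy trees and all function names are made up for this example. Each tree is reduced to its set of non-trivial bipartitions (splits); the symmetric distance is then the number of splits found in one tree but not the other, and the strict consensus keeps only the splits found in every tree.

```python
def leaf_set(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions (splits) of a nested-tuple tree.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so the same split always gets the same representation
    regardless of where the tree is rooted.
    """
    taxa = leaf_set(tree)
    reference = min(taxa)  # arbitrary but fixed choice of reference taxon
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = taxa - below if reference in below else below
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
        return below

    visit(tree)
    return found


def symmetric_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


def strict_consensus_splits(trees):
    """Splits shared by all trees (the content of a strict consensus tree)."""
    return set.intersection(*(splits(tree) for tree in trees))


# Hypothetical toy trees standing in for the three reconstructions.
parsimony_tree = ((("A", "B"), "C"), ("D", "E"))
distance_tree = ((("A", "C"), "B"), ("D", "E"))
ml_tree = (("A", ("B", "C")), ("D", "E"))

print(symmetric_distance(parsimony_tree, distance_tree))
print(strict_consensus_splits([parsimony_tree, distance_tree, ml_tree]))
```

In this toy example the three trees disagree about how A, B and C are related but agree that D and E group together, so each pair of trees has a symmetric distance of 2 and the only split surviving in the strict consensus is the one separating D and E from the rest.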