Molecular Evolution - Mini Project #2
Repeating what I wrote for the last mini project:
An important aim of the mini projects is to place the competences you learn on this course in the context of a full bioinformatics/systems biology workflow, and at the same time to get you to think about how to solve practical bioinformatics problems. The idea is that you should use everything you have learned so far (also on other courses!), so you will only get minimal instructions. If you get stuck, check previous exercises and reading material (also from other courses!). If all else fails, Google is of course your friend (and so are the other students on this course, and the students on other courses!).
Please describe the results of your analysis (e.g., data sets, alignments, plots of trees) and the steps you took along the way. Put everything in a short report and hand it in via the CampusNet group.
Comparison of tree-building methods, codon-based alignment
Select a protein-coding DNA sequence that you find interesting. Use TBLASTX to find about 30 sequences that are related to your starting sequence (homologs). Make sure to include an outgroup in the data set, and present evidence supporting the outgroup status of the sequence(s). Save the sequences as a FASTA file.
Align the sequences using RevTrans, and prepare the alignment for use in PAUP (which reads data in NEXUS format).
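RevTrans translates the DNA sequences, aligns the resulting protein sequences, and then maps the original DNA back onto that alignment codon by codon, so gaps always fall between codons. The Python sketch below illustrates that back-translation step and then writes the result as a NEXUS data block for PAUP. It is not RevTrans's own code: it assumes the protein alignment already exists and that each DNA sequence starts in frame with no internal frameshifts; the function names `thread_codons` and `to_nexus` and the toy sequences are hypothetical.

```python
# Hypothetical helpers illustrating the RevTrans idea; not RevTrans's actual code.

def thread_codons(aligned_protein: str, dna: str) -> str:
    """Replace each aligned residue by its codon and each gap by '---'."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    out = []
    used = 0  # number of codons consumed so far
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")  # a protein gap becomes a whole-codon gap
        else:
            if used >= len(codons):
                raise ValueError("protein is longer than its coding sequence")
            out.append(codons[used])
            used += 1
    # Any codons left over (e.g. a trailing stop codon) are simply dropped.
    return "".join(out)


def to_nexus(alignment: dict[str, str]) -> str:
    """Format equal-length aligned DNA sequences as a NEXUS DATA block."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    nchar = lengths.pop()
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(alignment)} nchar={nchar};",
        "  format datatype=dna missing=? gap=-;",
        "  matrix",
    ]
    for name, seq in alignment.items():
        # Spaces are not allowed in unquoted NEXUS taxon names.
        lines.append(f"    {name.replace(' ', '_')}  {seq}")
    lines += ["  ;", "end;"]
    return "\n".join(lines)


# Toy input: a made-up protein alignment and the matching unaligned DNA.
protein_aln = {"seqA": "MK-L", "seqB": "MKVL"}
cds = {"seqA": "ATGAAACTG", "seqB": "ATGAAGGTTCTT"}
dna_aln = {name: thread_codons(protein_aln[name], cds[name]) for name in protein_aln}
print(to_nexus(dna_aln))
```

Taxon names with other special characters (parentheses, colons, etc.) would also need quoting or renaming before PAUP accepts them.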
Construct a rooted phylogenetic tree using each of the following criteria:
- Distance-based
- Maximum Likelihood
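Whichever criterion is used, the number of possible rooted, bifurcating topologies for n taxa is (2n-3)!! = 1 x 3 x 5 x ... x (2n-3), which bears directly on whether an exhaustive search is feasible. A minimal Python sketch of that count (the function name `n_rooted_trees` is hypothetical):

```python
def n_rooted_trees(n: int) -> int:
    """Count rooted, bifurcating topologies for n labelled taxa: (2n-3)!!."""
    if n < 2:
        raise ValueError("need at least two taxa")
    count = 1
    for k in range(3, 2 * n - 2, 2):  # multiplies 3 * 5 * ... * (2n-3)
        count *= k
    return count


for n in (4, 10, 30):
    print(n, f"{n_rooted_trees(n):.3e}")
```

With about 30 sequences, as in this project, the count exceeds 10^38, so an exhaustive search is not realistic and a heuristic search is the practical option.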
For each criterion, decide on the details of how to reconstruct a tree within that paradigm (for instance: exhaustive vs. heuristic search, observed vs. K2P-corrected distances, JC vs. HKY model, etc.). Briefly explain why you chose that particular approach (e.g., time constraints or exactness).
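To make the choice between observed and corrected distances concrete: the observed (p) distance is the fraction of compared sites that differ; the JC correction is d = -(3/4) ln(1 - 4p/3); and the K2P correction, with P and Q the proportions of transitions and transversions, is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The Python sketch below computes all three for one pair of aligned sequences. It skips every site with a gap or ambiguity code in either sequence (pairwise deletion), which is an assumption and not necessarily how PAUP treats such sites; the function names are hypothetical.

```python
import math

NUCLEOTIDES = set("ACGT")
PURINES = set("AG")
PYRIMIDINES = set("CT")


def substitution_proportions(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (P, Q): proportions of transitions and transversions."""
    transitions = transversions = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue  # pairwise deletion of gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    return transitions / sites, transversions / sites


def p_distance(seq1: str, seq2: str) -> float:
    P, Q = substitution_proportions(seq1, seq2)
    return P + Q


def jc_distance(seq1: str, seq2: str) -> float:
    # math.log raises ValueError if p >= 0.75 (saturated sequences).
    p = p_distance(seq1, seq2)
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p_distance(seq1: str, seq2: str) -> float:
    P, Q = substitution_proportions(seq1, seq2)
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


# Toy pair: one transition (A->G) over 10 sites.
s1, s2 = "AAAAAAAAAA", "GAAAAAAAAA"
print(p_distance(s1, s2), jc_distance(s1, s2), k2p_distance(s1, s2))
```

The corrected distances are never smaller than the observed one, and the gap widens as sequences diverge, because repeated substitutions at the same site are not visible in the alignment.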
Check whether the reconstructed trees differ, and quantify how different they are using the symmetric tree-distance measure (the symmetric difference, also known as the Robinson-Foulds distance).
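For rooted trees, one way to compute the symmetric difference is to list the clades (the sets of leaves below each internal node) of each tree and count the clades that occur in one tree but not the other. The Python sketch below does this for trees written as nested tuples; the tree representation and the function names are assumptions for illustration, and PAUP's own implementation may differ in details. For unrooted trees the same idea is applied to bipartitions (splits), and some programs report half of this count.

```python
def clades(tree) -> set[frozenset]:
    """Non-trivial clades of a rooted tree given as nested tuples of leaf names."""
    found = set()

    def leaves_below(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])  # a single leaf is a trivial clade, not recorded
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)
        return below

    root = leaves_below(tree)
    found.discard(root)  # the full taxon set is shared by every tree
    return found


def symmetric_difference(tree1, tree2) -> int:
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree1) ^ clades(tree2))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(symmetric_difference(t1, t2))  # 2: {A,B} is only in t1, {A,C} only in t2
```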
Construct a consensus tree summarising what the reconstructed trees have in common.
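A consensus tree can be built from the clades (sets of leaves below each internal node) of the input trees: the strict consensus keeps only clades found in every tree, while the majority-rule consensus keeps clades found in more than half of them (such clades are always mutually compatible, so they form a tree). The Python sketch below assumes each tree has already been reduced to its set of clades, uses made-up clade sets for three toy trees, and builds a Newick string from the result; the function names are hypothetical.

```python
from collections import Counter


def strict_consensus(clade_sets: list[set[frozenset]]) -> set[frozenset]:
    """Clades present in every tree."""
    return set.intersection(*clade_sets)


def majority_rule(clade_sets: list[set[frozenset]]) -> set[frozenset]:
    """Clades present in more than half of the trees."""
    counts = Counter(c for tree_clades in clade_sets for c in tree_clades)
    return {c for c, k in counts.items() if k > len(clade_sets) / 2}


def to_newick(clades: set[frozenset], taxa: frozenset) -> str:
    """Build a Newick string from a set of mutually compatible clades."""

    def build(group: frozenset) -> str:
        inside = [c for c in clades if c < group]  # proper subsets of this group
        maximal = [c for c in inside if not any(c < d for d in inside)]
        covered = frozenset().union(*maximal)
        # Sub-clades first, then leaves attached directly here; sorted for stable output.
        parts = sorted(build(c) for c in maximal) + sorted(group - covered)
        return "(" + ",".join(parts) + ")"

    return build(taxa) + ";"


taxa = frozenset("ABCDE")
trees = [
    {frozenset("AB"), frozenset("ABC"), frozenset("DE")},  # (((A,B),C),(D,E))
    {frozenset("AC"), frozenset("ABC"), frozenset("DE")},  # (((A,C),B),(D,E))
    {frozenset("AB"), frozenset("CDE"), frozenset("DE")},  # ((A,B),(C,(D,E)))
]
print(to_newick(strict_consensus(trees), taxa))  # ((D,E),A,B,C);
print(to_newick(majority_rule(trees), taxa))     # (((A,B),C),(D,E));
```

Clades that are left out collapse into polytomies, which is why the strict consensus here leaves A, B and C unresolved.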
Briefly conclude what the trees show phylogenetically (this will depend on your data set). Also explain why it is advantageous to use TBLASTX (instead of nucleotide BLAST, i.e. BLASTN) and RevTrans (instead of, e.g., ClustalX) on this data set.
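One effect worth considering here: synonymous substitutions change the DNA without changing the encoded protein, so coding sequences can look much more similar at the amino-acid level than at the nucleotide level. TBLASTX compares translations of both the query and the database sequences in all six reading frames, and RevTrans aligns at the protein level before mapping codons back. The Python sketch below translates two made-up coding sequences with the standard genetic code and compares identity at both levels; the sequences and function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), amino acids listed in TCAG codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence; '*' marks stop codons."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


seq1 = "CTGAGCAAAGGC"  # hypothetical: encodes L S K G
seq2 = "TTAAGTAAGGGT"  # hypothetical: same protein, different codons
print(translate(seq1), translate(seq2))  # LSKG LSKG
print(f"DNA identity:     {identity(seq1, seq2):.2f}")  # 0.58
print(f"protein identity: {identity(translate(seq1), translate(seq2)):.2f}")  # 1.00
```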
Hand in via CampusNet.