Consensus Trees

Exercise written by: Anders Gorm Pedersen

  1. Getting started:

    Using what you've learned previously do the following:

    • Create a directory named condist
    • Make condist the current directory (i.e., change into it)
    • Copy the alignment file from your parsimony1 directory to the condist directory
    • Open the file in nedit and have a look

    This file contains an alignment (in nexus format) of 41 Hepatitis C virus (HCV) sequences isolated from 5 different patients. Sequences are named in the following way:

    patient_timepoint_clone

    For instance, the sequence labeled 1_1_5 was isolated from patient number 1 at time point 1 and is clone number 5 from that patient and that time point.
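    The naming scheme is regular enough to parse mechanically. The short Python sketch below is purely illustrative (PAUP* does not need it): it assumes every label has exactly the three underscore-separated integer fields described above, and the helper names parse_label and group_by_patient are made up for this example.

```python
from collections import defaultdict


def parse_label(label):
    """Split a label such as '1_1_5' into (patient, time point, clone)."""
    patient, timepoint, clone = (int(field) for field in label.split("_"))
    return patient, timepoint, clone


def group_by_patient(labels):
    """Map each patient number to the list of that patient's sequence labels."""
    groups = defaultdict(list)
    for label in labels:
        patient, _, _ = parse_label(label)
        groups[patient].append(label)
    return dict(groups)


# A few labels taken from this exercise (not the full data set).
labels = ["1_1_5", "2_1_1", "2_1_2", "2_1_10"]
print(parse_label("1_1_5"))      # (1, 1, 5)
print(group_by_patient(labels))  # {1: ['1_1_5'], 2: ['2_1_1', '2_1_2', '2_1_10']}
```

    Grouping labels by their first field is also a quick way to check that an outgroup list contains only sequences from a single patient.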

  2. Start the paup program and load the data file:


    This command opens the PAUP* program and automatically executes the nexus file at the same time.

  3. Define the outgroup:

    Using what you've learned, define an outgroup including the following sequences, activate outgroup rooting, and ensure that the outgroup is printed as a monophyletic sister group to the ingroup: 2_1_1 2_1_2 2_1_3 2_1_4 2_1_5 2_1_7 2_1_8 2_1_9 2_1_10

    This puts all nine sequences from patient 2 in the outgroup, which will help make the tree plots clearer.

  4. Enable PAUP* to store an unlimited number of trees:

    set increase=auto

    Normally PAUP* will only store up to "maxtrees" trees in memory. This command allows maxtrees to be increased automatically (without prompting for user confirmation) if the need arises during the heuristic search.

  5. Perform a heuristic search using TBR:

    Again using what you've learned, start a heuristic search of the TBR type. Set options such that the initial tree is constructed using sequential addition where sequences are added in random order, and 20 different starting trees are tried.

    After a brief processing time you will be back where you ended in the previous parsimony exercise. Among roughly 10^57 possible unrooted trees for 41 sequences, PAUP* has found about 200 equally parsimonious best trees. Two hundred may sound like a depressingly large number of alternative reconstructions, but as you will now see, these trees do in fact have a lot in common.
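    The number of possible trees mentioned above follows from the standard formula for the number of distinct unrooted binary trees on n labelled taxa, (2n-5)!!, or (2n-3)!! for rooted trees. The small Python sketch below (function names are illustrative) evaluates these formulas exactly for the 41 sequences in this alignment:

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 or 2 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_trees(n):
    """Distinct unrooted binary trees for n >= 3 labelled taxa: (2n - 5)!!."""
    return double_factorial(2 * n - 5)


def rooted_trees(n):
    """Distinct rooted binary trees for n >= 2 labelled taxa: (2n - 3)!!."""
    return double_factorial(2 * n - 3)


n = 41
print(f"unrooted trees for {n} taxa: {float(unrooted_trees(n)):.2e}")
print(f"rooted trees for {n} taxa:   {float(rooted_trees(n)):.2e}")
```

    For n = 41 this gives roughly 1.0 x 10^57 unrooted trees (about 8 x 10^58 rooted ones), far too many to examine one by one, which is why a heuristic search is used.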

    Q1: What is the length of these best trees?

  6. Convert trees to rooted form:

    roottrees

    Above we have specified an outgroup and requested that trees be plotted with a root determined by this outgroup. However, the trees that we found by heuristic searching are still unrooted, and we need to explicitly specify that we want them to be rooted. Placement of the root is of course done on the basis of the outgroup.
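    Conceptually, rooting on an outgroup means finding the branch of the unrooted tree that separates the outgroup from the ingroup and placing the root on that branch. The Python sketch below illustrates the idea on a hypothetical five-taxon tree stored as an adjacency list. It is not PAUP*'s implementation, the function names are made up, and it simply gives up when the outgroup does not form one side of a single branch.

```python
def leaves_beyond(adj, node, parent):
    """Set of leaves reached from node without crossing back over parent."""
    children = [n for n in adj[node] if n != parent]
    if not children:  # node is a leaf
        return {node}
    found = set()
    for child in children:
        found |= leaves_beyond(adj, child, node)
    return found


def subtree(adj, node, parent):
    """Nested-tuple form of the part of the tree hanging below node."""
    children = [n for n in adj[node] if n != parent]
    if not children:
        return node
    return tuple(subtree(adj, child, node) for child in children)


def root_on_outgroup(adj, outgroup):
    """Place the root on the branch separating the outgroup from the ingroup."""
    outgroup = set(outgroup)
    for u, neighbours in adj.items():
        for v in neighbours:
            if leaves_beyond(adj, v, u) == outgroup:
                # New root splits branch u-v: ingroup side first, outgroup second.
                return (subtree(adj, u, v), subtree(adj, v, u))
    raise ValueError("outgroup does not form one side of a single branch")


# Hypothetical unrooted tree: leaves A-E, internal nodes X, Y, Z.
adj = {
    "A": ["X"], "B": ["X"], "C": ["Y"], "D": ["Z"], "E": ["Z"],
    "X": ["A", "B", "Y"],
    "Y": ["X", "C", "Z"],
    "Z": ["Y", "D", "E"],
}
print(root_on_outgroup(adj, {"D", "E"}))  # ((('A', 'B'), 'C'), ('D', 'E'))
```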

  7. Inspect resulting trees individually:

    describetrees 37/plot=cladogram label=no

    This shows you one arbitrarily chosen tree (tree number 37) among the >200 best trees that were found by the heuristic search. (The option label=no turns off labeling of the internal nodes in the tree.) Notice how the viral sequences from each individual patient group together. This shows that while there is considerable diversity in the viral population within any single patient, those viruses are nevertheless more closely related to each other than to viruses from other patients. This is, of course, because the viruses in one patient have all descended from the virus that originally infected that patient. Plotting the tree with branch lengths may make this clustering more apparent:

    describetrees 37/plot=phylogram label=no
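    "Grouping together" has a precise meaning here: a patient's sequences form a monophyletic group if they are exactly the set of leaves descending from one internal node. As an illustrative sketch (not something PAUP* requires), the Python code below writes a rooted tree as nested tuples, collects the leaf set (cluster) below every internal node, and tests monophyly against those clusters. The tree and the labels in it are hypothetical.

```python
def clusters(tree):
    """Leaf sets below every internal node of a rooted tree.

    The tree is written as nested tuples whose leaves are taxon labels (strings).
    """
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def is_monophyletic(tree, taxa):
    """True if `taxa` are exactly the descendants of one internal node."""
    return frozenset(taxa) in clusters(tree)


# Hypothetical rooted tree; labels follow the patient_timepoint_clone scheme.
tree = ((("1_1_1", "1_1_2"), ("3_1_1", "3_1_2")), ("2_1_1", "2_1_2"))
print(is_monophyletic(tree, {"1_1_1", "1_1_2"}))  # True
print(is_monophyletic(tree, {"1_1_1", "3_1_1"}))  # False
```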

    Remember: you also have the option of saving one or more trees to a file and then viewing them in FigTree on your own computer. For instance, you can save tree number 37 with the following command:

    savetrees brlens=yes from=37 to=37

    Of course, you can also save a range of trees.

    To see whether this phenomenon is limited to the tree we selected first, save a range of 10 trees to a file and then inspect them in FigTree. Notice that when more than one tree is opened in FigTree, you can use the small arrows labeled "Prev/Next" to move between trees.

  8. Construct a consensus tree:

    You should now be convinced that the more than 200 equally good trees do in fact have quite a lot in common. Importantly, it seems that in all trees the viruses from each patient are grouped separately (forming five monophyletic groups). To investigate whether this holds across the whole set, we will now construct a majority-rule consensus tree summarizing the branching patterns in all of the >200 trees:

    contree all /strict=no majrule=yes percent=50

    This constructs a consensus tree showing the monophyletic groups that occur in more than 50% of all trees. Scroll back to see the tree. At each internal node, the tree indicates how often the corresponding group (meaning all taxa descending from that internal node) was found in the set of all trees (numbers are percentages). The option percent=50 sets this cutoff: only groups found in more than 50% of the trees are shown (i.e., we are requesting a "majority-rule consensus"). You can raise this value (but not lower it) to apply a stricter cutoff.
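    The percentages on a majority-rule consensus tree come from a simple count: collect the cluster below every internal node of every tree, and record the fraction of trees in which each cluster occurs. The Python sketch below shows that counting step on three hypothetical four-taxon trees. It keeps clusters found in more than the cutoff percentage and makes no attempt to reproduce every option of PAUP*'s contree command; the function names are illustrative.

```python
from collections import Counter


def clusters(tree):
    """Leaf sets below every internal node of a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf (taxon label)
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def majority_rule_clusters(trees, percent=50):
    """Map each cluster found in more than `percent` % of trees to its frequency."""
    counts = Counter()
    for tree in trees:
        counts.update(clusters(tree))  # each cluster counted once per tree
    n = len(trees)
    return {c: 100 * k / n for c, k in counts.items() if 100 * k / n > percent}


trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
]
result = majority_rule_clusters(trees)
for cluster, freq in sorted(result.items(), key=lambda item: len(item[0])):
    print(sorted(cluster), f"{freq:.0f}%")
```

    Here {A,B} and {A,B,C} each occur in two of the three trees (about 67%) and are kept, while {A,B,D} and {A,C} occur only once and are dropped. The cluster containing all taxa is trivially present in every tree.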

    Q2: For each of the five patients you should now answer the following two questions:

    1. Are all the sequences for this patient grouped in the consensus tree?
    2. If your answer to (1) was "yes": in what fraction of the original trees was this the case? (This is the percentage written at the internal node at the base of that patient's group of sequences.)

    You will note that there are some sub-trees where the branching order is now unresolved, meaning that three or more taxa all split out from the same internal node. These multifurcations show that while more than 50% of the individual trees had those taxa together as a group (the precise number is indicated at the internal node), different trees nevertheless disagreed on the exact branching order within that group.
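    Any two clusters that each occur in more than 50% of the trees must appear together in at least one tree, so they are compatible (either nested or disjoint), and the retained clusters can always be assembled into a single tree. Wherever they do not resolve the relationships inside a group, several taxa or subgroups hang from the same node, which is the multifurcation described above. A minimal Python sketch, assuming the clusters are already known to be pairwise compatible (the function name is illustrative):

```python
def build_consensus(clusters, taxa):
    """Assemble pairwise-compatible clusters into a rooted tree (nested tuples).

    Taxa that no cluster resolves end up as siblings under one node,
    i.e. a multifurcation.
    """
    clusters = [frozenset(c) for c in clusters]

    def build(group):
        # Clusters strictly inside this group; the largest ones become children.
        inside = [c for c in clusters if c < group]
        maximal = [c for c in inside if not any(c < d for d in inside)]
        maximal.sort(key=sorted)  # deterministic child order
        covered = frozenset().union(*maximal)
        children = [build(c) for c in maximal]
        children += sorted(group - covered)  # leftover taxa attach directly
        return tuple(children)

    return build(frozenset(taxa))


print(build_consensus([{"A", "B", "C"}, {"D", "E"}], "ABCDE"))
# (('A', 'B', 'C'), ('D', 'E'))
```

    In this example nothing resolves the branching order among A, B and C, so they split out from the same internal node as a three-way multifurcation.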

    As you can see, consensus trees are a handy way of summarizing the evidence shared in a set of trees, and they are therefore useful when a search identifies several good reconstructed phylogenies.