
Exercise M6


OBJECTIVES

The purpose of this exercise is to explore AT content in a comparative genomics context. We will not worry about statistics and will limit ourselves to exploratory data analysis. You will look for interesting patterns and try to explain them using common sense and biological knowledge. Apart from writing parsers, this is perhaps the most valuable skill in bioinformatics.

Key tools used in this exercise:
The only tools needed for this exercise are a web-browser and an open mind.
  1. Comparing genome features

    To begin this exercise, point your browser to
    http://www.cbs.dtu.dk/services/GenomeAtlas/show-kingdom.php?kingdom=Bacteria
    
    A more comprehensive genome list can be found at
    http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi
    
    For the purposes of this exercise we will stick to the former, that is, the CBS website, since it has some features that we will be using.
    Start by sorting the list by AT content and scroll up and down a bit.
    Take note of any interesting observations and correlations.
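
    The AT content that the list is sorted by is simply the fraction of A and T bases in a genome. As a minimal sketch, assuming ambiguous symbols such as N are ignored (the atlas may handle them differently), it could be computed as follows. The function name at_content and the toy sequences are made up for illustration and are not real genomes.

```python
def at_content(sequence: str) -> float:
    """Return the fraction of A and T among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return at / total


# Toy sequences for illustration only (assumption: not taken from any genome).
examples = {
    "toy_at_rich": "ATATTTAAGCAT",
    "toy_gc_rich": "GCGCCGGATCGC",
}

# Sort by AT content, lowest first, as one would sort the genome list.
for name, seq in sorted(examples.items(), key=lambda item: at_content(item[1])):
    print(f"{name}: AT content = {at_content(seq):.2f}")
```

    Sorting a set of sequences with this function as the key mimics what the website does when you sort the genome list by AT content.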
  2. Using scatter plots

    Now we will try to take a more systematic approach.
    Select the Compare Within Search button on the atlas webpage. This can be used to make scatter plots of a variety of sequence-derived features.

    Genome averages
    First we will look at the AT content on a large scale. Try to make a scatter plot of genome length vs. AT content.
    Do you see a correlation?

    The correlation of genome length and AT content is a topic that is still being studied. It seems to be a rather complex relationship, influenced by growth temperature and several other factors. Do you have an idea for an underlying principle for this relationship?

    To help with the above question, try to plot DNA melting energy vs. AT content. You should see a very clear correlation.
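
    One way to get a feel for why this correlation is so clear is to count hydrogen bonds: an A-T base pair forms two, a G-C base pair forms three. The sketch below uses the mean number of hydrogen bonds per base pair as a crude stand-in for duplex stability. This is an assumption made for illustration, not the melting energy calculation used by the atlas; realistic melting models also account for stacking between neighbouring base pairs. The function name mean_hydrogen_bonds is hypothetical.

```python
# Hydrogen bonds formed by the base pair that each base takes part in.
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def mean_hydrogen_bonds(sequence: str) -> float:
    """Return the mean number of hydrogen bonds per base pair.

    Crude stability proxy only; ambiguous symbols such as N are skipped.
    """
    bonds = [HYDROGEN_BONDS[base] for base in sequence.upper() if base in HYDROGEN_BONDS]
    if not bonds:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(bonds) / len(bonds)


# Toy sequences for illustration only, ordered from GC-rich to AT-rich.
for seq in ("GGCCGGCC", "ATGCATGC", "AATTAATT"):
    print(seq, mean_hydrogen_bonds(seq))
```

    With this proxy the value falls linearly from 3 to 2 as AT content rises from 0 to 1, which is the direction of the trend to look for in the scatter plot.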

    Local level
    Try to make a plot with AT content on the x-axis and local repeats on the y-axis.

    What kind of correlation do you see?
    After doing this for several types of local repeats, try to do the same for global repeats.
    What do you see?

    Can you explain what you see? Why is there a difference between the local level and the global level?

    The end
    When you are done with this exercise, you can try to continue where you left off with exercise M3.
    When everyone is done, we will have a summary of the exercise.