
Exercise 2


OBJECTIVES

The purpose of this exercise is to become familiar with constructing genome atlases, analysing them, and using them as a visualization tool. We will make three different kinds of atlas for each genome and see what we can learn from them.

Key tools used in this exercise are:
GNU Make. We will be using a system of makefiles. Make was designed for compiling programs, but at CBS it is also used to run pipelines of different tools and format conversions.
GeneWiz. This program is used to construct the atlases. An online Java version is soon to be released.
MySQL. We will be using a MySQL database to access results.



  1. LOG IN

    Open a session on 'organism.cbs.dtu.dk', log in to ibiology from there, and create a directory to work in:
    ssh -X ibiology
    cd projects
    mkdir Ex2
    cd Ex2
    
  2. Link data from last week

    Remember that we are working on the following genomes:
    +-----------+-----------------------------------------------------------------+
    | accession | organism                                                        |
    +-----------+-----------------------------------------------------------------+
    | AE016879  | Bacillus anthracis str. Ames, complete genome.                  |
    | AE017042  | Yersinia pestis biovar Microtus str. 91001, complete genome.    |
    | AL111168  | Campylobacter jejuni subsp. jejuni NCTC 11168 complete genome.  |
    | AL645882  | Streptomyces coelicolor A3(2) complete genome.                  |
    | AP008232  | Sodalis glossinidius str. 'morsitans' DNA, complete genome.     |
    | AP009048  | Escherichia coli W3110 DNA, complete genome.                    |
    | BA000021  | Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis |
    | CP000034  | Shigella dysenteriae Sd197, complete genome.                    |
    +-----------+-----------------------------------------------------------------+
    
    Today we will mainly use the GenBank files and the data from the database. We therefore start by linking the GenBank files from the directory we used last week.
    foreach genome (AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034)
    mkdir $genome
    ln -sf ~/Ex1/source/$genome.gbk $genome/
    end
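
    Since `ln -sf` will happily create a link to a file that does not exist, it can be worth checking the links before going on. The following is a minimal sketch, not part of the CBS setup: it is written in bash/sh syntax rather than the tcsh used above, and the function name check_links is made up for illustration. Run it from the Ex2 directory.

```shell
# Report any genome whose GenBank link is missing or dangling.
# bash/sh syntax (not tcsh); check_links is an illustrative helper name.
check_links() {
    missing=0
    for genome in "$@"; do
        # -e follows symbolic links, so a dangling link counts as missing
        if [ ! -e "$genome/$genome.gbk" ]; then
            echo "missing: $genome/$genome.gbk"
            missing=$((missing + 1))
        fi
    done
    echo "$missing file(s) missing"
}

check_links AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034
```

    If any file is reported as missing, check that last week's source directory really is where the `ln` command expects it.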
    
  3. Making a base atlas

    The Makefile system at CBS makes many common tasks easy by running an ordered pipeline of commands. We will use it a lot in this exercise.
    First we make the configuration files for the base atlases. This is done using GNU Make. Each job is started in the background with &, and wait pauses until all of them have finished.
    foreach genome (AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034)
    cd $genome
    gmake $genome.baseatlas.cf &
    cd ..
    end
    wait
    
    When these commands have finished, inspect the files using your favourite editor. One thing to notice is that the config file for Streptomyces coelicolor creates a circular atlas, although this organism has a linear chromosome. This is easily remedied by replacing all occurrences of circle with linear, which can be done quickly with the commands:
    sed 's/circle/linear/g' AL645882/AL645882.baseatlas.cf \
      > AL645882/AL645882.baseatlas.lin.cf
    mv AL645882/AL645882.baseatlas.lin.cf AL645882/AL645882.baseatlas.cf
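
    To see what the substitution does and to confirm that nothing was left behind, the same sed expression can be tried on a throwaway file first. This is only a sketch in bash/sh syntax (not tcsh), and the two lines written to the file are made-up placeholders, not real GeneWiz config syntax.

```shell
# Try the substitution on a temporary file (bash/sh syntax, not tcsh).
# The file contents are invented placeholders, NOT real GeneWiz config lines.
cf=$(mktemp)
printf 'layout circle\nradius of circle 10\n' > "$cf"

# Same sed expression as in the exercise: write to a second file, then move it into place.
sed 's/circle/linear/g' "$cf" > "$cf.lin"
mv "$cf.lin" "$cf"

# grep -c prints the number of lines that still contain "circle".
# It exits non-zero when the count is 0, hence the "|| true".
remaining=$(grep -c 'circle' "$cf" || true)
echo "lines still containing circle: $remaining"
```

    The same grep -c check can be run on AL645882/AL645882.baseatlas.cf after the real substitution.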
    
    Now we only need to make the actual atlases.
    foreach genome (AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034)
    cd $genome
    gmake $genome.baseatlas.ps &
    cd ..
    end
    wait
    
    This creates a PostScript file of the base atlas for each genome. These files can be viewed with ghostview.
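
    Before opening the files one by one, a quick check can show whether every atlas was actually produced. The sketch below is in bash/sh syntax (not tcsh) and assumes that the output starts with the conventional "%!PS" header, as PostScript files normally do. The helper name is_postscript is made up for illustration.

```shell
# Check that each base atlas exists and looks like PostScript (bash/sh syntax, not tcsh).
# Assumption: the files begin with the usual "%!PS" header.
is_postscript() {
    # True if the file is non-empty and its first four bytes are "%!PS"
    [ -s "$1" ] && [ "$(head -c 4 "$1")" = "%!PS" ]
}

for genome in AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034; do
    ps="$genome/$genome.baseatlas.ps"
    if is_postscript "$ps"; then
        echo "ok: $ps"
    else
        echo "check: $ps"
    fi
done
```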
  4. Making a structure atlas

    This step is almost the same as the previous one, except that this time we make a DNA structure atlas. Again we start by making the configuration files for the atlases.
    foreach genome (AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034)
    cd $genome
    gmake $genome.structureatlas.cf &
    cd ..
    end
    wait
    
    When you inspect the files, you should start to see a pattern in the format. Again the config file for S. coelicolor creates a circular atlas. We fix it with the commands:
    sed 's/circle/linear/g' AL645882/AL645882.structureatlas.cf > AL645882/AL645882.structureatlas.lin.cf
    mv AL645882/AL645882.structureatlas.lin.cf AL645882/AL645882.structureatlas.cf
    
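    Now we only need to make the actual atlases. Note that the curvature file is made first, in the foreground, before the atlas itself is started in the background.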
    foreach genome (AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034)
    cd $genome
    gmake $genome.curvature.gz
    gmake $genome.structureatlas.ps &
    cd ..
    end
    wait
    
    This creates a PostScript file of the structure atlas for each genome.
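
    If a structure atlas looks wrong or is missing, one thing to rule out is a broken curvature file. The following sketch, in bash/sh syntax (not tcsh), uses gzip -t to test each file. The helper name check_gz is made up for illustration, and the paths assume the files were built inside each genome directory as in the loop above.

```shell
# Check that each curvature file exists and is an intact gzip file (bash/sh syntax, not tcsh).
check_gz() {
    for f in "$@"; do
        # gzip -t tests the integrity of the compressed file without unpacking it to disk
        if [ -s "$f" ] && gzip -t "$f" 2>/dev/null; then
            echo "ok: $f"
        else
            echo "bad or missing: $f"
        fi
    done
}

for genome in AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034; do
    check_gz "$genome/$genome.curvature.gz"
done
```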
  5. Making a genome atlas

    For the genome atlas, the procedure is the same as for the two previous atlases, so we do it all in one go:
    foreach genome (AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034)
    cd $genome
    gmake $genome.genomeatlas.cf &
    cd ..
    end
    wait
    sed 's/circle/linear/g' AL645882/AL645882.genomeatlas.cf > AL645882/AL645882.genomeatlas.lin.cf
    mv AL645882/AL645882.genomeatlas.lin.cf AL645882/AL645882.genomeatlas.cf
    foreach genome (AE017042 AE016879 AL111168 AL645882 AP008232 AP009048 BA000021 CP000034)
    cd $genome
    gmake $genome.genomeatlas.ps &
    cd ..
    end
    wait
    
    Once this is done, we have a PostScript file of the genome atlas for each genome.
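
    Because every gmake job in these loops runs in the background, a job that fails is easy to overlook: the plain wait only pauses until all jobs are done. The sketch below shows one way to collect the exit status of each job. It is written in bash/sh syntax (not tcsh), and build_one is a hypothetical stand-in that does no real work. To use it for real, its body would be replaced by something like ( cd "$1" && gmake "$1.genomeatlas.ps" ).

```shell
# Run one background job per genome and report which ones failed (bash/sh syntax, not tcsh).
# build_one is a HYPOTHETICAL placeholder for the real gmake call.
build_one() {
    # Placeholder work: fail for any name containing "FAIL" so the error path is visible
    case "$1" in
        *FAIL*) return 1 ;;
        *)      return 0 ;;
    esac
}

run_all() {
    pids=""
    for genome in "$@"; do
        build_one "$genome" &
        # Remember the process id of the job together with the genome name
        pids="$pids $!:$genome"
    done
    failed=0
    for entry in $pids; do
        pid=${entry%%:*}
        genome=${entry#*:}
        # wait with a process id returns the exit status of that particular job
        if ! wait "$pid"; then
            echo "failed: $genome"
            failed=$((failed + 1))
        fi
    done
    echo "$failed job(s) failed"
}

run_all AE017042 DEMO_FAIL AL111168
```

    The name DEMO_FAIL is only there to trigger the failure message in this demonstration.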