Ph.D. course in Bioinformatics
David Ussery
Friday, 4 May, 2001

Animated DNA



Visualisation of DNA Structures in Complete Genomes







Overview

This lecture is about ways of looking at DNA sequences in whole chromosomes. There are two parts to this talk. In Part 1, I will introduce "DNA Atlases", and give several examples of useful information which can be visualised using this approach. I would like to think that one way of dealing with the explosion of DNA sequence information is to think about it in biological terms, and in particular in physical-chemical terms of the structure and function of symmetry elements. For example, there are specific DNA sequences which "code" for a telomere, and different DNA sequences which are specific for centromeres. Specific DNA sequences, their structures, and their biological functions will be discussed.

In Part 2, I will discuss the use of "DNA Atlases" to display information about gene expression (and predicted expression) throughout chromosomes.









Part 1: DNA Atlases: Visualisation of DNA Structures in Whole Chromosomes



One way of dealing with the problem of how to display so much sequence information is to have a look at the whole chromosome at once, smoothing over a large window. The entire bacterial chromosome is displayed as a circle, with different colours representing various parameters. First, as an introduction to atlases, we will look at base-composition. Then we will have a look at levels of expression of mRNA and proteins throughout the chromosome. As examples, I will use my very favourite organism, Escherichia coli K-12.
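The idea of "smoothing over a large window" on a circular chromosome can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to draw the atlases: the function names `circular_window_average` and `at_indicator` are hypothetical, and the toy sequence and window size are chosen purely for demonstration.

```python
def circular_window_average(values, window):
    """Average each position over `window` neighbours, wrapping around the circle."""
    n = len(values)
    if not 0 < window <= n:
        raise ValueError("window must be between 1 and len(values)")
    half = window // 2
    # Running sum of the window that starts `half` positions before position 0.
    total = sum(values[(i - half) % n] for i in range(window))
    smoothed = []
    for pos in range(n):
        smoothed.append(total / window)
        # Slide the window one step: drop the leftmost value, add the next one.
        total -= values[(pos - half) % n]
        total += values[(pos - half + window) % n]
    return smoothed


def at_indicator(sequence):
    """Return 1.0 for A or T and 0.0 for any other base (illustrative helper)."""
    return [1.0 if base in "AT" else 0.0 for base in sequence.upper()]


# Toy example: smoothed AT content of a short "circular" sequence.
toy_sequence = "ATATATGCGCGCGCGCATAT"
print(circular_window_average(at_indicator(toy_sequence), window=5))
```

Because the indices wrap around with the modulo operator, positions near the start and end of the sequence are averaged with their true neighbours on the circle, which is what one wants for a bacterial chromosome.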




Base-composition Atlas




There are several things to notice in this plot. First, the bases are not uniformly distributed throughout the genome; there are "clumps" or clusters where specific bases are a bit more concentrated. Also, the G's (turquoise) are clearly favoured on one half of the chromosome, whilst the C's (magenta) are favoured on the other half. This shows up in the "GC-skew" lane as well (2nd circle from the middle). I have labelled the entire terminus region, which ranges from TerE (around 1.08 million bp (Mbp)) to TerG (~2.38 Mbp) in Escherichia coli K-12. Finally, several genes corresponding to the darker bands (i.e., more biased nucleotide composition) are labelled. The same pattern can be seen for the other three Escherichia coli chromosomes which have been sequenced (so far!), as shown in the table towards the bottom of this web page.
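To make the "GC-skew" lane more concrete, here is a minimal sketch of how a skew value might be computed per window. It assumes the commonly used definition (G - C)/(G + C); the atlases may use a different normalisation or window size, and the function name `gc_skew` is hypothetical.

```python
def gc_skew(sequence, window):
    """Return (start, skew) pairs for consecutive, non-overlapping windows.

    Skew is taken here as (G - C) / (G + C), a common convention and an
    assumption of this sketch; windows with no G or C get a skew of 0.0.
    """
    sequence = sequence.upper()
    result = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        result.append((start, skew))
    return result


# Toy example: the first half is G-rich, the second half is C-rich.
for start, skew in gc_skew("GGGAGGTGCCCACCTC", window=8):
    print(start, round(skew, 2))
```

A positive value means that G's outnumber C's on the strand being read, and a negative value means the opposite; plotted around the circle, the sign would be expected to change near the origin and the terminus.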




DNA Structure Atlas




This is a "Structure Atlas" of chromosome 2 from the malaria parasite Plasmodium falciparum. Note that the telomeric regions light up with several unique structures.




DNA Repeats Atlas




This is a "Repeat Atlas" of a small 21 kbp plasmid from Borrelia burgdorferi, which infects ticks and is the cause of Lyme disease. Note that there is a large direct repeat region in the middle which does not code for any genes.




Genome Atlas




This is the "Genome Atlas" of the nucleomorph of Guillardia theta, which is the smallest eukaryotic genome sequenced. (The total amount of nuclear DNA is about 551,000 bp, on three chromosomes. See Sue Douglas' article, "The highly reduced genome of an enslaved algal nucleus", Nature, 410, 1091-1096, 2001 (26 April issue).)







Part 2: Gene Expression in E. coli



Escherichia coli is probably the best characterised organism.

Some numbers:

  • There are 4085 predicted genes in Escherichia coli strain K-12 isolate W3110.


  • There are 4289 predicted genes in Escherichia coli strain K-12 isolate MG1655.


  • There are 5283 predicted genes in Escherichia coli strain O157:H7 isolate EDL933 (enterohemorrhagic pathogen).


  • There are 5361 predicted genes in Escherichia coli strain O157:H7 substrain RIMD 0509952 (enterohemorrhagic pathogen).



  • Roughly 2600 genes have been found to be expressed in Escherichia coli strain K-12 cells, under standard laboratory growth conditions.

  • Transcription animation


  • About 2100 spots can be seen on 2-D protein gels.



  • Very roughly 1000 different genes (only about 600 mRNA transcripts) are expressed at "detectable levels" in E. coli cells grown in LB media.



  • Only about 350 proteins exist at concentrations of > 100 copies per cell. (These make up 90% of the total protein in E. coli!)

  • Most (>90%) of the proteins are present in very low amounts (less than 100 copies per cell).



    What is the chromosomal location of the genes for the highly expressed proteins?


    It has been known since the 1960s that genes closer to the replication origin are more highly expressed. However, it has only been in the past few years that technology has allowed the simultaneous monitoring of ALL the genes in Escherichia coli. There are 4397 annotated genes in the E. coli K-12 genome. Shown below is an "Atlas plot" of the E. coli K-12 genome, with the outer circles representing the concentration of proteins (roughly in number of molecules/cell) and mRNA (again, roughly number of molecules/cell). Under these conditions (i.e., cells grown to late log phase, in minimal media), there were 2005 genes expressed at detectable levels, and only 233 proteins were found in "abundant" amounts (i.e., very roughly more than 100 molecules per cell).


    For E. coli K-12 cells, grown in minimal media to late log phase:

    4397 annotated genes -> 2005 mRNAs expressed -> 233 abundant proteins



    (note that these numbers will vary for different experimental conditions....)
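As a quick piece of arithmetic on the numbers above, the fraction of genes surviving each step can be worked out as follows. This is only an illustrative calculation using the three counts quoted in the text; the function name `fraction` is hypothetical.

```python
# Counts quoted above for E. coli K-12 grown in minimal media to late log phase.
annotated_genes = 4397
expressed_mrnas = 2005
abundant_proteins = 233


def fraction(part, whole):
    """Return part/whole as a fraction between 0 and 1."""
    return part / whole


print(f"mRNAs detected per annotated gene:     {fraction(expressed_mrnas, annotated_genes):.1%}")
print(f"abundant proteins per annotated gene:  {fraction(abundant_proteins, annotated_genes):.1%}")
print(f"abundant proteins per detected mRNA:   {fraction(abundant_proteins, expressed_mrnas):.1%}")
```

In other words, a bit less than half of the annotated genes give a detectable mRNA under these conditions, and only a small minority correspond to abundant proteins.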

    E. coli chromatin atlas


    In this picture, the outer lane represents the concentration of proteins (blue), the next lane the concentration of mRNA (green), and then the annotated genes.


    The inner three circles represent different aspects of the DNA base composition throughout the genome. The innermost circle (turquoise/violet) is the bias of G's towards one strand or the other (that is, a look at the mono-nucleotide distribution of the 4 DNA bases). The next lane is the density of purine (or pyrimidine) stretches of 10 bp or longer. Note that in both cases purines tend to favour the leading strand of the replicore, whilst pyrimidine tracts are more likely to occur on the lagging strand. Finally, the next circle (turquoise/red) is simply the AT content of the genome, averaged over a 50,000 bp window. Note that the terminus is slightly more AT rich, whilst the rest of the genome is slightly GC rich. (The AT content scale ranges from 45% to 55%.)
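The "stretches of 10 bp or longer" lane can be illustrated with a small sketch that finds purine (A/G) and pyrimidine (C/T) tracts using regular expressions. This is an assumption about one simple way to do it, not the method used for the atlas; the names `find_tracts` and `tract_density` are hypothetical, and the density here is simply the fraction of bases covered by tracts.

```python
import re

# Runs of at least 10 purines (A or G) or at least 10 pyrimidines (C or T).
PURINE_TRACT = re.compile(r"[AG]{10,}")
PYRIMIDINE_TRACT = re.compile(r"[CT]{10,}")


def find_tracts(sequence, pattern):
    """Return (start, end) coordinates of every tract matching `pattern`."""
    return [(m.start(), m.end()) for m in pattern.finditer(sequence.upper())]


def tract_density(sequence, pattern):
    """Return the fraction of the sequence covered by matching tracts."""
    if not sequence:
        return 0.0
    covered = sum(end - start for start, end in find_tracts(sequence, pattern))
    return covered / len(sequence)


# Toy example: one 12 bp purine tract flanked by short pyrimidine runs.
toy_sequence = "CCAGAGAGAGAGAGCT"
print(find_tracts(toy_sequence, PURINE_TRACT))
print(tract_density(toy_sequence, PURINE_TRACT))
```

Running the same search on the reverse-complement strand would swap the two classes, since a purine tract on one strand is a pyrimidine tract on the other; this is why the lane can be read as a strand preference.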




    There are "clumps" of highly expressed genes, and these are anti-correlated with regions of condensed chromatin.

    E. coli chromatin atlas



    Organism and strain                               %AT  Size (bp)  Atlas         Number of genes  Coding density      Reference                              Links
    Escherichia coli K-12, isolate W3110              49   4,636,552  Genome Atlas  4085             79% (1135 bp/gene)  -                                      DDBJ, NCBI tax
    Escherichia coli K-12, isolate MG1655             49   4,639,221  Genome Atlas  4397             87% (1055 bp/gene)  Science 277:1453-1474, September 1997  U. Wisconsin, TIGR cmr, NCBI tax, NCBI entrez
    Escherichia coli O157:H7, substrain EDL933        49   5,529,376  Genome Atlas  5283             86% (1047 bp/gene)  Nature 409:529-533, January 2001       U. Wisconsin, NCBI tax, NCBI entrez
    Escherichia coli O157:H7, substrain RIMD 0509952  49   5,498,450  Genome Atlas  5361             88% (1026 bp/gene)  DNA Res. 8:11-22, February 2001        Miyazaki, Japan, NCBI tax, NCBI entrez
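The bp/gene figures in the table above are consistent with simply dividing the genome size by the number of predicted genes. A minimal sketch of that arithmetic, using only the sizes and gene counts from the table (the short strain labels and the function name `bp_per_gene` are hypothetical):

```python
# Genome size (bp) and number of predicted genes, taken from the table above.
genomes = {
    "K-12 W3110": (4_636_552, 4085),
    "K-12 MG1655": (4_639_221, 4397),
    "O157:H7 EDL933": (5_529_376, 5283),
    "O157:H7 RIMD 0509952": (5_498_450, 5361),
}


def bp_per_gene(size_bp, gene_count):
    """Return the average number of base pairs per predicted gene, rounded."""
    return round(size_bp / gene_count)


for strain, (size_bp, gene_count) in genomes.items():
    print(f"{strain:22s} {bp_per_gene(size_bp, gene_count)} bp/gene")
```

Note that this is an average spacing of genes along the chromosome, not an average gene length; the coding density column would be needed to estimate the latter.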

    Link to more atlases for Escherichia coli genomes.


    Link to the main "Genome Atlas" web page




    REFERENCES

    Papers relevant to this lecture (included in your binder of references)

    1. David W. Ussery, "Genome Databases", The Encyclopedia of Genetics, in press, April, 2001.
    2. Anders Gorm Pedersen, Lars Juhl Jensen, Hans-Henrik Stærfeldt, Søren Brunak, and David W. Ussery, "A DNA Structural Atlas for Escherichia coli", Journal of Molecular Biology, 299 (#4), 907-930, (2000).
    3. Carsten Friis, Lars Juhl Jensen, and David W. Ussery, "Visualisation of Pathogenicity Regions in Bacteria", Genetica, 108:47-51, (2000).
    4. Ussery, D.W., Larsen, T.S., Wilkes, K.T., Friis, C., Worning, P., Krogh, A., Brunak, S., "Genome Organisation and Chromatin Structure in Escherichia coli", Biochimie, 83:201-212, (2001).



    Other references

  • Richard R. Sinden, Christopher E. Pearson, Vladimir N. Potaman, and David W. Ussery, "DNA: Structure and Function", Advances in Genome Biology, 5A:1-141, (1998).
  • Ussery, D.W., Higgins, C.F., and Bolshoy, A., "Environmental Influences on DNA Curvature", J. Biomolecular Structure & Dynamics, 16:811-823, (1999).
  • Lars Juhl Jensen, Carsten Friis, and David W. Ussery, "Three Views of Microbial Genomes", Research in Microbiology, 150:773-777, (1999).

  • David W. Ussery, "Bioinformatics2000 Meeting Report", Genome Biology, 1, (#3), 1-2, 2000.
  • David W. Ussery, "DNA Denaturation", The Encyclopedia of Genetics, in press, May, 2001.
  • David W. Ussery, "DNA Structure: A-, B-, and Z-DNA Families", The Encyclopedia of Life Sciences, in press, May, 2001.



  • Link to a list of recent papers and talks on DNA structures.



    Books about DNA:

    Watson, James D., "A PASSION FOR DNA: Genes, Genomes, and Society", (Oxford University Press, Oxford, 2000).

    Sinden, Richard R., "DNA: STRUCTURE and FUNCTION", (Academic Press, New York, 1994).

    Calladine, C.R., Drew, H.R., "Understanding DNA: The Molecule and How It Works", (2nd edition, Academic Press, San Diego, 1997).



    A List of more than a thousand books about DNA







    Last modified Thursday, 9 November, 2000 by David Ussery