An Introduction to Genome Atlases.

A "Genome Atlas" maps properties of the DNA sequence along the chromosome and is an extension of the "DNA Structural Atlas". The Genome Atlas combines structural parameters with information about global repeats and base composition (Jensen, Friis, and Ussery, 1999). The "DNA Structural Atlas" is described in Pedersen et al., 2000, and a separate web page gives background information on the Structural Atlases.



Genome Atlas of E. coli K-12




MORE INFORMATION ABOUT DNA GENOME ATLASES


Certain DNA sequences are known to be compacted much more readily than others, and in fact many different DNA structures can co-exist in living cells (see, for example, Sinden et al., 1998). Here we are interested in analysing complete genomes for DNA structural elements that might play a role in chromatin organisation. Several architectural parameters of the DNA helix can be calculated from dinucleotide or trinucleotide models. In the Genome Atlas, we use three structural parameters, reflecting DNA curvature, base stacking ("meltability"), and helix rigidity or flexibility. In the DNA Structural Atlas, the values of these measures along the chromosome are shown as colour-coded circles ("wheels"). In all the colour scales, average values are light grey, while larger and smaller values are shown in progressively more intense colours the farther they lie from the average. For most scales, values more than three standard deviations above or below the average are shown as very dark (almost black) lines. [For the repeat circles, it is best to show only the high end of the scale.]
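The colour rule just described (grey at the average, darkening with distance, saturating at three standard deviations) can be sketched as a function from a smoothed value to a side and an intensity. This is a minimal sketch, not the atlas software itself: the linear ramp and the function name colour_intensity are assumptions made for illustration.

```python
def colour_intensity(value, mean, sd, max_sd=3.0):
    """Map a smoothed parameter value to (side, intensity).

    side is "high" or "low" relative to the genome average. intensity is 0.0 at
    the average (light grey) and rises linearly (an assumed ramp) to 1.0 at
    max_sd standard deviations; values beyond that are clamped (almost black).
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (value - mean) / sd  # distance from the average in standard deviations
    side = "high" if z >= 0 else "low"
    intensity = min(abs(z) / max_sd, 1.0)
    return side, intensity
```

For curvature, the stated range of 0.16 to 0.23 around a mean of 0.195 corresponds to a standard deviation of roughly 0.012, so a smoothed value of 0.23 comes out close to full intensity on the "high" side.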



 1. Intrinsic Curvature

The outermost circle (orange to blue) shows the relative magnitude of DNA curvature, smoothed over a large window. Note that there are several dark blue regions, indicating areas that are much more curved than average (grey), and relatively few regions that are LESS curved than average (i.e., dark orange). Also note that, in general, the region around the replication terminus is more curved (bluish) than the region around the origin (greyish).

Background

DNA curvature is calculated using the CURVATURE programme (Bolshoy et al., 1991; Shpigelman et al., 1993). Here, "curved DNA" refers to DNA that is intrinsically curved in solution and can be readily characterised by its anomalous migration in polyacrylamide gels. There are different models for curved DNA (reviewed in Sinden et al., 1998), although their predictions for fragments longer than a few hundred bp are essentially the same (Haran and Crothers, 1995). The scale is in arbitrary "curvature units", which range from 0 (i.e., no curvature) to 1.0, the curvature of DNA wrapped around the nucleosome. The average DNA curvature in the entire E. coli K-12 genome is 0.195. With a 10,000 bp smoothing window, the curvature values are distributed so that the progressively coloured region (-3 SD to +3 SD) lies between 0.16 (low = less curved regions) and 0.23 (high = strongly curved regions).
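The smoothing and the -3 SD to +3 SD colour range can be illustrated with a centred moving average that wraps around the circular chromosome. This is a sketch under assumptions: the per-position curvature values are taken as given (computing them is the job of CURVATURE), the window is made odd (e.g. 10,001 rather than 10,000) so it can be centred, and population standard deviation is assumed.

```python
import statistics


def circular_window_mean(values, window):
    """Centred moving average over a circular chromosome.

    values: per-position parameter values (e.g. curvature per base).
    window: odd window length; the window wraps around the sequence ends
    because bacterial chromosomes such as E. coli K-12 are circular.
    """
    n = len(values)
    if window % 2 == 0 or window > n:
        raise ValueError("window must be odd and no longer than the sequence")
    half = window // 2
    # Window centred on position 0, wrapping back to the end of the sequence.
    total = sum(values[i % n] for i in range(-half, half + 1))
    smoothed = []
    for centre in range(n):
        smoothed.append(total / window)
        # Slide one position: drop the leftmost value, add the next on the right.
        total -= values[(centre - half) % n]
        total += values[(centre + half + 1) % n]
    return smoothed


def colour_range(smoothed, n_sd=3.0):
    """Return (low, mean, high) = mean -/+ n_sd population standard deviations."""
    mean = statistics.fmean(smoothed)
    sd = statistics.pstdev(smoothed)
    return mean - n_sd * sd, mean, mean + n_sd * sd
```

The running sum keeps the cost linear in genome length, which matters for a 4.6 Mbp chromosome and a 10 kbp window.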




 2. Stacking Energy

The second circle is the stacking energy, in kcal/mol. A less negative stacking energy (i.e., a value closer to zero) means that the helix will melt more readily; such regions are shaded red. This is correlated with (but not quite the same as) the AT content. Note that several regions are dark red, indicating that they will melt quite readily; see, for example, the region around the rfaJ gene, near the replication origin. Again, note that there are more red regions (which melt more easily) than green regions (which are more difficult to melt): the distribution of stacking energy is also skewed. This is described in more detail in our "DNA Structural Atlas" paper (Pedersen et al., 2000; see the references below).

Background

Base-stacking energies are taken from the dinucleotide values of Ornstein et al. (1978). The scale is in kcal/mol; the dinucleotide values range from -3.82 kcal/mol (melts easily) to -14.59 kcal/mol (the most negative value, requiring the most energy to de-stack or melt the helix). (A separate web page lists all 10 values.) A positive peak in base stacking (i.e., values closer to 0) marks regions of the helix that would de-stack or melt more readily (high). Conversely, minima (more negative values) mark more stable regions of the chromosome (low). The average base-stacking value in the entire E. coli K-12 genome is -8.19 kcal/mol. With a 10,000 bp smoothing window, the stacking-energy values are distributed so that the progressively coloured region (-3 SD to +3 SD) lies between -8.66 kcal/mol (low = more stable) and -7.71 kcal/mol (high = melts more easily).
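Why only 10 values? A dinucleotide step read on one strand is the reverse complement of the same step read on the other strand (AG on one strand is CT on the other), so the 16 dinucleotides collapse to 10 distinct base-pair steps. The sketch below shows that reduction and a per-step lookup. The Ornstein et al. numbers are deliberately not reproduced; the caller supplies the table, and the names canonical_step and stacking_profile are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def canonical_step(step):
    """A step and its reverse complement are the same base-pair step,
    so both map to one key (the alphabetically smaller spelling)."""
    return min(step, reverse_complement(step))


# 16 dinucleotides -> 10 distinct steps (AT, TA, CG and GC are their own reverse complements).
UNIQUE_STEPS = sorted({canonical_step("".join(p)) for p in product("ACGT", repeat=2)})


def stacking_profile(seq, table):
    """Per-step stacking energies (kcal/mol) along seq.

    table: maps each canonical step to an energy, supplied by the caller
    (assumed keyed by canonical_step). Only A, C, G, T are handled.
    """
    seq = seq.upper()
    return [table[canonical_step(seq[i:i + 2])] for i in range(len(seq) - 1)]
```

The resulting per-step profile would then be smoothed over the 10,000 bp window before colouring.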




 3. Position Preference

The third circle is "position preference", which is related to the flexibility of the DNA. On this scale, green regions are MORE FLEXIBLE, whilst violet regions are more rigid. There appears to be a fairly good correlation between flexible ("green") regions and clusters of highly expressed genes.

Background

"Position preference" is a trinucleotide model based on the preferential location of sequences within nucleosomal core DNA, as described by Satchwell et al. (1986). We use the magnitude (i.e., the absolute value) of the trinucleotide numbers as a measure of DNA flexibility (Baldi et al., 1996). The trinucleotide magnitudes range from essentially zero (0.003, presumably more flexible) to 0.28 (considered rigid). Since very few trinucleotides have values close to zero (i.e., little preference for nucleosome positioning), this measure is considered most sensitive towards the low ("flexibility") end of the scale. The average position-preference value in the entire E. coli K-12 genome is 0.15. With a 10,000 bp smoothing window, the position-preference values are distributed so that the progressively coloured region (-3 SD to +3 SD) lies between 0.14 (low = "flexible") and 0.16 (high = more rigid).
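A trinucleotide model works the same way as the dinucleotide one, except that the 64 trinucleotides collapse to 32 strand-independent classes: no trinucleotide is its own reverse complement, because its middle base would have to pair with itself. The sketch below takes the magnitude of each value as described above. It assumes the table is keyed by the canonical (alphabetically smaller) spelling; the published Satchwell et al. values are not reproduced, and flexibility_profile is a hypothetical name.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def canonical_trinucleotide(tri):
    rc = "".join(COMPLEMENT[b] for b in reversed(tri))
    return min(tri, rc)


# 64 trinucleotides -> 32 classes (each pairs with a distinct reverse complement).
CLASSES = sorted({canonical_trinucleotide("".join(p)) for p in product("ACGT", repeat=3)})


def flexibility_profile(seq, table):
    """Magnitude |value| of a signed trinucleotide position-preference table
    at each position; smaller magnitudes are read as more flexible.

    table: canonical trinucleotide -> signed value, supplied by the caller.
    """
    seq = seq.upper()
    return [abs(table[canonical_trinucleotide(seq[i:i + 3])]) for i in range(len(seq) - 2)]
```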




 4. Annotations.


Usually, the GenBank files for sequenced genomes contain the locations of predicted genes. Genes running in the "forward" (clockwise) direction are blue, whilst genes on the other strand are red. rRNA and tRNA genes are also shown, although for most bacterial chromosomes the tRNA genes are too small (compared to the size of the chromosome) to visualise easily. Notice that in the E. coli K-12 genome, the seven rRNA genes light up with distinctive structural features: they are generally more GC-rich and more flexible (and, of course, more highly expressed), and they are repeated throughout the chromosome.




 5. Global Direct Repeats.


There are many different ways of calculating repeats. For the Genome Atlas plots, "global direct repeats" are obtained from a BLAST search of the genome against itself. Only matches of 100 bp or longer are counted. The (obvious) perfect match of the genome to itself is removed, and the best of the remaining matches are recorded along the length of the chromosome. The scale is the negative logarithm (base 10) of the expectation score, or "E value". Thus a value of 9 or greater is very significant (an E value of 10^-9 or smaller, i.e., fewer than 1 in 1,000,000,000 such matches expected by chance).
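The bookkeeping described above (drop the self-match, keep hits of at least 100 bp, record the best -log10(E) at each position) can be sketched as follows. The input format is an assumption: tuples of (qstart, qend, sstart, send, evalue) with 1-based inclusive coordinates, as in BLAST tabular output, where an opposite-strand hit has sstart > send. The cap for a reported E value of 0 is also an arbitrary, hypothetical convention.

```python
import math


def direct_repeat_track(hits, genome_length, min_length=100, cap=200.0):
    """Best direct-repeat score per position from a genome-vs-itself search.

    Returns a list with the highest -log10(E) of any qualifying hit covering
    each position (0.0 where none qualifies). E = 0 is scored as cap.
    """
    track = [0.0] * genome_length
    for qstart, qend, sstart, send, evalue in hits:
        if (qstart, qend) == (sstart, send):
            continue  # the trivial perfect match of the genome to itself
        if sstart > send:
            continue  # opposite-strand hit: belongs to the inverted-repeat track
        if qend - qstart + 1 < min_length:
            continue  # only matches of at least min_length bp are counted
        score = cap if evalue <= 0 else min(-math.log10(evalue), cap)
        for pos in range(qstart - 1, qend):  # convert to 0-based positions
            track[pos] = max(track[pos], score)
    return track
```

On this scale a score of 9 or more at a position corresponds to the "very significant" threshold given in the text.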


 6. Global Inverted Repeats.

Inverted repeats are calculated in the same way, but only hits on the opposite strand are taken into consideration. Note that there are far fewer significant hits on the opposite strand. This is true for many bacterial genomes, although in eukaryotic chromosomes the global inverted repeats are often as frequent as the global direct repeats.
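Separating direct from inverted repeats comes down to the strand of each hit. A minimal sketch, assuming BLAST-style tabular hits (qstart, qend, sstart, send, evalue) in which a hit on the opposite strand is reported with sstart > send; the E-value cutoff of 10^-9 mirrors the "value of 9 or greater" threshold in the text, and both function names are hypothetical.

```python
def classify_hit(sstart, send):
    """'inverted' for opposite-strand hits (sstart > send), else 'direct'."""
    return "inverted" if sstart > send else "direct"


def count_by_strand(hits, min_length=100, max_evalue=1e-9):
    """Count significant non-self hits of at least min_length bp on each strand."""
    counts = {"direct": 0, "inverted": 0}
    for qstart, qend, sstart, send, evalue in hits:
        if (qstart, qend) == (sstart, send):
            continue  # skip the genome's match to itself
        if qend - qstart + 1 < min_length or evalue > max_evalue:
            continue  # too short, or not significant
        counts[classify_hit(sstart, send)] += 1
    return counts
```

Comparing the two counts for a bacterial genome is a quick way to check the observation that significant inverted hits are much rarer than direct ones.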




 7. GC skew.

The "GC skew" is simply the number of G's minus the number of C's, over a 10,000 bp window. Thus, if there are more G's than C's, the result is positive (turquoise), whilst more C's than G's gives a negative number (violet). This can be useful in visualising the replication origin and terminus, since in many bacteria the skew changes sign at these two points.
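The definition above translates directly into code. This sketch assumes consecutive, non-overlapping windows (the atlas may use sliding windows instead) and also returns the normalised form (G - C)/(G + C), which many other tools report; the function name gc_skew is hypothetical.

```python
def gc_skew(seq, window=10_000):
    """G minus C counts in consecutive non-overlapping windows along seq.

    Returns (raw, normalised): raw is G - C as defined in the text;
    normalised is (G - C) / (G + C), or 0.0 for a window with no G or C.
    The last window may be shorter than `window`.
    """
    seq = seq.upper()
    raw, normalised = [], []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        raw.append(g - c)
        normalised.append((g - c) / (g + c) if g + c else 0.0)
    return raw, normalised
```

Plotted around the chromosome, the windows where the sign flips mark candidate positions for the origin and terminus.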




 8. AT content.

The innermost circle is the percent AT. For E. coli, whose AT content is close to 50% on average, the scale runs from 45% (turquoise) to 55% (red). The default setting for a "Genome Atlas" plot is 20% to 80%, but for E. coli the range has been set manually to this narrower window to enhance the smaller differences in this genome. In some atlases (such as the DNA Structural Atlases), the AT content is plotted within plus or minus three standard deviations of the average, which allows visualisation of regions that differ from the average AT content.
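The effect of narrowing the range can be seen by computing percent AT per window and placing it on a fixed, clamped scale. This is a sketch only: non-overlapping windows are assumed, ambiguous bases such as N are counted in the window length, and the function names are hypothetical.

```python
def at_percent(seq, window=10_000):
    """Percent A+T in consecutive non-overlapping windows along seq."""
    seq = seq.upper()
    out = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        out.append(100.0 * (chunk.count("A") + chunk.count("T")) / len(chunk))
    return out


def scale_position(percent, low=20.0, high=80.0):
    """Position (0.0 to 1.0) of a value on a fixed colour scale, clamped at the ends.

    The defaults are the default Genome Atlas range; the E. coli plot uses 45 to 55.
    """
    return min(max((percent - low) / (high - low), 0.0), 1.0)
```

A window at 52% AT sits only just past the middle of the default 20% to 80% scale, but 70% of the way along the 45% to 55% scale, which is why the narrower range makes the small differences in E. coli visible.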





Background References

  1. Baldi,P., Brunak,S., Chauvin,Y., and Krogh,A., "Naturally occurring nucleosome positioning signals in human exons and introns", J. Mol. Biol., 263:503-510, (1996).

  2. Bolshoy,A., McNamara,P., Harrington,R.E., and Trifonov,E.N., "Curved DNA without A-A: experimental estimation of all 16 DNA wedge angles", Proc. Natl. Acad. Sci. U.S.A., 88:2312-2316, (1991).

  3. Brukner,I., Sanchez,R., Suck,D., and Pongor,S., "Sequence-dependent bending propensity of DNA as revealed by DNase I: parameters for trinucleotides", EMBO J., 14:1812-1818, (1995).

  4. Brukner,I., Sanchez,R., Suck,D., and Pongor,S., "Trinucleotide models for DNA bending propensity: comparison of models based on DNaseI digestion and nucleosome packaging data", J. Biomol. Struct. Dyn., 13:309-317, (1995).

  5. el Hassan,M.A., and Calladine,C.R., "Propeller-twisting of base-pairs and the conformational mobility of dinucleotide steps in DNA", J. Mol. Biol., 259:95-103, (1996).

  6. Gorin,A.A., Zhurkin,V.B., and Olson,W.K., "B-DNA twisting correlates with base-pair morphology", J. Mol. Biol., 247:34-48, (1995).

  7. Haran,T.E., Kahn,J.D., and Crothers,D.M., "Sequence elements responsible for DNA curvature", J. Mol. Biol., 244:135-143, (1994).

  8. Jensen,L.J., Friis,C., and Ussery,D.W., "Three Views of Microbial Genomes", Research in Microbiology, 150:773-777, (1999).

  9. Olson,W.K., Gorin,A.A., Lu,X.J., Hock,L.M., and Zhurkin,V.B., "DNA sequence-dependent deformability deduced from protein-DNA crystal complexes", Proc. Natl. Acad. Sci. U.S.A., 95:11163-11168, (1998).

  10. Ornstein,R.L., Rein,R., Breen,D.L., and MacElroy,R.D., "An optimized potential function for the calculation of nucleic acid interaction energies. I. Base Stacking", Biopolymers, 17:2341-2360, (1978).

  11. Pedersen,A.G., Jensen,L.J., Stærfeldt,H.H., Brunak,S., and Ussery,D.W., "A DNA Structural Atlas for Escherichia coli", J. Mol. Biol., 299(4):907-930, (2000).

  12. Satchwell,S.C., Drew,H.R., and Travers,A.A., "Sequence periodicities in chicken nucleosome core DNA", J. Mol. Biol., 191:659-675, (1986).

  13. Shpigelman,E.S., Trifonov,E.N., and Bolshoy,A., "CURVATURE: software for the analysis of curved DNA", Comput. Appl. Biosci., 9:434-440, (1993).

  14. Sinden,R.R., Pearson,C.E., Potaman,V.N., and Ussery,D.W., "DNA Structure and Function", Advances in Genome Biology, 5A:1-141, (1998).





Last modified Tuesday, 4 December, 2001 by David Ussery