This lecture is about ways of looking at DNA sequences in whole chromosomes. There are two parts to this talk. In Part 1, I will introduce "DNA Atlases", and give several examples of useful information which can be visualised using this approach. I would like to think that one way of dealing with the explosion of DNA sequence information is to think about it in biological terms, in particular in physical-chemical terms of the structure and function of symmetry elements. For example, there are specific DNA sequences which "code" for a telomere, and different DNA sequences which are specific for centromeres. Specific DNA sequences, their structures, and biological functions will be discussed.
In Part 2, I will discuss the use of "DNA Atlases" to display information about gene expression (and predicted expression) throughout chromosomes.
One way of dealing with the problem of how to display so much sequence information is to have a look at the whole chromosome at once, smoothing over a large window. The entire bacterial chromosome is displayed as a circle, with different colours representing various parameters. First, as an introduction to atlases, we will look at base composition. Then we will have a look at levels of expression of mRNA and proteins throughout the chromosome. As an example, I will use my very favourite organism, Escherichia coli K-12.

There are several things to notice in this plot. First, the bases are not uniformly distributed throughout the genome, but there are "clumps" or clusters where specific bases are a bit more concentrated. Also, the G's (turquoise) are clearly favoured on one half of the chromosome, whilst the C's (magenta) are favoured on the other. This shows up in the "GC-skew" lane as well (2nd circle from the middle). I have labelled the entire terminus region, which ranges from TerE (around 1.08 million bp, or Mbp) to TerG (~2.38 Mbp) in Escherichia coli K-12. Finally, several genes corresponding to the darker bands (i.e., more biased nucleotide composition) are labelled. The same pattern can be seen for the other three Escherichia coli chromosomes which have been sequenced (so far!), as shown in the table towards the bottom of this web page.
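As a rough illustration of how a base-composition lane and a GC-skew lane could be computed, here is a minimal Python sketch. It assumes the usual definition of GC skew, (G - C) / (G + C), and treats the chromosome as circular so that windows near the end wrap around the origin. The function name `windowed_composition` and the `window` and `step` parameters are hypothetical; this is not the code used to draw the atlases.

```python
from collections import Counter


def windowed_composition(seq, window, step):
    """Return a list of (start, base_fractions, gc_skew) for a circular sequence.

    Assumption: GC skew is (G - C) / (G + C) within each window.
    """
    seq = seq.upper()
    n = len(seq)
    if not 0 < window <= n:
        raise ValueError("window must be between 1 and the sequence length")
    # Append the start of the sequence so windows can wrap past the origin.
    wrapped = seq + seq[:window - 1]
    results = []
    for start in range(0, n, step):
        counts = Counter(wrapped[start:start + window])
        g, c = counts["G"], counts["C"]
        # Report a skew of 0.0 when the window contains no G or C at all.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        fractions = {base: counts[base] / window for base in "ACGT"}
        results.append((start, fractions, skew))
    return results


if __name__ == "__main__":
    toy = "GGGGAAAACCCCTTTT"  # toy sequence, not real genome data
    for start, fractions, skew in windowed_composition(toy, window=4, step=4):
        print(start, fractions, skew)
```

For a real chromosome one would read the sequence from a file and pick a window of tens of kilobases; a window that large is what smooths the plot enough for a strand bias to become visible.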

This is a "Structure Atlas" of chromosome 2 from the malaria parasite Plasmodium falciparum. Note that the telomeric regions light up with several unique structures.

This is a "Repeat Atlas" of a small 21 kbp plasmid from Borrelia burgdorferi, which infects ticks and is the cause of Lyme disease. Note that there is a large direct repeat region in the middle which does not code for genes.

This is the "Genome Atlas" of the nucleomorph of Guillardia theta, which is the smallest eukaryotic genome sequenced. (The total amount of nuclear DNA is about 551,000 bp, on three chromosomes.) See Sue Douglas' article, "The highly reduced genome of an enslaved algal nucleus", Nature, 410, 1091-1096, 2001 (26 April issue).
Escherichia coli is probably the best characterised organism.
Some numbers:
There are 4085 predicted genes in Escherichia coli strain K-12 isolate W3110.
There are 4289 predicted genes in Escherichia coli strain K-12 isolate MG1655.
There are 5283 predicted genes in Escherichia coli strain O157:H7 isolate EDL933 (enterohemorrhagic pathogen).
There are 5361 predicted genes in Escherichia coli strain O157:H7 substrain RIMD 0509952 (enterohemorrhagic pathogen).
Roughly 2600 genes have been found to be expressed in Escherichia coli strain K-12 cells, under standard laboratory growth conditions.
About 2100 spots can be seen on 2-D protein gels.
Very roughly 1000 different genes (only about 600 mRNA transcripts) are expressed at "detectable levels" in E. coli cells grown in LB media.
Only about 350 proteins exist at concentrations of > 100 copies per cell. (These make up 90% of the total protein in E. coli!)
Most (>90%) of the proteins are present in very low amounts (less than 100 copies per cell).
It has been known since the 1960s that genes closer to the replication origin are more highly expressed. However, it has only been in the past few years that technology has allowed the simultaneous monitoring of ALL the genes in Escherichia coli. There are 4397 annotated genes in the E. coli K-12 genome. Shown below is an "Atlas plot" of the E. coli K-12 genome, with the outer circles representing the concentration of proteins (roughly in number of molecules/cell) and mRNA (again, roughly number of molecules/cell). Under these conditions (i.e., cells grown to late log phase, in minimal media), there were 2005 genes expressed at detectable levels, and only 233 proteins were found in "abundant" amounts (i.e., very roughly more than 100 molecules per cell).
For E. coli K-12 cells, grown in minimal media to late log phase:
4397 annotated genes -> 2005 mRNAs expressed -> 233 abundant proteins
(note that these numbers will vary for different experimental conditions....)
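To put the three numbers above in proportion, here is a small Python sketch of the arithmetic. The counts are copied from the text; the variable names and the `percent` helper are purely illustrative.

```python
# Counts quoted above for E. coli K-12 grown in minimal media to late log phase.
annotated_genes = 4397
expressed_mrnas = 2005
abundant_proteins = 233


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


expressed_of_annotated = percent(expressed_mrnas, annotated_genes)
abundant_of_annotated = percent(abundant_proteins, annotated_genes)
abundant_of_expressed = percent(abundant_proteins, expressed_mrnas)

print(f"mRNA detected for {expressed_of_annotated:.1f}% of annotated genes")
print(f"abundant protein for {abundant_of_annotated:.1f}% of annotated genes")
print(f"abundant protein for {abundant_of_expressed:.1f}% of expressed genes")
```

So under these conditions a little under half of the annotated genes give detectable mRNA, and only about one gene in twenty gives an abundant protein.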
In this picture, the outer lane represents the concentration of proteins (blue), the next lane the concentration of mRNA (green), and then the annotated genes.
The inner three circles represent different aspects of the DNA base composition throughout the genome. The innermost circle (turquoise/violet) is the bias of G's towards one strand or the other (that is, a look at the mono-nucleotide distribution of the 4 DNA bases). The next lane is the density of purine (or pyrimidine) stretches of 10 bp or longer. Note that in both cases purines tend to favour the leading strand of the replicore, whilst pyrimidine tracts are more likely to occur on the lagging strand. Finally, the next circle (turquoise/red) is simply the AT content of the genome, averaged over a 50,000 bp window. Note that the terminus is slightly more AT rich, whilst the rest of the genome is slightly GC rich. (The AT content scale ranges from 45% to 55%.)
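The two quantities behind the purine-tract and AT-content lanes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a tract is taken to be an uninterrupted run of A/G (purines) or C/T (pyrimidines) on the given strand, the scan is linear (a tract spanning the origin of a circular chromosome is not joined up), and the names `find_tracts` and `at_content` are hypothetical.

```python
import re


def find_tracts(seq, min_len=10):
    """Return (purine_tracts, pyrimidine_tracts) as lists of (start, end).

    Coordinates are 0-based and half-open, on the strand that was passed in.
    """
    seq = seq.upper()
    purine_run = re.compile(r"[AG]{%d,}" % min_len)
    pyrimidine_run = re.compile(r"[CT]{%d,}" % min_len)
    purines = [(m.start(), m.end()) for m in purine_run.finditer(seq)]
    pyrimidines = [(m.start(), m.end()) for m in pyrimidine_run.finditer(seq)]
    return purines, pyrimidines


def at_content(seq):
    """Return the percentage of A and T in a non-empty sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


if __name__ == "__main__":
    toy = "AGAGAGAGAGAG" + "CTCTCTCTCT" + "AGA"  # toy sequence, not real data
    print(find_tracts(toy))
    print(at_content(toy))
```

A pyrimidine tract on one strand is a purine tract on the complementary strand, so scanning a single strand for both kinds of run is enough. Applying `at_content` to successive 50,000 bp slices would give an averaged profile like the one described above.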
| Organism | %AT | Size (bp) | Atlas | Number of genes | Coding density | Reference |
|---|---|---|---|---|---|---|
| Escherichia coli, strain K-12, isolate W3110 (DDBJ; NCBI tax) | 49 | 4,636,552 | Genome Atlas | 4085 | 79% | - |
| Escherichia coli, strain K-12, isolate MG1655 (U. Wisconsin; TIGR cmr; NCBI tax; NCBI entrez) | 49 | 4,639,221 | Genome Atlas | 4397 | 87% | Science 277:1453-1474, September 1997 [PubMed] |
| Escherichia coli, strain O157:H7, substrain EDL933 (U. Wisconsin; NCBI tax; NCBI entrez) | 49 | 5,529,376 | Genome Atlas | 5283 | 86% | Nature 409:529-533, January 2001 [PubMed] |
| Escherichia coli, strain O157:H7, substrain RIMD 0509952 (Miyazaki, Japan; NCBI tax; NCBI entrez) | 49 | 5,498,450 | Genome Atlas | 5361 | 88% | DNA Res. 8:11-22, February 2001 [PubMed] |
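The table can also be turned into a couple of derived numbers. The Python sketch below computes genes per kbp and a rough mean coding length per gene for each genome. It assumes that "coding density" means the fraction of base pairs lying inside annotated genes, which the table does not state explicitly; the helper names are hypothetical.

```python
# (label, genome size in bp, number of genes, coding density) from the table.
GENOMES = [
    ("K-12 W3110", 4_636_552, 4085, 0.79),
    ("K-12 MG1655", 4_639_221, 4397, 0.87),
    ("O157:H7 EDL933", 5_529_376, 5283, 0.86),
    ("O157:H7 RIMD 0509952", 5_498_450, 5361, 0.88),
]


def genes_per_kbp(size_bp, n_genes):
    """Number of annotated genes per 1000 bp of genome."""
    return 1000.0 * n_genes / size_bp


def mean_coding_bp_per_gene(size_bp, n_genes, coding_density):
    """Rough mean coding length per gene.

    Assumption: coding_density is the fraction of bp inside annotated genes.
    """
    return coding_density * size_bp / n_genes


if __name__ == "__main__":
    for label, size_bp, n_genes, density in GENOMES:
        print(
            f"{label}: {genes_per_kbp(size_bp, n_genes):.2f} genes/kbp, "
            f"about {mean_coding_bp_per_gene(size_bp, n_genes, density):.0f} "
            "coding bp per gene"
        )
```

All four genomes come out at a little under one gene per kbp, which is a handy rule of thumb for a bacterial chromosome.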
Link to more atlases for Escherichia coli genomes.
Link to the main "Genome Atlas" web page
Link to a list of recent papers and talks on DNA structures.
Watson, James D., "A Passion for DNA: Genes, Genomes, and Society" (Oxford University Press, Oxford, 2000).
Sinden, Richard R., "DNA: Structure and Function" (Academic Press, New York, 1994).
Calladine, C.R., Drew, H.R., "Understanding DNA: The Molecule and How It Works" (2nd edition, Academic Press, San Diego, 1997).
A List of more than a thousand books about DNA
Last modified Thursday, 9 November, 2000 by David Ussery