DNA Symmetry Elements and their Meanings
This lecture is about ways of looking at DNA sequences in complete genomes and chromosomes, in terms of symmetry elements. There are two parts to this talk. In Part 1, I will discuss the fact that we simply have "Too Much Information" becoming available, and the problem will only get worse in the near future. There are ways of cataloging and organising the data, of course. However, many people don't appreciate the true diversity of genome sizes in Nature, so we'll talk for a few minutes about the "C-value paradox", along with some possible ideas for WHY certain organisms have so much DNA.
In Part 2 we get to the main subject of the lecture: a look at DNA symmetry elements and their biological meanings. Although there is an essentially infinite variety of possible DNA sequences, fortunately there are only a limited number of DNA conformations. I would like to think that one way of dealing with all this sequence information is to think about it in biological terms, in particular in terms of the physical-chemical structure and function of symmetry elements. For example, there are specific DNA sequences which "code" for a telomere, and different DNA sequences which are specific for centromeres. Specific DNA sequences, their structures, and biological functions will be discussed.
I have also made a separate file containing specific LEARNING OBJECTIVES for this lecture, as well as a "self-test quiz", which I recommend looking at BEFORE the lecture, if possible. I've incorporated the answers to questions 1 and 2 into PART 1 of the lecture notes.
Brevis esse laboro, obscurus fio. ("I strive to be brief, and I become obscure.") - Horace
Some philosophical thoughts about Information and the Size of Genomes.
The information in GenBank is doubling every 10 months.
What are the implications of this?
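To make the 10-month doubling time concrete, here is a minimal Python sketch that assumes the growth stays purely exponential (an assumption; the real rate need not stay constant). The names `growth_factor` and `months_to_reach` are hypothetical helpers, not part of any GenBank tool.

```python
import math

DOUBLING_TIME_MONTHS = 10.0  # doubling time quoted in the text


def growth_factor(months: float, doubling_time: float = DOUBLING_TIME_MONTHS) -> float:
    """Factor by which the database grows after `months`, assuming pure exponential growth."""
    return 2.0 ** (months / doubling_time)


def months_to_reach(factor: float, doubling_time: float = DOUBLING_TIME_MONTHS) -> float:
    """Months needed to grow by `factor`, under the same assumption."""
    return doubling_time * math.log2(factor)


if __name__ == "__main__":
    print(f"after 1 year:  x{growth_factor(12):.1f}")
    print(f"after 5 years: x{growth_factor(60):.0f}")
    print(f"1000-fold growth takes about {months_to_reach(1000):.0f} months")
```

Under that assumption the database grows about 2.3-fold in one year, 64-fold in five years, and 1,000-fold in roughly 100 months (a little over eight years).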
A look at genome sequencing, from my lecture notes for the past four years:
1995: The Only Sequenced Genome:
(so far, as of 14 Sept., 1995)
from a "Journal club" presentation at the Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, September, 1995.
1. Haemophilus influenzae

1996: Genomes from various organisms:
(4 organisms have been sequenced as of 1 Nov., 1996)
from a "Workshop on DNA Structure and Function", given at the Norwegian School of Veterinary Science (Norges veterinærhøgskole), November, 1996.
| Organism | Size (bp) | # genes | % Coding |
| Mycoplasma genitalium | 580,073 | 468 | |
| Haemophilus influenzae | 2,087,778 | 1,662 | |
| Methanococcus jannaschii | 1,660,000 | 1,997 | |
| Synechocystis sp. | 3,570,000 | 3,168 | |
| Escherichia coli | ~3,000,000 | ~3,400 | |
| Saccharomyces cerevisiae | 13,000,000 | ~5,000 | |
| Homo sapiens | ~3,000,000,000 | ~70,000 | |
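One quick way to read the 1996 table is to divide genome size by gene number. The Python sketch below does this for the values in the table, treating the "~" estimates as exact numbers; `bp_per_gene` is a hypothetical helper name.

```python
# Sizes (bp) and gene counts copied from the 1996 table; "~" values taken at face value.
GENOMES = {
    "Mycoplasma genitalium": (580_073, 468),
    "Haemophilus influenzae": (2_087_778, 1_662),
    "Methanococcus jannaschii": (1_660_000, 1_997),
    "Synechocystis sp.": (3_570_000, 3_168),
    "Escherichia coli": (3_000_000, 3_400),
    "Saccharomyces cerevisiae": (13_000_000, 5_000),
    "Homo sapiens": (3_000_000_000, 70_000),
}


def bp_per_gene(size_bp: int, n_genes: int) -> float:
    """Average number of base pairs per gene."""
    return size_bp / n_genes


if __name__ == "__main__":
    for name, (size, genes) in GENOMES.items():
        print(f"{name:26s} {bp_per_gene(size, genes):>10,.0f} bp/gene")
```

The prokaryotic genomes work out to roughly one gene per 800 to 1,300 bp, yeast to about one per 2,600 bp, and the human estimate to about one per 43,000 bp.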
1997: A List of Sequenced Genomes
(9 organisms have been sequenced so far, as of 30 September, 1997)
from a lecture to an Introductory Genetics course at Roanoke College, in Salem, Virginia, October, 1997.
| Organism | Type | Size (Mbp) | number of genes | date sequenced |
| Haemophilus influenzae | Bacteria (Gm-) | | | |
| Mycoplasma genitalium | Bacteria (Gm-) | | | |
| Synechocystis sp. | Bacteria ("blue-green algae") | | | |
| Methanococcus jannaschii | Archaebacteria | | | |
| Mycoplasma pneumoniae | Bacteria (Gm-) | | | |
| Saccharomyces cerevisiae | Eukaryotic ("baker's yeast") | | | |
| Helicobacter pylori | Bacteria (Gm-) | | | |
| Escherichia coli | Bacteria (Gm-) | | | |
| Bacillus subtilis | Bacteria (Gm+) | | | |
| Archaeoglobus fulgidus | Archaebacteria | | | |
| Borrelia burgdorferi | Bacteria (Gm-) | | | |
1998: A List of Sequenced Genomes
(17 so far, as of 1 September, 1998)
from last year's lecture, and also an "electronic poster" for the 2nd Annual Conference on Computational Genomics.
| Organism | # | Type | Size (Mbp) | number of genes | date sequenced |
| Haemophilus influenzae | 1 | Bacteria (Gm-) | | | |
| Mycoplasma genitalium | 2 | Bacteria (Gm-) | | | |
| Saccharomyces cerevisiae | 3 | Eukaryotic ("baker's yeast") | | | |
| Methanococcus jannaschii | 4 | Archaebacteria | | | |
| Synechocystis sp. | 5 | Bacteria ("blue-green algae") | | | |
| Mycoplasma pneumoniae | 6 | Bacteria (Gm-) | | | |
| Escherichia coli (Wisconsin, USA) | 7a | Bacteria (Gm-) | | | October, 1997 |
| Escherichia coli (Japan) | | Bacteria (Gm-) | | | (completed) |
| Methanobacterium thermoautotrophicum | 8 | Archaebacteria | | | |
| Archaeoglobus fulgidus | 9 | Archaebacteria | | | |
| Helicobacter pylori | 10 | Bacteria (Gm-) | | | |
| Borrelia burgdorferi | 11 | Bacteria (Gm-) | | | |
| Treponema pallidum | 12 | Bacteria (Gm-) | | | |
| Bacillus subtilis | 13 | Bacteria (Gm+) | | | |
| Pyrococcus horikoshii | 14 | Archaebacteria | | | |
| Aquifex aeolicus | 15 | Eubacteria | | | |
| Mycobacterium tuberculosis | 16 | Bacteria (Gm+) | | | |
| Treponema pallidum | 17 | Bacteria (Gm-) | | | |
1999: A List of Sequenced Genomes
(30 so far, as of 1 November, 1999)
The "C-value paradox"
Although the number of sequenced genomes is increasing rapidly, one has to put this into perspective: genome sizes fall, very roughly, into five different classes:
| Organism group | Size (bp) | No. sequenced |
| viruses | ~1,000 - 70,000 | 534 |
| bacteria | ~500,000 - 8,000,000 | 30 |
| "simple" eukaryotes | ~12,000,000 - 270,000,000 | 2 |
| "complex" eukaryotes (most animals and some plants) | ~700,000,000 - ~10,000,000,000 | 0 |
| "other" eukaryotes (plants and amoeba) | ~10,000,000,000 - 670,000,000,000 | 0 |
Discussion
Why does an amoeba have more than 200 times as much DNA as a human?
Think about it for a discussion in class. I have a possible explanation, although I'm not sure anyone really knows the answer to this, to be honest.
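The ranges in the C-value table can be turned into simple ratios. This Python sketch assumes that the 670,000,000,000 bp upper bound for "other" eukaryotes is the amoeba referred to in the question, and uses the ~3,000,000,000 bp human genome size from the 1996 table; the constant and function names are illustrative only.

```python
# Approximate genome-size ranges (bp), copied from the C-value table.
SIZE_RANGES = {
    "viruses": (1_000, 70_000),
    "bacteria": (500_000, 8_000_000),
    '"simple" eukaryotes': (12_000_000, 270_000_000),
    '"complex" eukaryotes': (700_000_000, 10_000_000_000),
    '"other" eukaryotes': (10_000_000_000, 670_000_000_000),
}
HUMAN_BP = 3_000_000_000  # ~3 x 10^9 bp, as in the 1996 table
# Assumption: the 670 Gbp upper bound of the "other" class is the amoeba.
AMOEBA_BP = SIZE_RANGES['"other" eukaryotes'][1]


def fold_range(low: int, high: int) -> float:
    """How many times larger the biggest genome in a class is than the smallest."""
    return high / low


if __name__ == "__main__":
    for group, (low, high) in SIZE_RANGES.items():
        print(f"{group:22s} ~{fold_range(low, high):.0f}-fold range")
    print(f"amoeba / human: ~{AMOEBA_BP / HUMAN_BP:.0f}x")
```

The amoeba-to-human ratio comes out at about 223, consistent with "more than 200 times", and the table as a whole spans almost nine orders of magnitude, from ~10³ bp to ~6.7 × 10¹¹ bp.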
This brings us to the first question on the quiz:
Answers to the self-test quiz which you are supposed to do BEFORE the lecture:
1. The short answer - a very long time. About 2.4 × 10¹² years.
That's about 160 times longer than the estimated age of the universe!
2. The piece of paper would be quite thick - it would reach outside the Earth's atmosphere and beyond the orbit of the planet Mars.
Link to last year's introductory lecture

Part 2: DNA Symmetry Elements and DNA Structures
Background:
- Introduction to DNA symmetry elements
- DNA is like Coca-Cola
- Historical background - fiber diffraction vs. X-ray crystallography
- Families of DNA helices
- A Brief Introduction to Alternative Conformations of DNA
From a DNA sequence perspective, there are four types of repeats:
- Direct Repeats
  - Simple Tandem Repeats
  - (Longer) Tandem Repeats
  - Direct (non-tandem) Repeats
  - Phased Repeats
- Inverted Repeats
- Mirror Repeats
- Everted Repeats
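The sequence-level definitions of several of these repeat types can be written down directly. The following Python sketch (helper names are hypothetical; only unambiguous A/C/G/T bases are handled) tests a sequence for a perfect inverted repeat (equal to its own reverse complement, like the EcoRI site GAATTC), a perfect mirror repeat (equal to its own reverse on the same strand), and tandem periodicity, and lists direct repeats of length k. Everted and phased repeats depend on how repeat units are oriented or spaced relative to each other, so they are not modelled here.

```python
from typing import Dict, List, Optional, Tuple

# Watson-Crick complement for unambiguous bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """The opposite strand, read 5' -> 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_inverted_repeat(seq: str) -> bool:
    """Perfect inverted repeat (palindrome): both strands read the same 5' -> 3'."""
    return seq == reverse_complement(seq)


def is_mirror_repeat(seq: str) -> bool:
    """Perfect mirror repeat: one strand reads the same in both directions."""
    return seq == seq[::-1]


def tandem_period(seq: str) -> Optional[int]:
    """Shortest unit length p such that seq is at least two head-to-tail copies
    of a p-base unit (the last copy may be partial); None if there is none."""
    for p in range(1, len(seq) // 2 + 1):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return None


def direct_repeats(seq: str, k: int) -> List[Tuple[str, List[int]]]:
    """k-mers that occur more than once in the same orientation, with their
    start positions; copies may be adjacent (tandem), separated, or overlapping."""
    positions: Dict[str, List[int]] = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    return [(kmer, pos) for kmer, pos in positions.items() if len(pos) > 1]


if __name__ == "__main__":
    print(is_inverted_repeat("GAATTC"))          # True (EcoRI site)
    print(is_mirror_repeat("AGGTTGGA"))          # True
    print(reverse_complement("CCCTAA"))          # TTAGGG
    print(tandem_period("TTAGGGTTAGGGTTAGGG"))   # 6
    print(direct_repeats("GATCAAAAGATC", 4))     # [('GATC', [0, 8])]
```

Note that the telomeric unit CCCTAA, written (C₃TA₂)ₙ in the table further on, is the reverse complement of TTAGGG, so the same tandem array can be read from either strand.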

Anatomy of chromosomes - there are four important parts in metaphase chromosomes (telomeres, centromeres, and heterochromatin & euchromatin):

There are two types of chromatin:


Centromeric DNA
Figure 6.25 from Hartl & Jones, "GENETICS - Principles and Analysis", fourth edition (1998).
There are certain DNA sequences that are associated with the centromeres of chromosomes. Knowledge of this was essential in the construction of Yeast Artificial Chromosomes (or YACs, as they're usually called).
Figure 6.15 from Hartl & Jones, "GENETICS - Principles and Analysis", fourth edition (1998).
Here's an oversimplified view of the attachment of the kinetochore, which consists of several hundred microtubules bound together.
Telomeric DNA
Telomeric DNA is repetitive and can fold back on itself. This is necessary to allow DNA synthesis at the ends of linear chromosomes (this isn't a problem for circular chromosomes!).
The ends of the chromosomes get shorter every time the cells divide, because part of the end has to serve as a template for copying itself and so cannot be completely replicated. Thus, after every round of replication, the chromosome gets a bit shorter. This is a kind of "planned obsolescence": the cells get only a limited number of divisions, and then they fall apart. However (fortunately!), cells have a mechanism for extending the length of the telomeres: an enzyme called TELOMERASE. In many cases, cancer cells have a mutation that causes the telomerase gene to be overexpressed, allowing the cells to "live forever". Early results from clinical trials show that specifically inhibiting the activity of telomerase can slow, or completely stop, the growth of many types of cancer. More recently, the idea of using telomerase gene therapy to prevent people from getting old has received much attention in the media.
Figure 6.26 from Hartl & Jones (page 250).
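The shortening described in the telomere paragraph can be caricatured as a countdown. This Python sketch is a toy model: it assumes a fixed loss of bases per division and, optionally, a fixed number of bases added back by telomerase each time. The function name and every number in the example are hypothetical, not measured values.

```python
from typing import Optional


def divisions_until_critical(initial_bp: int, critical_bp: int,
                             loss_per_division: int,
                             telomerase_gain: int = 0,
                             max_divisions: int = 10_000) -> Optional[int]:
    """Toy model: number of divisions before a telomere of initial_bp falls to
    critical_bp or below, losing loss_per_division bp and regaining
    telomerase_gain bp each time. Returns None if that does not happen within
    max_divisions (e.g. when telomerase keeps pace with the loss)."""
    length = initial_bp
    for division in range(1, max_divisions + 1):
        length += telomerase_gain - loss_per_division
        if length <= critical_bp:
            return division
    return None


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the arithmetic.
    print(divisions_until_critical(10_000, 4_000, 100))                       # 60
    print(divisions_until_critical(10_000, 4_000, 100, telomerase_gain=50))   # 120
    print(divisions_until_critical(10_000, 4_000, 100, telomerase_gain=100))  # None
```

With these made-up numbers the telomere reaches the critical length after 60 divisions; when telomerase adds back as many bases as are lost, the countdown never finishes, which is the "live forever" situation described for cancer cells.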
Highly repetitive DNA
Dispersed - e.g., Alu family
Middle repetitive DNA
about 300 bp long
500,000 copies in humans
(about 5% of the human genome)
dispersed throughout the chromosomes
Localised highly repetitive sequences
about 2-10 bp long
present in millions of copies, often in large blocks
(about 6% of the human genome)
associated with heterochromatin
usually very high A+T content
makes up more than 40% of the human genome
position varies due to transposable elements
Includes the following types of sequences:
microsatellite DNA
Dinucleotide repeats
Trinucleotide repeats
- associated with many diseases (e.g., Fragile X, muscular dystrophy)
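Two of the quantities in this list can be computed directly. This Python sketch (helper names are hypothetical) multiplies copy number by element length to get a genome fraction, and counts the longest uninterrupted tract of a given repeat unit, the kind of measurement used for trinucleotide repeats such as the CGG repeat that is expanded in Fragile X.

```python
def genome_fraction(copies: int, unit_bp: int, genome_bp: int) -> float:
    """Fraction of a genome occupied by `copies` of an element `unit_bp` long."""
    return copies * unit_bp / genome_bp


def longest_tandem_run(seq: str, unit: str) -> int:
    """Largest number of consecutive head-to-tail copies of `unit` in `seq`."""
    if not unit:
        raise ValueError("unit must be non-empty")
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        while seq.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


if __name__ == "__main__":
    # Alu figures from the notes: ~300 bp, ~500,000 copies, genome ~3 x 10^9 bp.
    print(f"{genome_fraction(500_000, 300, 3_000_000_000):.0%}")  # 5%
    print(longest_tandem_run("ATCGGCGGCGGAT", "CGG"))             # 3
```

With the Alu figures given above (~300 bp, ~500,000 copies, ~3 × 10⁹ bp genome) the fraction comes out at 5%, matching the "about 5%" quoted.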
| Sequence motif | Possible structure | Biological function |
| (C₃TA₂)ₙ | 4-stranded DNA | telomeres |
| (ACA₅GAGTGT₃CA₂...)ₙ | curved DNA | associated with centromeres (171 bp alphoid repeat) |
| (A₃₋₅N₅₋₇)ₙ | curved DNA | promoter regions |
| (R)ₙ, where n > 250 bp | A-DNA; stable intramolecular triplex DNA | transposons; homologous recombination |
| (TTCCA)ₙ, where n ~ 1,000,000 bp | A-DNA; stable intramolecular triplex DNA | human Y chromosome |
| (RY)ₙ | Z-DNA (>50% GC rich); cruciforms; slipped-mispair | induce mRNA editing; deletions (in bacteria); mutagenesis |
| | recA triple-stranded DNA | homologous recombination |
| | intermolecular triplex; intramolecular triple strands | recombination; replication |
| | cruciforms | deletions (in bacteria); insertion sequences |
| | parallel-stranded DNA | unknown; stabilisation of telomeres(?) |
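Several motifs in the table are defined purely by purine (R = A/G) and pyrimidine (Y = C/T) patterns, so they are easy to scan for. The Python sketch below (function names and length thresholds are hypothetical; only A/C/G/T are handled) finds homopurine/homopyrimidine tracts like (R)ₙ, strictly alternating purine-pyrimidine runs like (RY)ₙ, and the spacing of short A-tracts as in (A₃₋₅N₅₋₇)ₙ. A match only flags a sequence that could adopt the structure listed in the table; it does not show that the structure actually forms.

```python
import re
from typing import List, Tuple


def homopurine_tracts(seq: str, min_len: int) -> List[Tuple[int, int]]:
    """(start, end) of runs of only purines (A/G) or only pyrimidines (C/T)
    at least min_len long, i.e. R.Y tracts like (R)n read on one strand."""
    pattern = re.compile(rf"[AG]{{{min_len},}}|[CT]{{{min_len},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


def alternating_ry_runs(seq: str, min_len: int) -> List[Tuple[int, int]]:
    """(start, end) of runs where purines and pyrimidines strictly alternate,
    like (RY)n, at least min_len long. Bases other than A/G count as Y."""
    classes = ["R" if base in "AG" else "Y" for base in seq.upper()]
    runs, start = [], 0
    for i in range(1, len(classes) + 1):
        # A run ends at the end of the sequence or where two neighbours share a class.
        if i == len(classes) or classes[i] == classes[i - 1]:
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


def a_tract_spacings(seq: str, min_len: int = 3, max_len: int = 5) -> List[int]:
    """Start-to-start distances between successive maximal runs of A that are
    min_len..max_len long; in an ideal (A3-5 N5-7)n repeat these are 8-12 bp."""
    starts = [m.start() for m in re.finditer(r"A+", seq.upper())
              if min_len <= len(m.group()) <= max_len]
    return [b - a for a, b in zip(starts, starts[1:])]


if __name__ == "__main__":
    print(homopurine_tracts("TTGAGAGGAAGGCC", 8))        # [(2, 12)]
    print(alternating_ry_runs("AACGCGCGCGTT", 8))         # [(1, 11)]
    print(a_tract_spacings("AAAACGCGCGAAAACGCGCGAAAA"))  # [10, 10]
```

A-tracts spaced about 10 bp apart, as in the last example, sit roughly one helical turn apart, which is why such phased repeats add up to a net curvature.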
Sinden, R.R., Pearson, C.E., Potaman, V.N., Ussery, D.W., "DNA: Structure and Function", Advances in Genome Biology, 5A:1-141 (1998).
Calladine, C.R., Drew, H.R., "Understanding DNA: The Molecule and How It Works", 2nd edition, Academic Press, San Diego (1997).
"Official" CBS Bioinformatics links
(an on-line and updated version of Chapter 12 from the Baldi & Brunak book)
HMS Beagle report on NCBI Bioinformatics sites - this is a good place for a molecular biologist to start!
The Human Genome Project Information page - this is put out by Los Alamos National Labs, and is updated regularly.
Last modified Tuesday, 28 September, 1999 by David Ussery