





Chapter 10a. Gene Expression: Transcription







Brief Outline
1. The Flow of Genetic Information
2. Synthesizing Proteins from the Instructions of DNA
3. The Genetic Code
4. RNA: Intermediary in Protein Synthesis
1. The Flow of Genetic Information: DNA -> RNA -> protein
How does the sequence of a strand of DNA correspond to the amino acid sequence
of a protein? This concept is explained by
The Central Dogma of Molecular Biology: The Relationship between Genes and Proteins
- Most genes encode the information for the synthesis of a protein.
- The sequence of bases in DNA codes for the sequence of amino acids in proteins.
Shown below is an illustration of the flow of information from DNA to RNA
to protein, which forms the backbone of molecular biology.
LEGEND
- DNA codes for the production of RNA.
- RNA codes for the production of protein.
- Protein does not code for the production of protein, RNA or DNA.
- The end.
Or, in the words of Francis Crick:
"Once information has passed into protein, it cannot get out again."
This was taken from Genentech's homepage:
However, the "Central Dogma" has had to be revised a bit. It turns out that
you CAN go back from RNA to DNA (reverse transcription), and that RNA can
also make copies of itself. It is still not possible to go from proteins back
to RNA or DNA, and no known mechanism has yet been demonstrated for proteins
making copies of themselves.
Try it for yourself on the "DNA Workshop" (from PBS).
Click HERE for a link to a nice historical review of The Central Dogma.
2. Synthesizing Proteins from the Instructions of DNA
- Genetic information flows in a cell from DNA -> RNA -> protein.
- In a prokaryotic cell, transcription and translation happen at the same
  time.
- However, in a eukaryotic cell, transcription and translation occur in
  different places: transcription in the nucleus, translation in the cytoplasm.
3. The Genetic Code
- The genetic code uses three bases to specify each amino acid.
4. RNA: Intermediary in Protein Synthesis
Why would the cell want to have an intermediate between DNA and the proteins
it encodes?
- The DNA can then stay pristine and protected, away from the caustic
  chemistry of the cytoplasm.
- Gene information can be amplified by having many copies of an RNA made from
  one copy of DNA.
- Regulation of gene expression can be effected by having specific controls
  at each element of the pathway between DNA and proteins. The more elements
  there are in the pathway, the more opportunities there are to control it in
  different circumstances.
RNA has the same primary structure as DNA. It consists of a sugar-phosphate
backbone, with bases attached to the 1' carbon of the sugar. The differences
between DNA and RNA are that:
1. RNA has a hydroxyl group on the 2' carbon of the sugar (thus the difference
between deoxyribonucleic acid and ribonucleic acid).
2. Instead of the base thymine, RNA uses another base called uracil.
Because of the extra hydroxyl group on the sugar, RNA cannot adopt the same
stable double helix that DNA does. RNA typically exists as a single-stranded
molecule. However, regions of double helix can form where there is some
base pair complementarity (U and A, G and C), resulting in hairpin
loops. The RNA molecule with its hairpin loops is said to have a secondary
structure.
In addition, because the RNA molecule is not restricted to a rigid double
helix, it can form many different tertiary structures. Each RNA
molecule, depending on the sequence of its bases, can fold into a stable
three-dimensional structure.
From
http://motif.stanford.edu/thesis/tRNA.html.
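The pairing rule behind hairpin loops (U with A, G with C) can be sketched in a
few lines of Python. This is only a toy illustration, not a real folding
program: the function names, the example sequence, and the stem and loop sizes
are all assumptions made here, and it looks only for perfectly paired stems.

```python
# Toy hairpin finder: a hairpin is a stem whose two arms are reverse
# complements of each other (A-U and G-C pairs only), separated by a short
# unpaired loop. Stem and loop sizes are arbitrary illustrative choices.

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_hairpins(rna: str, stem: int = 4, min_loop: int = 3, max_loop: int = 8):
    """Yield (start, five_prime_arm, loop, three_prime_arm) for each perfect stem-loop."""
    n = len(rna)
    for i in range(n - 2 * stem - min_loop + 1):
        left = rna[i:i + stem]
        for loop_len in range(min_loop, max_loop + 1):
            j = i + stem + loop_len  # start of the 3' arm
            if j + stem > n:
                break
            right = rna[j:j + stem]
            if right == reverse_complement(left):
                yield i, left, rna[i + stem:j], right


example = "AAGGACUUCGGUCCAA"  # hypothetical sequence
hits = list(find_hairpins(example))
for start, arm5, loop, arm3 in hits:
    print(f"stem {arm5}/{arm3} at position {start}, loop {loop}")
```

Real RNA folding also allows G-U pairs, mismatches, and bulges, and weighs
competing structures by their stability, none of which is modeled here.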
- Transcription produces RNA molecules that are complementary copies of one
  strand of DNA.
- Three types of RNA cooperate in protein synthesis.

The Genetic Code
How does an mRNA specify amino acid sequence? The answer lies in the genetic
code. It would be impossible for each amino acid to be specified by one
nucleotide, because there are only 4 nucleotides and 20 amino acids. Similarly,
two-nucleotide combinations could specify only 16 amino acids (4 x 4). Three
nucleotides give 4 x 4 x 4 = 64 combinations, more than enough for 20 amino
acids, and each amino acid is in fact specified by a particular combination
of three nucleotides, called a codon.
Note the degeneracy of the genetic code. Each amino acid
might have up to six codons that specify it. It is also interesting to
note that different organisms have different frequencies of codon usage.
A giraffe might use CGC for arginine much more often than CGA, and the
reverse might be true for a sperm whale. Another interesting point is that
some species vary from the standard codon assignments, and use different
codons for different amino acids. In general, however, the standard code
can be relied upon.
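The counting argument and the degeneracy of the code can be checked with a
short Python sketch. It builds the standard codon table from the usual compact
64-letter string (bases ordered U, C, A, G; "*" marks a stop codon). The
variable names are illustrative choices made here, not part of the original
notes.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in the order
# UUU, UUC, UUA, UUG, UCU, ... GGG. "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many different "words" can be spelled with 1, 2, or 3 bases?
words_per_length = {n: len(BASES) ** n for n in (1, 2, 3)}

# Degeneracy: how many codons specify each amino acid (stop codons excluded).
codons_per_aa = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(words_per_length)                 # word counts for lengths 1, 2, 3
print(len(codons_per_aa))               # number of amino acids encoded
print(max(codons_per_aa.values()))      # largest number of codons for one amino acid
print(CODON_TABLE["CGC"], CODON_TABLE["CGA"])  # both arginine codons from the text
```

Only a word length of three gives at least 20 combinations, and the 61
non-stop codons are shared unevenly among the 20 amino acids.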
How do tRNAs recognize the codon to which they should bring an amino acid?
The tRNA has an anticodon on its mRNA-binding end that is complementary
to the codon on the mRNA. Each tRNA carries only the amino acid appropriate
for its anticodon.
From http://motif.stanford.edu/thesis/tRNA.html.
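Codon-anticodon recognition is the same base-pairing rule applied to three
bases. A minimal Python sketch, assuming strict A-U and G-C pairing (the
function names are hypothetical, and the relaxed "wobble" pairing that real
tRNAs tolerate at the third codon position is deliberately ignored):

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the anticodon (written 5'->3') that pairs exactly with an mRNA codon (5'->3')."""
    # The two strands pair antiparallel, so the codon is read backwards.
    return "".join(PAIR[base] for base in reversed(codon))


def pairs_with(codon: str, anticodon: str) -> bool:
    """True if the anticodon is the exact complement of the codon."""
    return anticodon == anticodon_for(codon)


print(anticodon_for("AUG"))      # anticodon for the methionine codon
print(pairs_with("AUG", "CAU"))
print(pairs_with("AUG", "CAG"))
```

Because the strands are antiparallel, the anticodon written 5'->3' is the
reverse complement of the codon, not simply its complement.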
hyperbio@mit.edu
Central Dogma, Part 1: Transcription
How does the sequence information from DNA get transferred to mRNA so that
it can be carried to the ribosomes in the cytoplasm? This process, called
transcription, is highly analogous to DNA replication. Of course,
different effectors, or proteins, direct transcription.
Primary among these is the RNA polymerase holoenzyme, an assembly
of many different factors that together direct the synthesis of mRNA on
a DNA template.

Transcription is divided into three parts:
1. Initiation of Transcription
RNA polymerase must be able to recognize the beginning of a gene so that it
knows where to start synthesizing an mRNA. It is directed to the start site
of transcription by the affinity of one of its subunits (the sigma factor in
bacteria) for a particular DNA sequence that appears at the beginning of
genes. This sequence is called a promoter. It is a unidirectional sequence on
one strand of the DNA that tells the RNA polymerase both where to start and
in which direction (that is, on which strand) to continue synthesis. The
bacterial promoter almost always contains some version of the following
elements:
The two sequences shown in red are known as the "-35" (TTGACA) and "-10"
(TATAAT) sites, based on their positions relative to the start of
transcription.
These two sequences represent the CONSENSUS, based on comparison of several
different sequences aligned at the transcription start site. Another
way of representing this consensus is by applying information
theory to sequence analysis. One currently used method is "sequence
logos" (this is based on "Shannon information", for those of you who are
interested; see Schneider, T.D., Stephens, R.M., "Sequence logos: a new way
to display consensus sequences", Nucleic Acids Research, 18:6097-6100 (1990)).
The sequence logo, based on the promoter region of 167 different genes
(aligned by their transcriptional start site), is shown below:
The
sequence logo for the -10 "TATA" box for 60 human promoters, aligned on
the TATA box, is shown below:
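The idea behind a consensus and a sequence logo can be sketched in Python. For
each column of an alignment, the most common base gives the consensus, and the
height of the logo stack is roughly 2 - H bits, where H is the Shannon entropy
of the base frequencies in that column. The eight aligned sites below are
made-up examples, not real promoters, and the small-sample correction used by
Schneider and Stephens is left out for simplicity.

```python
import math
from collections import Counter


def consensus(aligned):
    """Most common base in each column of a set of equal-length aligned sites."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def information_content(aligned):
    """Bits of information per column: 2 - H, with no small-sample correction."""
    if len({len(site) for site in aligned}) != 1:
        raise ValueError("all aligned sites must have the same length")
    bits = []
    for column in zip(*aligned):
        total = len(column)
        entropy = -sum(
            (count / total) * math.log2(count / total)
            for count in Counter(column).values()
        )
        bits.append(2.0 - entropy)
    return bits


# Hypothetical "-10"-like sites, already aligned.
sites = [
    "TATAAT", "TATGAT", "TACAAT", "TATAAT",
    "GATAAT", "TATACT", "TATAAT", "TAAAAT",
]

print(consensus(sites))
for position, value in enumerate(information_content(sites), start=1):
    print(position, round(value, 2))
```

A column where every site has the same base carries the full 2 bits, while a
column with all four bases equally frequent would carry none. That difference
is what the letter heights in a logo display.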
2. Elongation of Transcription
The RNA polymerase then opens the double helix at that point in the DNA and
begins synthesis of an RNA strand complementary to one of the strands of DNA.
The strand from which it copies is called the antisense or template strand;
the other strand, which has the same sequence as the RNA (with T in place
of U), is called the sense or coding strand.
The RNA polymerase recruits rNTPs (ribonucleoside triphosphates) in
the same way that DNA polymerase recruits dNTPs. However, since only one
strand is synthesized, and it is made continuously in the 5' to 3' direction,
there is no need for Okazaki fragments.
It is important to note that synthesis once again proceeds in a unidirectional
fashion, for the reasons outlined in the previous section.
3. Termination of Transcription
How does RNA polymerase know when to stop transcribing a gene? This system
has been elucidated in prokaryotes. It is important to know that since there
is no nucleus in prokaryotes, ribosomes can begin making protein from an mRNA
immediately upon its synthesis. At the end of many bacterial genes, the
sequence of the newly made RNA allows it to form a hairpin loop, which is
followed by a run of U residues. The hairpin makes the RNA polymerase pause,
and the weak pairing between the U-rich stretch of RNA and the DNA template
lets the transcript come away. The RNA polymerase then falls off the DNA and
transcription ceases. Other bacterial terminators depend instead on a protein
called Rho, which moves along the RNA and dislodges the polymerase.
Gene Expression: Transcription
The majority of genes are expressed as the proteins they encode. The process
occurs in two steps:
- Transcription = DNA -> RNA
- Translation = RNA -> protein
Taken together, they make up the "central dogma" of biology:
DNA -> RNA -> protein.
This page examines the first step:
Gene Transcription: DNA -> RNA
DNA serves as the template for the synthesis of RNA much as it does for its
own replication.
The Steps
- Several protein transcription factors bind to promoter sites, usually on
  the 5' side of the gene to be transcribed.
- An enzyme, RNA polymerase, binds to the complex of transcription factors.
- Working together, they open the DNA double helix.
- RNA polymerase proceeds down one strand, moving in the 3' -> 5' direction.
- As it does so, it assembles ribonucleotides (supplied as triphosphates,
  e.g., ATP) into a strand of RNA.
- Each ribonucleotide is inserted into the growing RNA strand following the
  rules of base pairing. Thus for each C encountered on the DNA strand, a G
  is inserted in the RNA; for each G, a C; and for each T, an A. However,
  each A on the DNA guides the insertion of the pyrimidine uracil (U, from
  uridine triphosphate, UTP). There is no T in RNA.
- Synthesis of the RNA proceeds in the 5' -> 3' direction.
- As each nucleoside triphosphate is brought in to add to the 3' end of the
  growing strand, the two terminal phosphates are removed.
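The base-pairing rules in the steps above translate directly into a small
Python sketch. The function name and the example template are hypothetical;
the template is written 3' -> 5', the direction in which the polymerase reads
it, so the RNA comes out 5' -> 3'.

```python
# Pairing rules from the text: C -> G, G -> C, T -> A, and A -> U (no T in RNA).
DNA_TO_RNA = {"C": "G", "G": "C", "T": "A", "A": "U"}


def transcribe(template_3_to_5: str) -> str:
    """Build an RNA 5'->3' while reading the DNA template strand 3'->5'."""
    rna = []
    for base in template_3_to_5:       # polymerase moves 3' -> 5' along the template
        rna.append(DNA_TO_RNA[base])   # each new ribonucleotide joins the 3' end
    return "".join(rna)


template = "TACGGCATT"   # hypothetical template strand, written 3'->5'
print(transcribe(template))
```

Each step appends to the 3' end of the growing strand, which is why the
polymerase reading 3' -> 5' and the RNA growing 5' -> 3' describe the same
movement.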
Note that at any place
in a DNA molecule, either strand may be serving as the template; that is,
some genes "run" one way, some the other (and in a few remarkable cases,
the same segment of double helix contains genetic information on both strands!).
In all cases, however, RNA polymerase proceeds along a strand in its 3'
-> 5' direction.
Types of RNA
Several types of RNA are synthesized:
- messenger RNA (mRNA). This will later be translated into a polypeptide.
- ribosomal RNA (rRNA). This will be used in the building of ribosomes:
  machinery for synthesizing proteins by translating mRNA.
- transfer RNA (tRNA). RNA molecules that carry amino acids to the growing
  polypeptide.
- small nuclear RNA (snRNA). Transcription of the genes for mRNA, rRNA, and
  tRNA produces large precursor molecules ("primary transcripts") that must
  be processed within the nucleus to produce the functional molecules for
  export to the cytosol. Some of these processing steps are mediated by
  snRNAs.
Ribosomal RNA (rRNA)
There are 4 kinds. In eukaryotes, these are
- 18S rRNA. One of these molecules, along with some 30 different protein
  molecules, is used to make the small subunit of the ribosome.
- 28S, 5.8S, and 5S rRNA. One of each of these molecules, along with some 45
  different proteins, is used to make the large subunit of the ribosome.
The name given each type of rRNA reflects the rate at which the molecules
sediment in the ultracentrifuge. The larger the number, the larger the
molecule (but not proportionally).
The 28S, 18S, and 5.8S molecules are produced by the processing of a single
primary transcript from a cluster of identical copies of a single gene. The
5S molecules are produced from a different cluster of identical genes.
Transfer RNA (tRNA)
There are some 32 different kinds of tRNA in a typical eukaryotic cell.
- Each is the product of a separate gene.
- They are small (~4S), containing 73-93 nucleotides.
- Many of the bases in the chain pair with each other, forming sections of
  double helix.
- The unpaired regions form 3 loops.
- Each kind of tRNA carries (at its 3' end) one of the 20 amino acids (thus
  most amino acids have more than one tRNA responsible for them).
- At one loop, 3 unpaired bases form an anticodon.
- Base pairing between the anticodon and the complementary codon on an mRNA
  molecule brings the correct amino acid into the growing polypeptide chain.
  Further details of this process are described in the discussion of
  translation.
Messenger RNA (mRNA)
Messenger RNA comes in a wide range of sizes reflecting the size of the
polypeptide it encodes. Most cells produce small amounts of thousands of
different mRNA molecules, each to be translated into a polypeptide needed by
the cell. Many mRNAs are common to most cells, encoding "housekeeping"
proteins needed by all cells (e.g., the enzymes of glycolysis). Other mRNAs
are specific for only certain types of cells. These encode proteins needed
for the function of that particular cell (e.g., the mRNA for hemoglobin in
the precursors of red blood cells).
Small Nuclear RNA (snRNA)
Approximately a dozen different
genes for snRNAs, each present in multiple copies, have been identified.
The snRNAs have various roles in the processing of the other classes of
RNA. For example, several snRNAs are part of the spliceosome that
participates in converting pre-mRNA into mRNA by excising the introns and
splicing the exons.
The RNA polymerases
The RNA polymerases are huge multi-subunit protein complexes. Three kinds are
found in eukaryotes.
- RNA polymerase I (Pol I). It transcribes the rRNA genes for the precursor
  of the 28S, 18S, and 5.8S molecules (and is the busiest of the RNA
  polymerases).
- RNA polymerase II (Pol II). It transcribes the mRNA and snRNA genes.
- RNA polymerase III (Pol III). It transcribes the 5S rRNA genes and all the
  tRNA genes.
RNA Processing: pre-mRNA -> mRNA
All the primary transcripts produced in the nucleus must undergo processing
steps to produce functional RNA molecules for export to the cytosol. We
shall confine ourselves to a view of the steps as they occur in the processing
of pre-mRNA to mRNA.
- Synthesis of the cap. This is a stretch of three modified nucleotides
  attached to the 5' end of the pre-mRNA.
- Synthesis of the poly(A) tail. This is a stretch of adenine nucleotides
  attached to the 3' end of the pre-mRNA.
- Step-by-step removal of introns present in the pre-mRNA and splicing
  of the remaining exons. This step is required because most eukaryotic
  genes are split.
Split Genes
Most eukaryotic genes are split into segments. In decoding the open reading
frame of a gene for a known protein, one usually encounters periodic stretches
of DNA calling for amino acids that do not occur in the actual protein
product of that gene. Such stretches of DNA, which get transcribed into
RNA but not translated into protein, are called introns. Those stretches
of DNA that do code for amino acids in the protein are called exons.
Examples:
- The gene for one type of collagen found in chickens is split into 52
  separate exons.
- The gene for dystrophin, which is mutated in boys with muscular dystrophy,
  has 79 exons.
- Even the genes for rRNA and tRNA are split.
The cutting and splicing of mRNA must be done with great precision. If
even one nucleotide is left over from an intron or one is removed from
an exon, the reading frame from that point on will be shifted, producing
new codons specifying a totally different sequence of amino acids from
that point to the end of the molecule (which often ends prematurely anyway
when the shifted reading frame generates a STOP codon).
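The effect of a one-nucleotide splicing error can be seen in a short Python
sketch. The two mRNA sequences are invented for illustration. The codon table
is the standard genetic code built from the usual compact 64-letter string,
and `translate` is a hypothetical helper that reads codons from the first base
until it meets a stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in the order UUU, UUC, UUA, UUG, UCU, ... GGG; "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate codon by codon from the first base; stop at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical correctly spliced mRNA: AUG GCU GAU CGU AAA UGG UAA
correct = "AUGGCUGAUCGUAAAUGGUAA"
# Same mRNA with one intron nucleotide (G) left behind after the second codon.
shifted = correct[:6] + "G" + correct[6:]

print(translate(correct))
print(translate(shifted))
```

In this example the two products agree only up to the point of the error.
After it, every codon is different and a stop codon appears early, so the
faulty product is shorter as well as wrong.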
The removal of introns and splicing of exons is done with the spliceosome.
This is a complex of several snRNA molecules and several proteins.
The introns in most pre-mRNAs begin with a GU and end with an AG. Presumably
these short sequences are essential for guiding the spliceosome.
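The GU...AG rule can be illustrated with a small Python sketch that removes
introns from a pre-mRNA and refuses to splice if an intron does not start with
GU and end with AG. The sequences and the `splice` function are assumptions
made for this example, and the intron positions are simply given as
coordinates. Real spliceosomes recognize more than these four bases.

```python
def splice(pre_mrna: str, introns):
    """Remove introns given as (start, end) index pairs (end exclusive) and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} does not follow the GU...AG rule")
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    return "".join(exons)


# Hypothetical pre-mRNA made of two exons around one intron.
exon1, intron1, exon2 = "AUGGCU", "GUAAGUCCUUAG", "GAUCGUUAA"
pre_mrna = exon1 + intron1 + exon2
intron_start = len(exon1)
intron_end = intron_start + len(intron1)

mature = splice(pre_mrna, [(intron_start, intron_end)])
print(mature)
```

Coordinates that are off by even one base no longer begin with GU, so the
check fails instead of silently producing a shifted reading frame.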
Alternate Splicing
The processing of pre-mRNA for many proteins proceeds along various paths
in different cells or under different conditions. For example, early in
the differentiation of a B cell (a lymphocyte that synthesizes an antibody)
the cell first uses an exon that encodes a transmembrane domain that causes
the molecule to be retained at the cell surface. Later, the B cell switches
to using a different exon whose domain enables the protein to be secreted
from the cell as a circulating antibody molecule.
So, whether a particular segment of RNA will be retained as an exon
or excised as an intron can vary under different circumstances. Clearly
the switching to an alternate splicing pathway must be closely regulated.
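As a rough Python sketch of the B cell example, the same set of exons can be
joined in two ways to give a membrane-bound or a secreted product. The exon
names and sequences are placeholders invented here (real antibody exons are
far longer), and `assemble_mrna` is a hypothetical helper.

```python
# Hypothetical exon sequences, for illustration only.
EXONS = {
    "shared": "AUGGCUGAU",
    "membrane_anchor": "CUGGUGCUGUAA",
    "secreted_tail": "AAAGGUUAA",
}


def assemble_mrna(chosen):
    """Join the chosen exons, in the order given, into one mature mRNA."""
    return "".join(EXONS[name] for name in chosen)


# Early B cell: keep the exon encoding the transmembrane domain.
membrane_form = assemble_mrna(["shared", "membrane_anchor"])
# Later: switch to the exon that allows secretion.
secreted_form = assemble_mrna(["shared", "secreted_tail"])

print(membrane_form)
print(secreted_form)
```

Both mRNAs come from the same gene. Only the choice of which stretch to keep
as an exon differs.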
Why split genes?
Perhaps during evolution, eukaryotic genes have been assembled from smaller,
primitive genes - today's exons. Some proteins, like the antibodies
mentioned in the previous section, are organized in a set of separate sections
or domains each with a special function to perform in the complete
molecule. Each domain is encoded by a separate exon. Having the different
functional parts of the antibody molecule encoded by separate exons makes
it possible to use these units in different combinations. Thus a set of
exons in the genome may be the genetic equivalent of the various modular
pieces in a box of "Lego" for children to assemble in whatever forms they
wish.
But the boundaries of other exons do not seem to correspond to domain
boundaries of the protein. Furthermore, rRNA and tRNA genes are also split,
and these do not encode proteins. So perhaps some introns are simply "junk"
DNA that was inserted into the gene at some point in evolution without
causing any harm.
Summary
Gene expression occurs in two steps:
- transcription of the information encoded in DNA into a molecule of RNA
  (described here) and
- translation of the information encoded in the nucleotides of mRNA into a
  defined sequence of amino acids in a protein (discussed in Gene
  Translation: RNA -> Protein).
Last modified on: 4 February, 2000 by Dave Ussery