Ph.D. course in Biological Sequence Analysis and Protein Modelling

David Ussery
Thursday, 13 April, 2000




Visualisation of DNA Structures in Complete Genomes

-or-

SEVEN Views of the
Escherichia coli Genome.



Outline:





Part 1: INTRODUCTION

Three Different Views of the Escherichia coli genome.


view # 1 - "mechanical properties" of the DNA helix.

[Figure: E. coli structural atlas]




    This includes the following:

  1. DNase I Sensitivity - this is scaled such that the black bands represent regions of the DNA that are more likely to be cut by DNase I.

  2. Intrinsic Curvature - this is a measure of local curvature of the DNA helix. The units are such that a larger number is more curved, and a value of "0" reflects DNA that has no curvature at all. The method used here is based on the "Curvature" programme and calculation scheme of Alexander Bolshoy.

  3. Helix Rigidity - this is actually calculated from the propeller twist angles found in crystal structures. There is a good correlation between propeller twist and helix rigidity.

  4. Helix Deformability - this is based on the analysis of several protein-DNA crystal structures; certain dinucleotides had a much greater range of "deformability" than others (see Olson et al., 1998).

  5. Stacking Energy - this is a thermodynamic parameter that can be measured experimentally. The values here are in kcal/mol, and are calculated using the values from Ornstein et al.

  6. Position Preference - this measure is from nucleosome positioning data from Andrew Travers.



Note: DNase I sensitivity (the wheel inside of the annotation circle) and Position Preference (the wheel just outside of the annotation circle) are models based on trinucleotide values, whilst the other 4 measures use dinucleotide models.
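
The dinucleotide and trinucleotide models mentioned in the note can be sketched roughly as follows: each overlapping k-mer step along the sequence is looked up in a table, and the values are averaged over a sliding window. This is only an illustration, not the code used to build the atlas. The function name `kmer_profile`, the window size, and the numbers in `TOY_TABLE` are made up for the example; a real scale (for instance the stacking energies of Ornstein et al.) would have to be substituted.

```python
def kmer_profile(seq, table, k=2, window=3):
    """Average a per-k-mer property over a sliding window.

    seq    -- DNA sequence (string)
    table  -- dict mapping k-mers to numeric values (supplied by the caller)
    k      -- 2 for a dinucleotide model, 3 for a trinucleotide model
    window -- number of consecutive k-mer steps averaged per output point
    """
    seq = seq.upper()
    # One value per overlapping k-mer; k-mers missing from the table
    # (e.g. those containing N) are recorded as None and skipped later.
    steps = [table.get(seq[i:i + k]) for i in range(len(seq) - k + 1)]

    profile = []
    for i in range(len(steps) - window + 1):
        values = [v for v in steps[i:i + window] if v is not None]
        profile.append(sum(values) / len(values) if values else None)
    return profile


# Placeholder numbers for illustration only; NOT a published scale.
TOY_TABLE = {"AA": 1.0, "AT": 2.0, "TA": 3.0, "TT": 1.0}

if __name__ == "__main__":
    print(kmer_profile("AATA", TOY_TABLE, k=2, window=2))
```

For a trinucleotide measure such as DNase I sensitivity or position preference, the same function would be called with `k=3` and a table keyed by trinucleotides.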








view # 2 - "base composition" of the DNA sequence.

[Figure: base composition]



This includes the following:
1. Individual base contributions - the 4 separate circles make it possible to spot regions enriched for one particular base (e.g., A near the yagG gene, and G near the phnM gene). (Maybe I need to work on a better colour scheme - this is still under development, and I'm open to any ideas!)
2. Trinucleotide distribution - this is a measure of the deviation of a particular region from the average for the entire chromosome. It is also possible to compare the region against a different genome (e.g., cp. Archaea vs. Bacteria).
3. AT skew and GC skew - this is calculated by the formula (G-C)/(G+C) for the GC skew (and correspondingly with A and T for the AT skew), over a window of 5000 G's (which for E. coli is roughly 10,000 bp). In E. coli there is an obvious GC skew, but not an AT skew. Different organisms have different skews, and although this could in part be explained by codon usage preference, that is probably not the entire explanation. At any rate, this is a useful measure for distinguishing the replichores in many bacteria.
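
As a rough illustration of the skew formula (G-C)/(G+C), here is a minimal sketch. It is not the code behind the atlas: the function names are invented for the example, and for simplicity it uses a fixed window length in base pairs rather than a window defined by a fixed number of G's as described above.

```python
def skew(seq, plus="G", minus="C"):
    """Return (plus - minus) / (plus + minus) for one stretch of sequence."""
    p = seq.count(plus)
    m = seq.count(minus)
    total = p + m
    # A window containing neither base has no defined skew; report 0.0.
    return (p - m) / total if total else 0.0


def windowed_skew(seq, window=10000, step=None, plus="G", minus="C"):
    """Compute the skew in successive windows along the sequence.

    Returns a list of (start_position, skew) pairs. By default the windows
    do not overlap (step == window). A sequence shorter than one window
    is treated as a single window.
    """
    seq = seq.upper()
    step = step or window
    last_start = max(len(seq) - window + 1, 1)
    return [
        (start, skew(seq[start:start + window], plus, minus))
        for start in range(0, last_start, step)
    ]


if __name__ == "__main__":
    # Tiny made-up sequence: G-rich first half, C-rich second half.
    print(windowed_skew("GGGGCCCC", window=4))
    # AT skew uses the same formula with A and T.
    print(windowed_skew("AATTAATT", window=4, plus="A", minus="T"))
```

Plotted along a bacterial chromosome, a change in the sign of the GC skew is what makes the two replichores stand out.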








view # 3 - "DNA repeats"
[Figure: DNA repeats]







Combined view


DNA repeats

For more E. coli atlases (as well as other genomes), visit the "DNA Structural Atlas of E. coli" web page on our CBS server!




Gene Expression in E. coli

Escherichia coli is probably the best characterised organism.

some numbers:

  • There are 4085 predicted genes in Escherichia coli strain K-12 isolate W3110.


  • There are 4289 predicted genes in Escherichia coli strain K-12 isolate MG1655.


  • There are about 5100 predicted genes in Escherichia coli strain O157:H7 isolate EDL933 (enterohemorrhagic pathogen).



  • Roughly 2600 genes have been found to be expressed in Escherichia coli strain K-12 cells, under standard laboratory growth conditions.


  • About 2100 spots can be seen on 2-D protein gels.



  • Very roughly 1000 different genes (only about 600 mRNA transcripts) are expressed at "detectable levels" in E. coli cells grown in LB media.



  • Only about 350 proteins exist at concentrations of > 100 copies per cell. (These make up 90% of the total protein in E. coli!)



Four more Views of the Escherichia coli genome.

    1. The DNA Helix Atlas

    2. The CDS Atlas

    3. The Gene Expression Atlas

    4. The Chromatin Proteins Atlas(es)




    So what's this good for? Who cares?



    Part 2: What does it all mean? Is any of this at all useful?





      Some recent findings:
    1. Analysis of IS elements in pathogenic E. coli (e.g., pO157:H7).

    2. Cluster analysis of E. coli genes.

    3. Promoters and intergenic regions are different from coding regions in most bacteria.

    4. Different promoters have characteristic DNA structural properties.

    5. Helical periodicities unique to Bacteria vs. Archaea.



    On the Biological "meanings" or usefulness of DNA symmetry elements


    link to Cookbook for this afternoon's lecture




    References


    • Carsten Friis, Lars Juhl Jensen, and David W. Ussery, "Visualisation of Pathogenicity Regions in Bacteria", manuscript submitted to Genetica.

    1. Pedersen,A.G., Jensen,L.J., Stærfeldt,H.H., Brunak,S. and Ussery,D.W., "A DNA Structural Atlas for E. coli", Journal of Molecular Biology, in press (April, 2000).

    2. Peder Worning, Lars Juhl Jensen, Karen E. Nelson, Søren Brunak, and David W. Ussery, "Structural analysis of DNA sequence: Evidence for lateral gene transfer in Thermotoga maritima", Nucleic Acids Research, 28:706-709, (2000).
    3. Jensen,L.J., Friis,C., Ussery,D.W., "Three views of the microbial genomes", Research in Microbiology, 150:773-777, 1999.

    4. Ussery,D.W., Higgins,C.F., and Bolshoy,A., "Environmental Influences on DNA Curvature", J. Biomolecular Structure & Dynamics, 16:811-823, (1999).

    5. Richard R. Sinden, Christopher E. Pearson, Vladimir N. Potaman, and David W. Ussery, "DNA: Structure and Function", Advances in Genome Biology, 5A:1-141, (1998).



    For those who might be interested in learning more about DNA, visit my "DNA is like Coke" web page!







    Last modified on: 12 April, 2000 by Dave Ussery