Reverse engineering of regulatory networks is a relatively new
discipline within Bioinformatics. It relies upon the assumption that
given enough data on actual genetic expression levels, we can deduce
how genes are regulated. For this purpose it is practical to divide
microarray data into two groups, steady state and time series.
In a time series experiment samples are taken at different time
intervals from a single cell culture during growth. For each of these
samples microarray data can be produced revealing the changes in
expression levels over time. Using a mathematical model it is now
theoretically possible to reveal the underlying regulation. In reality,
however, it is very difficult to obtain useful networks from time
series, because the approach relies on the time intervals being
selected in accordance with the delay between the expression of the
regulator and the effect it has on the regulated genes. Since very
little is typically known about such delays, designing an experiment
which takes them into account is currently not possible. Furthermore,
since this problem is related to the actual data, and not the model
used for reverse engineering, simply changing the model won't help
(much). Also, limitations in the experimental conditions mean that it
is difficult to obtain good quality data for more than approximately 20
time points, which is too few for most reverse engineering models. For
these reasons this exercise will not include analysis of time series
data.
In a steady state experiment several closely related cell cultures
are grown under very similar conditions, with expression data being
collected only at one specific time point, usually during steady
growth. Changes in expression levels are then identified by comparing
across the cell cultures. A typical setup involves comparing a single
gene knock-out mutant to the wild-type grown on the same media. The
closer the two cultures are to each other, the better, since this
eliminates noise from biological processes which are not related to the
knocked-out gene. Ideally only those genes regulated either directly or
indirectly (perhaps through another regulator) by the product of the
gene which was knocked-out will show any significant change in
expression level. The most reliable reverse engineered networks today
are always obtained from steady state data, and it's this kind of data
we'll focus on in the exercise.
For the purpose of this exercise we will concentrate on the
TnrA-GlnA regulatory system (sometimes known as the TnrA-GlnR system)
in B. subtilis
which is responsible for the cellular levels of glutamine used in
synthesizing amino acids and nucleotides. In the absence of a good nitrogen
source, TnrA of the global TnrA-GlnA nitrogen regulatory system (see figure)
starts activating its own transcription and the transcription of genes whose
products enable B. subtilis
to gain access to nitrogen through alternative pathways, thereby
ensuring the continuation of vegetative growth. It has been shown that
TnrA activates transcription by physically binding to a specific site
referred to as the TnrA-box. Later studies have shown that TnrA also
acts as a repressor by binding to a TnrA-box positioned near the
transcription start site of gltA.
When the cell is growing on a good nitrogen source such as ammonium in
combination with glutamate, GlnA inactivates TnrA through a direct
protein-protein interaction, making TnrA unable to bind to the promoter
regions of the genes it regulates. Because TnrA activates its own
transcription, the cellular concentration of TnrA is therefore very low
under these conditions. GlnR is a homolog of TnrA and has been shown to
bind to the TnrA-box. The similarity is especially high in the N-terminal
region, where the DNA-binding helix-turn-helix is located. GlnR
represses the expression of the glnRA operon, the ureABC operon, and tnrA.
The activity of GlnR has also been shown to be controlled by a signal
mediated by GlnA when the cell is growing in excess of nitrogen.
We have a dataset of microarray data available for three
knock-out mutants, with several replicates for each mutant. The three
knocked-out genes are tnrA, glnA and glnR, and we have prepared a datafile for each
experiment for you.
The datafiles are located in your DNA_regnet directory.
Alternatively, you can just use the links below and see the data in your browser:
TnrA.tab
GlnA.tab
GlnR.tab
The three datafiles are constructed similarly, with 25 columns:
- Column 1: A unique identifier for each gene (the gene name).
- Columns 2-13: Expression levels for the wild-type reference culture. The
reference culture for the glnA and glnR mutants was grown on glutamine,
while the tnrA reference was grown on glutamate.
- Columns 14-25: Expression levels for the knock-out mutant (tnrA, glnA or
glnR). The glnA and glnR mutants were grown on glutamine media, while
the tnrA mutant was grown on glutamate.
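The column layout above can be read with a few lines of Python. This is only a minimal sketch: it assumes the files are tab-separated (as the .tab extension suggests) and that every data row holds one gene name followed by 24 numbers. The function name read_expression_table and the two example genes are made up for illustration and are not taken from the real datafiles.

```python
import csv
import io

N_REPLICATES = 12  # columns 2-13 and columns 14-25 hold 12 values each


def read_expression_table(handle):
    """Yield (gene, wild_type_values, mutant_values) for each data row."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 1 + 2 * N_REPLICATES:
            continue  # skip blank or malformed lines
        try:
            values = [float(cell) for cell in row[1:]]
        except ValueError:
            continue  # skip a non-numeric line, e.g. a header if one exists
        # Column 1 is the gene, then 12 reference values, then 12 mutant values.
        yield row[0], values[:N_REPLICATES], values[N_REPLICATES:]


# Made-up two-gene table in the 25-column layout (not real measurements).
example = "\n".join(
    "\t".join([gene] + [str(ref)] * 12 + [str(mut)] * 12)
    for gene, ref, mut in [("geneA", 1.0, 2.0), ("geneB", 3.5, 3.5)]
)
rows = list(read_expression_table(io.StringIO(example)))
```

To read one of the real files, pass an open file handle instead of the io.StringIO object, for example open("TnrA.tab", newline="").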
To identify significant differences in expression between the reference cultures
and the mutants, we'll use a standard t-test on each of the three files.
This is possible due to the high number of replicates we have for
each comparison. We have prepared a web-server, which allows you to run
all commands easily. In fact, the server calls R for you and
automatically inputs the required commands. This will speed things up a
bit, since you'll need to run the test several times. The server can be
found here: Reg net server.
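For readers who want to see what the test computes for a single gene, the statistic can be sketched in plain Python. The exact commands the server sends to R are not shown in this exercise, so this sketch assumes Welch's unequal-variance form, which is what R's t.test uses by default. The function name welch_t and the example numbers are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(reference, mutant):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test.

    A positive t means the mutant mean is higher than the reference mean.
    Raises ZeroDivisionError if both groups have zero variance.
    """
    n1, n2 = len(reference), len(mutant)
    # Squared standard error of each group mean.
    se1 = variance(reference) / n1
    se2 = variance(mutant) / n2
    t = (mean(mutant) - mean(reference)) / sqrt(se1 + se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Tiny made-up example with three values per group.
t_stat, dof = welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Turning t and the degrees of freedom into a P-value requires the t distribution, which is not in the Python standard library; R (as used by the server) or a statistics package provides it.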
You must choose the P-value cutoff required for significance (P = 0.01 means
significant at a 99% confidence level). It is a good idea to use a low
P-value cutoff, as the ~ 40 genes in the datafiles are not randomly
selected from the B. subtilis genome. You can choose to use the Bonferroni
correction to correct for multiple testing (see Chapter 3.5 page 24 in the Microarray book).
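The Bonferroni correction itself is simple arithmetic: the chosen cutoff is divided by the number of tests, so with about 40 genes a cutoff of 0.01 becomes 0.01 / 40 = 0.00025 per gene. The sketch below illustrates this. The function name and the P-values are invented for the example and do not come from the exercise data.

```python
def bonferroni_significant(p_values, cutoff=0.01):
    """Return the genes whose P-value passes the Bonferroni-corrected cutoff.

    p_values maps gene name -> uncorrected P-value from the t-test.
    """
    if not p_values:
        return set()
    corrected_cutoff = cutoff / len(p_values)  # one test per gene
    return {gene for gene, p in p_values.items() if p <= corrected_cutoff}


# Invented P-values for three hypothetical genes.
p_values = {"geneA": 0.0001, "geneB": 0.004, "geneC": 0.2}

uncorrected = {gene for gene, p in p_values.items() if p <= 0.01}
corrected = bonferroni_significant(p_values, cutoff=0.01)
```

Here geneB passes the plain 0.01 cutoff but not the corrected cutoff of 0.01 / 3, which is the kind of difference to look for when running the server with and without the correction.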
Explanation of output:
Results of t-test:
You'll notice that the results table has one column
for each of the three datafiles, containing either a "0", "1" or a "-1".
A "0" means no significant change was observed in that experiment.
Deduced paths:
- "X -| Y": A negative interaction occurs from X to Y (i.e. X represses Y).
- "X -> Y": A positive interaction occurs from X to Y (i.e. X enhances Y).
- "X |-| Y": A negative interaction occurs both from X to Y and from Y to X (i.e. they repress each other).
- "X -> Y Z": X enhances expression of both Y and Z. Y and Z are listed
together because their expression profiles are identical (i.e. they
behave identically in all three experiments).
- "X -> ( Y )": X enhances expression of Y, but not necessarily by direct means.
Example:
A -| B -> ( C D )
This indicates that A represses expression of B, while B (when
expressed) enhances expression of both C and D, but not necessarily
through direct means. For example, it might be that C enhances
expression of D, while B only enhances expression of C. Hence B would
indirectly enhance expression of D, because it enhances C. Without data
from a mutant with C knocked-out we cannot be sure what happens.
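The notation is regular enough to be read mechanically, which can help when comparing many deduced paths. The following is a rough sketch, not the server's own code: it assumes that gene names, arrows and parentheses are always separated by spaces, as in the examples above, and the function name parse_path is made up.

```python
ARROWS = {"->": "enhances", "-|": "represses", "|-|": "mutually represses"}


def parse_path(path):
    """Split a deduced path into (sources, relation, targets, maybe_indirect) steps."""
    groups, relations = [], []
    current, indirect = [], False
    for token in path.split():
        if token in ARROWS:
            # An arrow closes the current gene group and starts the next one.
            groups.append((current, indirect))
            relations.append(ARROWS[token])
            current, indirect = [], False
        elif token == "(":
            indirect = True  # genes in parentheses may be regulated indirectly
        elif token == ")":
            continue
        else:
            current.append(token)
    groups.append((current, indirect))
    return [
        (groups[i][0], relations[i], groups[i + 1][0], groups[i + 1][1])
        for i in range(len(relations))
    ]


steps = parse_path("A -| B -> ( C D )")
```

For the example path this yields two steps: A represses B (direct), and B enhances C and D (possibly indirect).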
- Questions
- (1) Why is it important to correct for multiple
testing? What difference does it make if you try running the test
without the Bonferroni correction?
- (2) If you looked at the output files from the t-test
you may have noticed that many fewer genes were found to be
significantly regulated in the GlnR mutant than in the other two. Can
you find a biological reason for why identification of regulated genes
from knock-out mutants may be more difficult for GlnR than for TnrA or
GlnA?
(hint: Think about which genes GlnR regulates).
- Tricky Bonus Question
- (3) Can you guess why glnR seems to have a significantly higher expression level in the GlnR mutant (where it's supposed to be knocked-out)?
Open the link "grafical output" in a new tab/window.
- (4) Does the reverse engineered network resemble the biological one? Where does it deviate?
- (5) From the literature we know that glnR inhibits its own transcription, but why does it also seem to inhibit glnA?
- (6) Although the programme tries to remove redundant pathways, there still seem to be three separate pathways from tnrA to guaA. Why do you think the programme kept all of them?
- (7) When we knock out glnA we can measure a change in expression at the mRNA level of several genes, among them nrgA. This change could have occurred because GlnA binds to the promoter region of nrgA, but we also see that GlnA regulates tnrA which in turn is found to regulate nrgA. This is what is known as a feed-forward loop. Can GlnA bind directly to nrgA, or is the effect we see on nrgA when we knock out glnA
merely a consequence of GlnA's regulation of TnrA? With the current
experimental setup we cannot tell the difference. Could you design a
microarray experiment to investigate this? What kind of a mutant strain
do you need to reveal the true regulation?
- (8) There is something tricky about the nature of the TnrA-GlnA
regulatory system which makes reverse engineering harder than usual for
such a small system. Can you see what it is?
(hint: Try thinking about why the interaction from GlnA to nrgA in the feed-forward loop above wasn't eliminated by the redundancy reduction).
- (9) If you have time to spare and feel up to it, try re-running the exercise with different P-value cutoffs for the t-test. What effect does it have on the prediction? Can you improve upon the prediction?
Answers to questions
Exercise in Reverse Engineering of Regulatory Networks
Written by Carsten Friis
Mon Apr 7 16:43:54 MDT 2003
Modified by Hanne Jarmer & Carsten Friis
January 2005