Exercise in Reverse Engineering of Regulatory Networks



    Introduction to Modelling of Regulatory Networks

    Reverse engineering of regulatory networks is a relatively new discipline within bioinformatics. It relies upon the assumption that, given enough data on actual gene expression levels, we can deduce how genes are regulated. For this purpose it is practical to divide microarray data into two groups: steady state and time series.

    Time Series

    In a time series experiment, samples are taken at different time points from a single cell culture during growth. For each of these samples microarray data can be produced, revealing the changes in expression levels over time. Using a mathematical model it is then theoretically possible to reveal the underlying regulation. In reality, however, it is very difficult to obtain useful networks from time series, because the approach relies on the time intervals being selected in accordance with the delay between the expression of the regulator and the effect it has on the regulated genes. Since very little is typically known about such delays, designing an experiment which takes them into account is currently not possible. Furthermore, since this problem is related to the actual data, and not the model used for reverse engineering, simply changing the model won't help (much). Also, limitations in the experimental conditions mean that it is difficult to obtain good quality data for more than approximately 20 time points, which is too few for most reverse engineering models. For these reasons this exercise will not include analysis of time series data.

    Steady State

    In a steady state experiment several closely related cell cultures are grown under very similar conditions, with expression data being collected at only one specific time point, usually during steady growth. Changes in expression levels are then identified by comparing across the cell cultures. A typical setup involves comparing a single gene knock-out mutant to the wild-type grown on the same medium. The closer the two cultures are to each other, the better, since this eliminates noise from biological processes which are not related to the knocked-out gene. Ideally, only those genes regulated either directly or indirectly (perhaps through another regulator) by the product of the knocked-out gene will show any significant change in expression level. The most reliable reverse engineered networks today are obtained from steady state data, and it is this kind of data we'll focus on in the exercise.

    Reverse Engineering in B. subtilis

    For the purpose of this exercise we will concentrate on the TnrA-GlnA regulatory system (sometimes known as the TnrA-GlnR system) in B. subtilis, which is responsible for the cellular levels of glutamine used in synthesizing amino acids and nucleotides. In the absence of a good nitrogen source, TnrA of the global TnrA-GlnA nitrogen regulatory system (see figure) starts activating its own transcription and the transcription of genes encoding products that enable B. subtilis to gain access to nitrogen through alternative pathways, thereby ensuring the continuation of vegetative growth. It has been shown that TnrA activates transcription by physically binding to a specific site referred to as the TnrA-box. Later studies have shown that TnrA also acts as a repressor by binding to a TnrA-box positioned near the transcription start site of gltA.

    When the cell is growing on a good nitrogen source such as ammonium in combination with glutamate, GlnA inactivates TnrA through a direct protein-protein interaction, making TnrA unable to bind to the promoter regions of the genes it regulates. Because TnrA is autoregulated, its cellular concentration is therefore very low under these conditions. GlnR is a homolog of TnrA and has been shown to bind to the TnrA-box; the similarity is especially high in the N-terminal region, where the DNA-binding helix-turn-helix is located. GlnR represses the expression of the glnRA operon, the ureABC operon, and tnrA. The activity of GlnR has also been shown to be controlled by a signal mediated by GlnA when the cell is growing with an excess of nitrogen.

    We have a microarray dataset available for three knock-out mutants, with several replicates for each mutant. The three knocked-out genes are tnrA, glnA and glnR, and we have prepared a datafile for each experiment.

    The datafiles are located in your DNA_regnet directory. Alternatively, you can just use the links below and see the data in your browser:

    The three datafiles are constructed similarly, with 25 columns each:

    Column 1:
    A unique identifier for each gene (the gene name).
    Columns 2-13:
    Expression levels for the wild-type reference culture. The reference culture for the glnA and glnR mutants was grown on glutamine, while the tnrA reference was grown on glutamate.
    Columns 14-25:
    Expression levels for the knock-out mutant (TnrA, GlnA or GlnR). The GlnA and GlnR mutants were grown on glutamine medium, while the TnrA mutant was grown on glutamate.
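The column layout above can be read with a few lines of Python. This is a minimal sketch that assumes whitespace-separated columns; the actual delimiter of the datafiles is not stated here, and `parse_row` is a hypothetical helper name.

```python
def parse_row(line):
    """Split one data line into (gene, reference values, mutant values).

    Assumes whitespace-separated fields; the real delimiter may differ.
    """
    fields = line.split()
    if len(fields) != 25:
        raise ValueError(f"expected 25 columns, got {len(fields)}")
    gene = fields[0]                               # column 1
    reference = [float(x) for x in fields[1:13]]   # columns 2-13
    mutant = [float(x) for x in fields[13:25]]     # columns 14-25
    return gene, reference, mutant
```

Each call returns twelve reference values and twelve mutant values for one gene, which is the pairing the t-test later compares.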

    Comparing the reference to the sample

    To identify significant differences in expression between the reference cultures and the mutants, we'll use a standard t-test on each of the three files. This is possible due to the high number of replicates we have for each comparison. We have prepared a web server which allows you to run all commands easily: the server calls R for you and automatically inputs the required commands. This will speed things up a bit, since you'll need to run the test several times. The server can be found here: Reg net server.
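To make the test concrete, here is a minimal Python sketch of the two-sample t statistic for one gene. It uses the pooled-variance (Student) form; which variant the server runs is not stated here, and R's `t.test` defaults to Welch's variant. With equal group sizes, as in these files, both variants give the same t value and differ only in the degrees of freedom. The function name `t_statistic` is made up for this illustration.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(reference, mutant):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_ref, n_mut = len(reference), len(mutant)
    df = n_ref + n_mut - 2
    # Pooled estimate of the common variance of the two groups.
    pooled_var = ((n_ref - 1) * variance(reference)
                  + (n_mut - 1) * variance(mutant)) / df
    std_err = sqrt(pooled_var * (1 / n_ref + 1 / n_mut))
    # Positive t: higher mean expression in the mutant than in the reference.
    return (mean(mutant) - mean(reference)) / std_err, df
```

A P-value then follows from the t distribution with `df` degrees of freedom, for example via `scipy.stats.ttest_ind` if SciPy is available.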

    First choose the P-value cutoff required for significance (P = 0.01 means significant at a 99% confidence level). It is a good idea to use a low P-value cutoff, as the roughly 40 genes in the datafiles are not randomly selected from the B. subtilis genome. You can also choose to apply the Bonferroni correction for multiple testing (see Chapter 3.5, page 24 in the Microarray book).
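As a rough illustration of what the Bonferroni correction does, the sketch below divides the chosen cutoff by the number of tests before comparing. The function name and the default cutoff are assumptions for this example, not the server's actual code.

```python
def bonferroni_significant(p_values, alpha=0.01):
    """Flag each test as significant after Bonferroni correction."""
    # With n tests, each individual test must pass alpha / n.
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]
```

With about 40 genes and a cutoff of 0.01, each gene must reach P <= 0.00025 to be called significant, so a gene with P = 0.001 passes the uncorrected test but not the corrected one.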

    Explanation of output:

    Results of t-test:

    You'll notice that the output now has one column for each of the three datafiles, containing either a "0", "1" or "-1". A "0" means no significant change was observed in that experiment.
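A table like this can be held as one three-value profile per gene, which makes it easy to spot genes that behave identically in all three experiments. The sketch below is only an illustration: the gene names and values are made up, and `group_by_profile` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_profile(table):
    """Group genes whose profiles are identical in every experiment."""
    groups = defaultdict(list)
    for gene, profile in table.items():
        groups[tuple(profile)].append(gene)
    return dict(groups)


# Made-up rows for illustration; one value per knock-out experiment.
table = {
    "geneA": (1, 0, 0),
    "geneB": (1, 0, 0),
    "geneC": (0, -1, 0),
}
```

Here `group_by_profile(table)` places geneA and geneB in the same group, because their rows are identical.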

    Deduced paths:

    "X -| Y"
    A negative interaction occurs from X to Y (i.e. X represses Y).
    "X -> Y"
    A positive interaction occurs from X to Y (i.e. X enhances Y).
    "X |-| Y"
    A negative interaction occurs both from X to Y and from Y to X (i.e. they repress each other).
    "X -> Y Z"
    X enhances expression of both Y and Z. Y and Z are listed together because their expression profiles are identical (i.e. they behave identically in all three experiments).
    "X -> ( Y )"
    X enhances expression of Y, but not necessarily by direct means.


    A -| B -> ( C D )

    This indicates that A represses expression of B, while B (when expressed) enhances expression of both C and D, but not necessarily through direct means. For example, it might be that C enhances expression of D, while B only enhances expression of C. Hence B would indirectly enhance expression of D, because it enhances C. Without data from a mutant with C knocked-out we cannot be sure what happens.
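The path notation can also be read mechanically. The following Python sketch turns a path string into a list of edges. It assumes that symbols are separated by spaces, as in the example above; the edge representation (source, target, sign, possibly indirect) is chosen here for illustration and is not the server's own format.

```python
ARROWS = {"->": "+", "-|": "-", "|-|": "-"}


def parse_path(path):
    """Turn a path such as 'A -| B -> ( C D )' into a list of edges.

    Each edge is (source, target, sign, possibly_indirect).
    """
    groups, arrows, indirect = [[]], [], [False]
    for token in path.split():
        if token in ARROWS:
            arrows.append(token)      # an arrow starts a new gene group
            groups.append([])
            indirect.append(False)
        elif token == "(":
            indirect[-1] = True       # parentheses: not necessarily direct
        elif token != ")":
            groups[-1].append(token)  # a gene name
    edges = []
    for i, arrow in enumerate(arrows):
        for src in groups[i]:
            for dst in groups[i + 1]:
                edges.append((src, dst, ARROWS[arrow], indirect[i + 1]))
                if arrow == "|-|":    # mutual repression: add the reverse edge
                    edges.append((dst, src, "-", indirect[i + 1]))
    return edges
```

For the example path, this yields a direct negative edge from A to B and two positive, possibly indirect edges from B to C and from B to D.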


    • (1) Why is it important to correct for multiple testing? What difference does it make if you try running the test without the Bonferroni correction?
    • (2) If you looked at the output files from the t-test you may have noticed that far fewer genes were found to be significantly regulated in the GlnR mutant than in the other two. Can you find a biological reason why identification of regulated genes from knock-out mutants may be more difficult for GlnR than for TnrA or GlnA? (hint: Think about which genes GlnR regulates).

    Tricky Bonus Question
    (3) Can you guess why glnR seems to have a significantly higher expression level in the GlnR mutant (where it's supposed to be knocked-out)?

    Visualizing the Network

    Open the link "grafical output" in a new tab/window.


    • (4) Does the reverse engineered network resemble the biological one? Where does it deviate?
    • (5) From the literature we know that glnR inhibits its own transcription, but why does it also seem to inhibit glnA?
    • (6) Although the programme tries to remove redundant pathways, there still seem to be three separate pathways from tnrA to guaA. Why do you think the programme kept all of them?
    • (7) When we knock out glnA we can measure a change in expression at the mRNA level of several genes, among them nrgA. This change could have occurred because GlnA binds to the promoter region of nrgA, but we also see that GlnA regulates tnrA which in turn is found to regulate nrgA. This is what is known as a feed-forward loop. Can GlnA bind directly to nrgA, or is the effect we see on nrgA when we knock out glnA merely a consequence of GlnA's regulation of TnrA? With the current experimental setup we cannot tell the difference. Could you design a microarray experiment to investigate this? What kind of a mutant strain do you need to reveal the true regulation?
    • (8) There is something tricky about the nature of the TnrA-GlnA regulatory system which makes reverse engineering harder than usual for such a small system. Can you see what it is?
      (hint: Try thinking about why the interaction from GlnA to nrgA in the feed-forward loop above wasn't eliminated by the redundancy reduction).
    • (9) If you have time to spare and feel up to it, try re-running the exercise with different P-value cutoffs for the t-test. What effect does it have on the prediction? Can you improve upon the prediction?

      Answers to questions

    • About this document ...


      This document was generated using the LaTeX2HTML translator Version 96.1 (Feb 5, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

      Written by Carsten Friis
      Mon Apr 7 16:43:54 MDT 2003
      Modified by Hanne Jarmer & Carsten Friis
      January 2005


      Scientific problems: Carsten Friis