
Exercise - Searching for Regulatory Cis-elements



In this exercise you should try a couple of methods for detecting possible cis-elements in promoter regions.

  1. The first method is a Gibbs sampler. It can detect short patterns common to a set of promoter sequences. The patterns do not necessarily have to be perfectly conserved for the Gibbs sampler to detect them.

  2. The second method/program saco_patterns searches for perfectly conserved patterns.
    This program has two different modes.

    1. A Kolmogorov-Smirnov mode. In this mode saco_patterns searches for patterns whose occurrence correlates with the order of the promoters in the input set. This mode typically requires that the input promoter sequences have been sorted according to the expression pattern of the respective genes.

    2. The alternative mode is the hypergeometric mode, where saco_patterns looks for patterns that are overrepresented in a positive set of promoters relative to a negative set of promoters.
Here you find a demo server that will allow you to try out these programs.
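
To make the idea behind the hypergeometric mode concrete, here is a minimal sketch in Python. It is not the saco_patterns implementation; the function names (`hypergeom_tail`, `pattern_enrichment`) and the toy sequences are assumptions made for illustration only. It counts how many promoters in a positive and a negative set contain a perfectly conserved pattern, and computes the probability of seeing at least that many hits in the positive set by chance.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when n promoters are drawn without replacement from N,
    of which K contain the pattern (hypergeometric upper tail)."""
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def pattern_enrichment(positive, negative, pattern):
    """Return (hits in positive set, upper-tail p-value) for an exact pattern.

    Assumption: a promoter counts once if the pattern occurs anywhere in it,
    on the given strand only.
    """
    pattern = pattern.upper()
    k = sum(pattern in s.upper() for s in positive)       # hits in positive set
    K = k + sum(pattern in s.upper() for s in negative)   # hits overall
    N = len(positive) + len(negative)                     # all promoters
    n = len(positive)                                     # size of positive set
    return k, hypergeom_tail(k, N, K, n)


# Toy sequences, invented for illustration (not the exercise data).
toy_positive = ["AATTGACTAA", "TTGACTCC"]
toy_negative = ["AAAAAA", "CCCCCC", "GGGGTT"]
hits, p_value = pattern_enrichment(toy_positive, toy_negative, "TTGACT")
```

A small p-value suggests that the pattern is more frequent in the positive set than expected by chance. When many patterns are tested, the p-values would need a correction for multiple testing, which this sketch leaves out.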

Input data
From a gene expression study comparing the Arabidopsis constitutive defense mutant mpk4 to wild type plants, we have indications that at least 17 genes are over-expressed in the mutant. In order to analyze the promoter regions of these genes, upstream regions were extracted from the Arabidopsis genome sequence. The upstream regions can be found here: 17 Arabidopsis promoters

Saco-patterns
Saco_patterns has two statistical options: Kolmogorov-Smirnov statistics or hypergeometric statistics.
  • Which statistic is meaningful to use with the 17 Arabidopsis promoters dataset? (advanced help)
  • Try the relevant one searching for different pattern lengths
Gibbs sampler
Now try the Gibbs sampler tool, looking for 6 bp patterns; you may have to adjust the settings to make it run appropriately (see below).

The Gibbs sampler options
Run: the number of times that the Gibbs sampler starts over again, with new initial random patterns.
Iterations: the number of iterations that the sampler must do before stopping.
Pseudo counts: you can ignore this option for this exercise.
  • Do you find the same pattern every time you run the Gibbs sampler?
  • How do the Gibbs sampler results compare to the saco_patterns results?
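
To illustrate what the Run, Iterations and Pseudo counts options control, here is a minimal sketch of a Gibbs site sampler in Python. It is not the code behind the demo server: the function names, the default values, the scoring by information content against a uniform background, and the assumption of exactly one site per sequence are all choices made for this illustration. Sequences are assumed to be uppercase, to contain only A, C, G and T, and to be at least as long as the pattern width.

```python
import math
import random

BASES = "ACGT"


def build_profile(sites, width, pseudocount):
    """Column-wise base probabilities from the sites, smoothed by pseudocounts."""
    profile = []
    for col in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        profile.append({b: counts[b] / total for b in BASES})
    return profile


def site_probability(site, profile):
    """Probability of a candidate site under the profile."""
    p = 1.0
    for base, column in zip(site, profile):
        p *= column[base]
    return p


def information_content(profile):
    """Relative entropy against a uniform background, summed over columns (bits)."""
    return sum(p * math.log2(p / 0.25)
               for column in profile for p in column.values() if p > 0)


def gibbs_sampler(seqs, width=6, runs=5, iterations=200, pseudocount=1.0, seed=0):
    """Return (best score, best sites) over several restarts.

    pseudocount must be > 0 here, otherwise all sampling weights can be zero.
    """
    rng = random.Random(seed)
    best_score, best_sites = float("-inf"), None
    for _ in range(runs):  # each run starts from new random positions
        starts = [rng.randrange(len(s) - width + 1) for s in seqs]
        for _ in range(iterations):
            i = rng.randrange(len(seqs))  # sequence to re-sample
            others = [seqs[j][starts[j]:starts[j] + width]
                      for j in range(len(seqs)) if j != i]
            profile = build_profile(others, width, pseudocount)
            weights = [site_probability(seqs[i][p:p + width], profile)
                       for p in range(len(seqs[i]) - width + 1)]
            # Sample a new start position in proportion to its probability.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
        sites = [s[p:p + width] for s, p in zip(seqs, starts)]
        score = information_content(build_profile(sites, width, pseudocount))
        if score > best_score:
            best_score, best_sites = score, sites
    return best_score, best_sites
```

Because the sampler is stochastic, calling `gibbs_sampler` with different `seed` values can return different patterns, which is why several runs are normally combined and only the best-scoring result is kept.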
Second data set
Here is a file containing 6000 Arabidopsis promoters sorted by the level of over-expression of their genes in an mpk4 mutant relative to a wild type plant, i.e. promoters placed at the beginning of the file belong to genes that are strongly over-expressed in the mutant.
  • Which of the two saco_patterns modes, Kolmogorov-Smirnov statistics or hypergeometric statistics, is meaningful to use for the analysis of this data set?
  • Try the relevant one.
  • In Arabidopsis a series of transcription factors (WRKY factors) are known to bind sequences similar to TTGACT. Did you find anything like that in these data sets?
MPK4 was shown to phosphorylate two WRKY transcription factors (WRKY25 & WRKY33) in vitro (Andreasson et al., 2005).
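
As a rough illustration of how the occurrence of a pattern can be related to the order of the promoters, the sketch below records the ranks of the promoters that contain TTGACT (on either strand) and measures how far the distribution of those ranks deviates from an even spread over the whole list. This is a Kolmogorov-Smirnov style statistic written for this example; how saco_patterns actually computes its score is not described here, and the function names and toy sequences are assumptions.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hit_ranks(promoters, pattern):
    """1-based ranks of promoters containing the pattern on either strand."""
    pattern = pattern.upper()
    rc = reverse_complement(pattern)
    return [rank for rank, seq in enumerate(promoters, start=1)
            if pattern in seq.upper() or rc in seq.upper()]


def ks_statistic(ranks, n_total):
    """Largest gap between the cumulative fraction of hits and the
    cumulative fraction of promoters, walking down the sorted list."""
    hits = set(ranks)
    if not hits:
        return 0.0
    seen, largest_gap = 0, 0.0
    for rank in range(1, n_total + 1):
        if rank in hits:
            seen += 1
        gap = abs(seen / len(hits) - rank / n_total)
        largest_gap = max(largest_gap, gap)
    return largest_gap


# Toy list, invented for illustration; the first promoter is the top-ranked one.
toy_promoters = ["AATTGACTAA", "CCCCCCCC", "GGAGTCAAGG", "ACACACAC"]
ranks = hit_ranks(toy_promoters, "TTGACT")
statistic = ks_statistic(ranks, len(toy_promoters))
```

A value close to 0 means the hits are spread evenly through the list, while a value close to 1 means they are concentrated at one end. Turning the statistic into a p-value is not covered by this sketch.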

