GenePublisher Exercise

This exercise illustrates the analysis of data produced by DNA arrays that monitor the expression levels of whole-cell mRNA populations.

The GenePublisher software used for the exercise integrates a large number of analysis methods and automatically produces a report of all the results. You will download such an automatically generated report and try to interpret the analysis results.

Background material

  1. Knudsen, S. (2004) Guide to Analysis of DNA Microarray Data, 2nd ed. Wiley, New York.


1. Normalization

Download an analysis report and open it using Adobe Acrobat. The left pane contains a navigation tree (expand entries by clicking the +) that makes it easier to move around the report. Go to the Normalization section under Results and identify the MVA plots before and after normalization.

These are so-called M versus A plots. Instead of plotting each probe on one chip against the same probe on another, the axes are transformed: for each probe, the logarithm of the ratio of expression between the two chips (M) is plotted as a function of the logarithm of the mean expression on the two chips (A). Two identical chips would yield a straight, flat line through zero. Two comparable chips ideally give a straight, flat line through zero with a few probes off the fitted line, indicating differential expression. Deviation of the line from zero reveals a need for normalization before the two chips can be compared, and deviation from a straight line reveals a need for non-linear normalization (different normalization factors for highly and weakly expressed genes).

In these figures all chips are compared to each other (up to a limit of 10 chips).
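To make the two axes concrete, here is a minimal sketch of how M and A can be computed for one pair of chips. It is an illustration only, not GenePublisher's own code: the function names, the base-2 logarithm, and the intensities are assumptions. A is taken as the mean of the two log intensities, which is the usual convention and equals the logarithm of the geometric mean. The median centering at the end stands for the simplest possible normalization (one global factor); the report may well use a non-linear method instead.

```python
import math
import statistics

def ma_values(chip1, chip2):
    """Return per-probe (M, A) lists for two chips of positive intensities."""
    m_values, a_values = [], []
    for x, y in zip(chip1, chip2):
        log_x, log_y = math.log2(x), math.log2(y)
        m_values.append(log_x - log_y)          # M: log ratio between the chips
        a_values.append((log_x + log_y) / 2.0)  # A: mean log intensity
    return m_values, a_values

def median_center(m_values):
    """Shift M so its median is zero (a single global scaling factor)."""
    shift = statistics.median(m_values)
    return [m - shift for m in m_values]

# Made-up intensities: chip_b is uniformly twice as bright as chip_a.
chip_a = [64.0, 128.0, 256.0, 1024.0]
chip_b = [128.0, 256.0, 512.0, 2048.0]

m_before, a_before = ma_values(chip_a, chip_b)
m_after = median_center(m_before)
print("M before:", m_before)
print("M after: ", m_after)
```

In this toy case every probe has M = -1 before centering (a flat line, but not through zero) and M = 0 afterwards, which is the pattern to look for when comparing the plots before and after normalization.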

  1. Look at the MVA plots before normalization and the MVA plots after normalization. Have the chips become more comparable after normalization? How can you tell?

2. Statistical analysis

  1. Identify the tables of genes that show significant change in expression. Notice that there are two tables, one for the upregulated and one for the downregulated genes. What is the P-value of the top ranking gene? (Rank is based on P-value)

  2. Look in the text before the tables. How many false positive genes do we expect on the list? What is the expected false discovery rate?
  3. Look at the volcano plot (if any). How many genes from the shuffled analysis have a P-value below the chosen cutoff? How does that compare to the number of predicted false positive genes in the question above?
  4. Look at the volcano plot (if any). What is the largest fold change observed? Is that fold change significant?
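The relation between the P-value cutoff, the expected number of false positives, and the false discovery rate can be sketched as follows. This assumes the simplest estimate: if no gene were truly changed, P-values would be uniformly distributed, so the expected number of genes below the cutoff is the number of genes tested times the cutoff. The function names and the numbers are hypothetical, and the report may use a different correction.

```python
def expected_false_positives(n_genes_tested, p_cutoff):
    """Expected number of genes passing the cutoff by chance alone."""
    return n_genes_tested * p_cutoff

def estimated_fdr(n_genes_tested, p_cutoff, n_called_significant):
    """Expected fraction of the significant list that is false positive."""
    if n_called_significant == 0:
        return 0.0  # an empty list contains no false discoveries
    expected = expected_false_positives(n_genes_tested, p_cutoff)
    return min(1.0, expected / n_called_significant)

# Hypothetical numbers: 10000 genes tested, cutoff 0.001, 50 genes on the list.
n_false = expected_false_positives(10000, 0.001)
fdr = estimated_fdr(10000, 0.001, 50)
print("expected false positives:", round(n_false, 3))
print("estimated FDR:", round(fdr, 3))
```

With these made-up numbers about 10 of the 50 listed genes would be expected to be false positives, a false discovery rate of about 20%. The count of shuffled-analysis genes below the cutoff in the volcano plot is an empirical estimate of the same quantity.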

3. Clustering and PCA of chips

  1. Look at the clustering of chips. Are the chips divided into groups that make sense based on the experiment?

  2. Compare the chip clustering to the chip PCA. Does the PCA add any additional information?
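Chip clustering rests on a measure of distance between pairs of chips. The sketch below uses 1 minus the Pearson correlation of the expression profiles, a common choice; which distance the report actually uses is not stated here, so treat this as an assumption. The chip names and expression values are made up, and only the nearest neighbour of each chip is computed rather than a full tree.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def nearest_chip(chips):
    """For each chip, find the other chip at the smallest distance 1 - r."""
    nearest = {}
    for name, profile in chips.items():
        distances = [
            (1.0 - pearson(profile, other_profile), other)
            for other, other_profile in chips.items()
            if other != name
        ]
        nearest[name] = min(distances)[1]
    return nearest

# Made-up log expression values for five genes on four chips.
chips = {
    "control1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "control2": [1.1, 2.1, 2.9, 4.2, 4.8],
    "treated1": [5.0, 4.0, 3.0, 2.0, 1.0],
    "treated2": [4.9, 4.2, 3.1, 1.8, 1.2],
}
nearest = nearest_chip(chips)
for name, neighbour in nearest.items():
    print(name, "is closest to", neighbour)
```

If the chips group in a way that makes sense, replicates of the same condition should be each other's nearest neighbours, as the two controls and the two treated chips are in this toy example.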

4. Gene Cluster Analysis

Go to "Clustering of genes" under "Results."
  1. Look at the figure "Hierarchical clustering" in the "Clustering" section. In your opinion, what is a good number of clusters to divide the genes into?

  2. Look at the figure "Optimization of number of clusters K". What number of clusters did the computer find to be optimal for dividing the genes?

  3. Look at the figure "K-means clustering". Do the genes within each cluster resemble each other (in terms of expression profile) more than they resemble members of other clusters? If they do, the clustering method has performed as it should.
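The idea behind the last question can be sketched with a small K-means implementation. This is an illustration under stated assumptions, not the code behind the report: Euclidean distance, fixed starting centroids chosen by index (real implementations usually start from random centroids), and made-up expression profiles. After clustering, each gene's distance to its own cluster centroid is compared with its distance to the nearest other centroid.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_profile(profiles):
    """Average profile (centroid) of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def kmeans(profiles, init_indices, max_iter=100):
    """Plain K-means; K is the number of starting centroids given."""
    centroids = [list(profiles[i]) for i in init_indices]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = mean_profile(members)
    return labels, centroids

# Made-up profiles: three rising genes and two falling genes.
genes = [
    [1.0, 2.0, 3.0, 4.0],
    [1.2, 2.1, 3.2, 3.9],
    [4.0, 3.0, 2.0, 1.0],
    [4.1, 2.9, 2.2, 0.8],
    [0.9, 1.8, 3.1, 4.2],
]
labels, centroids = kmeans(genes, init_indices=[0, 2])

for gene, lab in zip(genes, labels):
    own = euclidean(gene, centroids[lab])
    other = min(euclidean(gene, c) for i, c in enumerate(centroids) if i != lab)
    print("cluster", lab, "own:", round(own, 2), "other:", round(other, 2))
```

When the clustering has worked, the distance to the own centroid is clearly smaller than the distance to any other centroid for every gene, which is the numerical counterpart of judging the profiles by eye in the figure.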

5. Promoter analysis

Go to "Promoter analysis" under "Results."
  1. Look at the results from the three different promoter analysis methods. What is, in your opinion, the most significant result (if any)?

6. Further Analysis

If you have additional time, you can look at the functions of genes that cluster closely together in the hierarchical clustering.
  1. Do you find any similarities in functions between members of a cluster?

This question can sometimes be easy to answer based on a quick look at the annotated function (shown in the tables of significant genes). Sometimes it requires a great deal of database browsing to find something in common. You can use the numbers referring to the tables of top ranking genes to find out more about individual genes.
Last updated by Steen Knudsen, CBS, November 15, 2004