Analysis of DNA Microarray Data

This exercise illustrates the analysis of data produced by DNA arrays, which are used to monitor the expression levels of whole-cell mRNA populations.

The software used for the exercise integrates a large number of analysis methods and produces a report of all the results. You will download such a report and try to interpret the analysis results.

Background material

  1. Affymetrix web site

  2. Knudsen, S. (2002) A Biologist's Guide to Analysis of DNA Microarray Data. Wiley, New York.

  3. DNA Microarrays i Danmark


1. Normalization

Download an analysis report at the site specified by the teacher. Open the report using Adobe Acrobat. Go to the last pages of the Materials and Methods section, where Figures 1 and 2 show chip comparisons before and after normalization. These figures contain all chip measurements and can therefore take a long time to plot!

These are so-called M versus A plots. Instead of plotting each probe on one chip against the same probe on another, the scales are changed so that each probe is plotted as the logarithm of the ratio of expression between the two chips (M) as a function of the logarithm of the mean expression of the two chips (A). Two identical chips would yield a straight, flat line through zero. Two comparable chips ideally give a straight, flat line through zero with a few probes off the fitted line, indicating differential expression. Deviation of the line from zero reveals a need for normalization before the two chips can be compared, and deviation from a straight line reveals a need for non-linear normalization (different normalization factors for highly and weakly expressed genes).
  1. Look at Figure 1 (chip data before normalization) and Figure 2 (chip data after normalization). Have the chips become more comparable after normalization (is the fitted line closer to a straight line through zero)?
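
The M versus A transformation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by the report software. The function names ma_values and median_center are hypothetical, base-2 logarithms are assumed, A is taken as the average of the two log intensities (a common convention that may differ slightly from the one used in the report), and median centering stands in for the simplest kind of linear normalization. The intensities are made up.

```python
import math
import statistics


def ma_values(chip_x, chip_y):
    """Return (M, A) lists for two chips, probe by probe.

    Assumes all intensities are positive and both chips list
    the same probes in the same order.
    """
    if len(chip_x) != len(chip_y):
        raise ValueError("chips must contain the same number of probes")
    m_values, a_values = [], []
    for x, y in zip(chip_x, chip_y):
        m_values.append(math.log2(x / y))          # M: log ratio between chips
        a_values.append(0.5 * math.log2(x * y))    # A: average log intensity
    return m_values, a_values


def median_center(m_values):
    """Shift M so that its median is zero.

    This corresponds to rescaling one chip by a single constant factor
    (linear normalization). It cannot correct a curved trend in the plot.
    """
    shift = statistics.median(m_values)
    return [m - shift for m in m_values]


# Made-up example: chip_b is twice as bright as chip_a for every probe
# except the last one, which is the only differentially expressed probe.
chip_a = [100.0, 200.0, 400.0, 800.0, 1600.0]
chip_b = [200.0, 400.0, 800.0, 1600.0, 800.0]

m_raw, a_raw = ma_values(chip_a, chip_b)
m_normalized = median_center(m_raw)
```

Before centering, most probes sit on a flat line away from zero; after centering, they sit on zero and only the last probe stands out. A curved trend in M as a function of A would instead call for an intensity-dependent (non-linear) correction, which this sketch does not attempt.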

2. Cluster Analysis

Download an analysis report at the site specified by the teacher if you have not already done so. Open the report using Adobe Acrobat. Go to "Clustering" under "Results."
  1. Look at Figure 10 (hierarchical clustering) in the "Clustering" section. What is, in your opinion, a good number of clusters to divide the genes into?

  2. Look at Figure 11 (optimization of number of clusters K). How many clusters did the software find to be optimal for dividing the genes?

  3. Look at Figure 12 (K-means clustering). Do the genes within each cluster look more like each other than they look like members of another cluster? If they do, the clustering method has performed as it should.
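
To make the idea behind the questions above concrete, here is a minimal sketch of K-means clustering on a handful of made-up expression profiles. It is an assumption-laden illustration, not the algorithm used by the report software: the names k_means and within_cluster_ss are hypothetical, squared Euclidean distance is assumed, and the within-cluster sum of squares is just one common way to compare different values of K (the report may optimize K differently).

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(profiles, k, iterations=100, seed=0):
    """Cluster profiles into k groups; return (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct profiles chosen at random.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each gene goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def within_cluster_ss(profiles, labels, centroids):
    """Total squared distance of every gene to its own cluster centroid."""
    return sum(
        squared_distance(p, centroids[lab]) for p, lab in zip(profiles, labels)
    )


# Made-up profiles: three genes rising across conditions, three falling.
profiles = [
    [1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 1.9, 3.1],
    [3.0, 2.0, 1.0], [3.1, 1.9, 1.1], [2.9, 2.1, 0.9],
]

labels, centroids = k_means(profiles, k=2)
spread = within_cluster_ss(profiles, labels, centroids)
```

If the clustering has worked, genes with the same label have similar profiles and the within-cluster sum of squares is small. Running the sketch for several values of K and looking for the point where this sum stops dropping sharply is one informal way to pick K.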

3. Further Analysis

If you have additional time, you can try to look at the function of genes that cluster closely together in the hierarchical clustering.
  1. Do you find any similarities in functions between members of a cluster?

This question can sometimes be easy to answer based on a quick look at the annotated function. Sometimes it requires a great deal of database browsing to find something in common. You can use the numbers referring to Table 1, the table of top ranking genes, to find out more about individual genes.
Last updated by Steen Knudsen, CBS, November 4, 2002