Analysis of DNA Microarray Data
This exercise illustrates the analysis
of data produced by DNA arrays, which are used to monitor the expression levels
of whole-cell mRNA populations.
The software used for the exercise integrates a large number of analysis methods
and produces a report of all the results. You will download such a report
and try to interpret the analysis results.
Background material
 Affymetrix web site
 Knudsen, S. (2002) A Biologist's Guide to Analysis of DNA Microarray Data. Wiley, New York.
 DNA Microarrays i Danmark
Exercises
1. Normalization
Download an analysis report at the site specified by the teacher. Open the report
using Adobe Acrobat. Go to the last pages of the Materials and Methods section, where
Figures 1 and 2 show chip comparisons before and after normalization.
These figures contain all chip measurements and can therefore take a long time to display!
These are so-called M versus A plots: instead of plotting each probe on one chip
against the same probe on another, the scales are changed so that they plot, for each
probe, the logarithm of the ratio of expression between the two chips (M) as a function
of the logarithm of the mean of the expression of the two chips (A). Two identical chips
would yield a straight, flat line through zero. Two comparable chips ideally have a
straight, flat line through zero and a few probes off the fitted line, indicating
differential expression.
Deviation of the line from zero reveals a need for normalization before the
two chips can be compared, and deviation from a straight line reveals a need for
nonlinear normalization (different normalization factors for highly and weakly
expressed genes).
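The M and A values described above can be sketched in a few lines of Python. This is only an illustration, not the code used by the analysis software: the function names and intensities are made up, A is taken as the average of the two log intensities (a common convention, assumed here), and only a simple linear rescaling is shown, not a nonlinear normalization.

```python
import math
from statistics import median

def ma_values(chip_x, chip_y):
    """Return (M, A) lists for two chips, probe by probe.

    M is the log2 ratio of the two intensities; A is the average of the
    two log2 intensities (assumed convention). Intensities must be > 0.
    """
    m, a = [], []
    for x, y in zip(chip_x, chip_y):
        log_x, log_y = math.log2(x), math.log2(y)
        m.append(log_x - log_y)
        a.append((log_x + log_y) / 2.0)
    return m, a

def scale_to_match(chip_x, chip_y):
    """Linear normalization: rescale chip_y so that the median M is zero."""
    m, _ = ma_values(chip_x, chip_y)
    factor = 2.0 ** median(m)
    return [y * factor for y in chip_y]

# Made-up intensities: chip_2 is chip_1 at half the overall brightness,
# except for the last probe, which is differentially expressed.
chip_1 = [100.0, 200.0, 400.0, 800.0, 1600.0]
chip_2 = [50.0, 100.0, 200.0, 400.0, 3200.0]

m_before, a_before = ma_values(chip_1, chip_2)
chip_2_scaled = scale_to_match(chip_1, chip_2)
m_after, a_after = ma_values(chip_1, chip_2_scaled)

print("M before:", [round(v, 3) for v in m_before])
print("M after: ", [round(v, 3) for v in m_after])
```

Before scaling, most M values sit at a constant offset from zero, which is the signature of a chip pair that needs normalization. After scaling, the unchanged probes fall on zero and only the differentially expressed probe stays off the line.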
 Look at Figure 1 (chip data before normalization) and Figure 2 (chip data
after normalization). Have the chips become more comparable after normalization
(is the fitted line closer to a straight line through zero)?
2. Cluster Analysis
Download an analysis report at the site specified by the teacher if you have not already
done so. Open the report using Adobe Acrobat. Go to "Clustering" under "Results."
 Look at Figure 10 (hierarchical clustering) in the "Clustering" section.
What is, in your opinion, a good number of clusters to divide the
genes into?
 Look at Figure 11 (optimization of number of clusters K).
What number of clusters did the computer find as the optimal
number of clusters to divide the genes into?
 Look at Figure 12 (K-means clustering). Do the genes within each cluster look
more like each other than they look like members of another cluster? If they
do, the clustering method has performed as it should.
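To make the questions above concrete, here is a minimal K-means sketch on made-up expression profiles. It is not the method used to produce the report: the data, the function names, and the deterministic choice of starting centroids are all assumptions for illustration, and the report does not say how it optimizes K. The within-cluster sum of squares printed at the end is one common quantity to inspect when choosing K.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_profile(rows):
    """Column-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def kmeans(profiles, k, n_iter=100):
    """Plain K-means. Returns (labels, within-cluster sum of squares).

    Assumption for reproducibility: the first k profiles are the starting
    centroids. Real tools usually use random or smarter starts.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assign each profile to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in profiles]
        # Move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            new_centroids.append(mean_profile(members) if members
                                 else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(profiles, labels))
    return labels, wcss

# Made-up profiles: rising and falling genes, interleaved.
profiles = [
    [1.0, 2.0, 3.0], [3.0, 2.0, 1.0],
    [1.2, 2.1, 3.1], [2.9, 1.8, 1.0],
    [0.9, 1.9, 2.8], [3.1, 2.2, 0.9],
]

wcss_by_k = {}
labels_by_k = {}
for k in (1, 2, 3):
    labels_by_k[k], wcss_by_k[k] = kmeans(profiles, k)
    print(k, labels_by_k[k], round(wcss_by_k[k], 3))
```

In this toy data the within-cluster sum of squares drops sharply from K = 1 to K = 2 and only slightly from K = 2 to K = 3, which suggests two clusters. Members of a good cluster have a small distance to their own centroid compared with the distance to other centroids.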
3. Further Analysis
If you have additional time, you can try to look at the function of genes that cluster closely
together in the hierarchical clustering.
 Do you find any similarities in functions between members of a cluster?
This question can sometimes be easy to answer based on a quick look at the annotated
function. Sometimes it requires a great deal of database browsing to find something in common.
You can use the numbers referring to Table 1, the table of top-ranking genes, to find out
more about individual genes.
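If you want to see how genes end up close together in a hierarchical clustering, the following sketch builds clusters bottom-up by repeatedly merging the two closest groups. It is an illustration only: the gene names and profiles are hypothetical, and Euclidean distance with average linkage is an assumption, since the report may use a different distance or linkage.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Stopping at n_clusters corresponds to cutting the tree at that level.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical genes with made-up expression profiles.
genes = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],
    "geneC": [3.0, 2.0, 1.0],
    "geneD": [2.9, 2.2, 1.1],
}
names = list(genes)
profiles = [genes[name] for name in names]

clusters = agglomerate(profiles, 2)
labelled = [sorted(names[i] for i in cluster) for cluster in clusters]
for members in labelled:
    print(members)
```

Genes listed together here are the ones whose annotated functions are worth comparing first.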
Last updated by Steen Knudsen, CBS, November 4, 2002
