GenePublisher Exercise
This exercise illustrates the analysis of data produced by DNA arrays
used to monitor the expression levels of whole-cell mRNA populations.
The GenePublisher software used for the exercise integrates a large number
of analysis methods and automatically produces a report of all the results.
You will download such an automatically generated report and try to
interpret the analysis results.
Background material
- Knudsen, S. (2004) Guide to Analysis of DNA Microarray Data. 2nd ed. Wiley, New York.
Exercises
1. Normalization
Download an analysis report and open it using Adobe Acrobat. The left pane
contains a navigation section (expand entries by clicking the +) that makes it
easier to move around the report. Go to the Normalization section under Results
and identify the MVA plots before and after normalization.
These are so-called M versus A plots. Instead of plotting the probes of one chip
against the probes of another, the axes are changed so that each probe is plotted
as the logarithm of the ratio of its expression between the two chips (M) as a
function of the logarithm of the mean of its expression on the two chips (A).
Two identical chips would yield a straight, flat line through zero. Two comparable
chips ideally give a straight, flat fitted line through zero, with a few probes
lying off the line, indicating differential expression.
Deviation of the line from zero reveals a need for normalization before the
two chips can be compared, and deviation from a straight line reveals a need for
non-linear normalization (different normalization factors for highly and weakly
expressed genes).
In these figures all chips are compared to each other (up to a limit of 10 chips).
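The M and A values described above can be sketched in a few lines of Python. This is only an illustration, not GenePublisher's own code: the function names `ma_values` and `center_on_zero` and the intensities are made up, and base-2 logarithms are an assumption. A is computed as the logarithm of the mean, following the description above; many implementations use the mean of the two logarithms instead. Subtracting the median M is the simplest possible (linear) normalization and would not correct a curved line.

```python
import math
from statistics import median


def ma_values(chip_a, chip_b):
    """Return (M, A) for two chips given as equal-length lists of positive intensities."""
    if len(chip_a) != len(chip_b):
        raise ValueError("both chips must contain the same probes")
    # M: logarithm of the expression ratio between the two chips
    m = [math.log2(x / y) for x, y in zip(chip_a, chip_b)]
    # A: logarithm of the mean expression of the two chips
    a = [math.log2((x + y) / 2.0) for x, y in zip(chip_a, chip_b)]
    return m, a


def center_on_zero(m):
    """Subtract the median M value: one constant factor for all probes (linear scaling)."""
    shift = median(m)
    return [value - shift for value in m]


# Made-up intensities: chip_a is uniformly twice as bright as chip_b.
chip_a = [100.0, 200.0, 400.0, 800.0]
chip_b = [50.0, 100.0, 200.0, 400.0]

m, a = ma_values(chip_a, chip_b)
m_centered = center_on_zero(m)
for a_value, before, after in zip(a, m, m_centered):
    print(f"A={a_value:.2f}  M before={before:.2f}  M after={after:.2f}")
```

In this toy case every probe sits at M = 1 before centering, which corresponds to a flat line that is offset from zero, and at M = 0 afterwards.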
- Look at the MVA plots before normalization and the MVA plots
after normalization. Have the chips become more comparable after normalization?
How can you tell?
2. Statistical analysis
- Identify the tables of genes that show significant change in expression. Notice that there are two
tables, one for the upregulated and one for the downregulated genes. What is the
P-value of the top ranking gene? (Rank is based on P-value)
- Look in the text before the tables. How many false positive genes do we expect on the
list? What is the expected false discovery rate?
- Look at the volcano plot (if any). How many genes from the shuffled analysis have
a P-value below the chosen cutoff? How does that compare to the number
of predicted false positive genes in the question above?
- Look at the volcano plot (if any). What is the largest fold change observed?
Is that fold change significant?
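The arithmetic behind the false positive questions can be sketched as follows. The sketch assumes that P-values of unchanged genes are uniformly distributed, so the expected number of false positives is the number of genes tested times the P-value cutoff; whether the report computes its numbers in exactly this way is an assumption. All function names and numbers here are hypothetical.

```python
def expected_false_positives(n_tested, p_cutoff):
    """Expected number of unchanged genes passing the cutoff by chance."""
    return n_tested * p_cutoff


def estimated_fdr(n_tested, p_cutoff, n_significant):
    """Expected false positives divided by the number of genes called significant."""
    if n_significant == 0:
        return 0.0
    return min(1.0, expected_false_positives(n_tested, p_cutoff) / n_significant)


def count_below(p_values, p_cutoff):
    """Count P-values below the cutoff, e.g. from an analysis with shuffled labels."""
    return sum(1 for p in p_values if p < p_cutoff)


# Hypothetical numbers, not taken from any report.
n_tested = 10000
p_cutoff = 0.001
n_significant = 50
shuffled_p_values = [0.0004, 0.02, 0.3, 0.0009, 0.75]  # short made-up list

print("expected false positives:", expected_false_positives(n_tested, p_cutoff))
print("estimated FDR:", estimated_fdr(n_tested, p_cutoff, n_significant))
print("shuffled genes below cutoff:", count_below(shuffled_p_values, p_cutoff))
```

The count from the shuffled analysis is an empirical estimate of the same quantity as the expected number of false positives, which is why the two are worth comparing.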
3. Clustering and PCA of chips
- Look at the clustering of chips. Are the chips divided into groups that make sense
based on the experiment?
- Compare the chip clustering to the chip PCA. Does the PCA add any additional
information?
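To make the idea of chip clustering concrete, the sketch below computes a distance between every pair of chips as one minus the Pearson correlation of their expression values. This is one common choice of distance; the distance measure actually used in the report is not stated here, so treat it as an assumption. The chip names and values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def distance_matrix(chips):
    """Map each pair of chip names to the distance 1 - Pearson correlation."""
    names = sorted(chips)
    return {(p, q): 1.0 - pearson(chips[p], chips[q]) for p in names for q in names}


# Made-up log expression values for four chips and four genes.
chips = {
    "control_1": [1.0, 2.0, 3.0, 4.0],
    "control_2": [1.1, 2.1, 2.9, 4.2],
    "treated_1": [4.0, 3.0, 2.0, 1.0],
    "treated_2": [4.1, 2.8, 2.2, 0.9],
}

distances = distance_matrix(chips)
for (p, q), d in sorted(distances.items()):
    if p < q:
        print(f"{p} vs {q}: {d:.3f}")
```

Chips from the same experimental group should end up with small distances to each other and larger distances to the other group, which is what a sensible clustering tree reflects.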
4. Gene Cluster Analysis
Go to "Clustering of genes" under "Results."
- Look at the figure "Hierarchical clustering" in the "Clustering" section.
What is, in your opinion, a good number of clusters to divide the
genes into?
- Look at the figure "Optimization of number of clusters K".
What number of clusters did the computer find to be optimal
for dividing the genes?
- Look at the figure "K-means clustering". Do the genes within each cluster look
more like each other (in terms of expression profile) than they look like
members of another cluster? If they
do, the clustering method has performed as it should.
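For readers who want to see what K-means does with expression profiles, here is a minimal sketch. It is not the implementation used in the report: the initial centers are simply the first K profiles (real implementations usually use random or smarter starts), and the within-cluster sum of squares shown at the end is only one of several criteria for comparing values of K. The profiles are made up.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def mean_profile(profiles):
    """Average profile across a non-empty list of profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def k_means(profiles, k, max_iter=100):
    """Assign each profile to one of k clusters; returns (labels, centers)."""
    centers = [list(p) for p in profiles[:k]]  # simplistic start: first k profiles
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centers[c] = mean_profile(members)
    return labels, centers


def within_cluster_ss(profiles, labels, centers):
    """Sum of squared distances from each profile to its cluster center."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(profiles, labels))


# Made-up profiles: genes 0 and 2 rise across the chips, genes 1 and 3 fall.
profiles = [
    [1.0, 1.1, 5.0, 5.2],
    [5.0, 5.1, 1.0, 0.9],
    [0.9, 1.0, 5.1, 4.9],
    [5.2, 4.8, 1.1, 1.0],
]

for k in (1, 2):
    labels, centers = k_means(profiles, k)
    print(k, labels, round(within_cluster_ss(profiles, labels, centers), 3))
```

A large drop in the within-cluster sum of squares when going from one K to the next suggests that the extra cluster captures real structure; when the drop levels off, adding more clusters mostly splits groups that already look alike.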
5. Promoter analysis
Go to "Promoter analysis" under "Results."
- Look at the results from the three different promoter analysis methods.
What is, in your opinion, the most significant result (if any)?
6. Further Analysis
If you have additional time, look at the functions of the genes that cluster closely
together in the hierarchical clustering.
- Do you find any similarities in functions between members of a cluster?
This question can sometimes be easy to answer based on a quick look at the annotated
function (shown in the tables of significant genes).
Sometimes it requires a great deal of database browsing to find something in common.
You can use the numbers referring to the tables of top ranking genes to find out
more about individual genes.
Last updated by Steen Knudsen, CBS, November 15, 2004