This exercise will illustrate analysis of data produced by DNA arrays used for monitoring expression levels of whole cell mRNA populations.

The GenePublisher software used for the exercise integrates a large number of analysis methods and automatically produces a report of all the results. You will download such an automatically generated report and try to interpret the analysis results.

Background material

Exercises

These are so-called M versus A (MVA) plots. Instead of plotting each probe on one chip against the same probe on another, the axes are transformed so that, for each probe, the logarithm of the ratio of expression between the two chips is plotted as a function of the logarithm of the mean expression of the two chips. Two identical chips would yield a straight, flat line through zero. Two comparable chips ideally yield a straight, flat line through zero with a few probes off the fitted line, indicating differential expression. Deviation of the line from zero reveals a need for normalization before the two chips can be compared, and deviation from a straight line reveals a need for non-linear normalization (different normalization factors for highly and weakly expressed genes).

In these figures all chips are compared to each other (up to a limit of 10 chips).
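
The transformation behind an M versus A plot can be sketched in a few lines of Python. This is only an illustration, not GenePublisher's own code: the function name `ma_values` and the example intensities are hypothetical, and the sketch assumes the common convention in which A is the mean of the two log2 intensities (the logarithm of their geometric mean).

```python
import math

def ma_values(chip1, chip2):
    """Return (M, A) lists for two equal-length lists of positive intensities.

    M is the log2 ratio of the two chips for each probe.
    A is the mean of the two log2 intensities (an assumed convention).
    """
    m_values, a_values = [], []
    for x, y in zip(chip1, chip2):
        log_x, log_y = math.log2(x), math.log2(y)
        m_values.append(log_x - log_y)          # log ratio between chips
        a_values.append((log_x + log_y) / 2.0)  # average log intensity
    return m_values, a_values

# Hypothetical example: the second chip is uniformly twice as bright.
chip_a = [100.0, 200.0, 400.0, 800.0]
chip_b = [2.0 * value for value in chip_a]
m, a = ma_values(chip_a, chip_b)
for m_i, a_i in zip(m, a):
    print(f"A = {a_i:.2f}  M = {m_i:.2f}")
```

In this toy case every probe has an M of about -1, so the points form a flat line that is offset from zero: the pattern that calls for a single scaling factor. A line that bends with A would instead call for non-linear normalization.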

- Look at the MVA plots before normalization and the MVA plots after normalization. Have the chips become more comparable after normalization? How can you tell?

- Identify the tables of genes that show a significant change in expression. Notice that there are two tables, one for the upregulated and one for the downregulated genes. What is the P-value of the top-ranking gene? (Rank is based on P-value.)
- Look in the text before the tables. How many false positive genes do we expect on the list? What is the expected false discovery rate?
- Look at the volcano plot (if any). How many genes from the shuffled analysis have a P-value below the chosen cutoff? How does that compare to the number of predicted false positive genes in the question above?
- Look at the volcano plot (if any). What is the largest fold change observed? Is that fold change significant?
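
The relationship between the P-value cutoff, the expected number of false positives, and the false discovery rate asked about above can be checked with a little arithmetic. The sketch below assumes the simplest estimate, in which every tested gene is treated as unchanged, so the expected number of false positives is the number of genes tested times the cutoff. The report may use a different estimator, and the function names and numbers here are purely illustrative.

```python
def expected_false_positives(n_genes_tested, p_cutoff):
    """Expected number of genes passing the cutoff by chance alone,
    assuming (conservatively) that no gene is truly changed."""
    return n_genes_tested * p_cutoff

def expected_fdr(n_genes_tested, p_cutoff, n_genes_selected):
    """Expected fraction of the selected genes that are false positives."""
    if n_genes_selected == 0:
        return 0.0  # nothing selected, so nothing can be a false discovery
    fraction = expected_false_positives(n_genes_tested, p_cutoff) / n_genes_selected
    return min(1.0, fraction)  # a fraction cannot exceed 1

# Hypothetical numbers: 10000 genes tested, cutoff 0.001, 50 genes on the list.
n_false = expected_false_positives(10000, 0.001)
fdr = expected_fdr(10000, 0.001, 50)
print(f"expected false positives: {n_false:.1f}")
print(f"expected false discovery rate: {fdr:.2f}")
```

With these made-up numbers, roughly 10 of the 50 listed genes would be expected to be false positives. The count of shuffled-analysis genes below the cutoff in the volcano plot is an empirical estimate of the same quantity, so the two should be of similar size.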

- Look at the clustering of chips. Are the chips divided into groups that make sense based on the experiment?
- Compare the chip clustering to the chip PCA. Does the PCA add any additional information?

- Look at the figure "Hierarchical clustering" in the "Clustering" section. What is, in your opinion, a good number of clusters to divide the genes into?
- Look at the figure "Optimization of number of clusters K". What number of clusters did the computer find to be optimal for dividing the genes?
- Look at the figure "K-means clustering". Do the genes within each cluster look more like each other (in terms of expression profile) than they look like members of another cluster? If they do, the clustering method has performed as it should.
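
The visual check in the last question can also be expressed numerically: compare the average distance between expression profiles in the same cluster with the average distance between profiles in different clusters. The sketch below is an illustration only; the function names, the choice of Euclidean distance, and the toy profiles are assumptions and are not taken from the report.

```python
import math
from itertools import combinations

def euclidean(profile_a, profile_b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))

def within_and_between(profiles, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    within, between = [], []
    for i, j in combinations(range(len(profiles)), 2):
        distance = euclidean(profiles[i], profiles[j])
        if labels[i] == labels[j]:
            within.append(distance)
        else:
            between.append(distance)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)

# Hypothetical profiles: two rising genes and two falling genes.
profiles = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [3.0, 2.0, 1.0], [3.1, 2.1, 0.9]]
labels = [0, 0, 1, 1]  # cluster assignment of each gene
within, between = within_and_between(profiles, labels)
print(f"mean within-cluster distance:  {within:.2f}")
print(f"mean between-cluster distance: {between:.2f}")
```

A within-cluster mean that is clearly smaller than the between-cluster mean corresponds to what the figure should show when the clustering has worked.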

- Look at the results from the three different promoter analysis methods. What is, in your opinion, the most significant result (if any)?

- Do you find any similarities in functions between members of a cluster?