Alternative mappings

This is the companion website for a manuscript published in BMC Bioinformatics.

The recommended software requirements are R version 1.9.0 (or greater) and the Bioconductor package altcdfenvs.



R data objects:

The R data objects listed below expect the package altcdfenvs to be loaded:

library(altcdfenvs)

Within an R session, data objects can either be saved locally and loaded with the function load, or loaded directly over the internet with the function loadURL. For example, for abatch.pancreas.rda:

baseurl <- "http://www.cbs.dtu.dk/staff/laurent/download/maprefseq"
loadURL(paste(baseurl, "abatch.pancreas.rda", sep="/"))
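The other route mentioned above (saving the file locally, then using load) could be sketched as follows. This is only a minimal illustration using base R functions; the helper name fetchAndLoad and its arguments are hypothetical and are not part of altcdfenvs.

```r
# Hypothetical helper (not part of altcdfenvs): download an .rda file once,
# then restore its objects with load().
fetchAndLoad <- function(fname, baseurl, destdir = ".", envir = globalenv()) {
  dest <- file.path(destdir, fname)
  if (!file.exists(dest)) {
    # mode = "wb" keeps the binary file intact on Windows
    download.file(paste(baseurl, fname, sep = "/"), destfile = dest, mode = "wb")
  }
  # load() returns (invisibly) the names of the objects it restored
  load(dest, envir = envir)
}
```

With baseurl defined as above, fetchAndLoad("abatch.pancreas.rda", baseurl) would download the file on the first call and reuse the local copy afterwards.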

Mappings:

Probe set mappings alternative to the official one were built using NCBI's reference sequences (AffyCdfEnv objects are available for the GeneChip type HG-U133A).
The manual for the package altcdfenvs contains step-by-step instructions for building an alternative mapping for any other GeneChip and any other set of reference sequences. The manual is distributed with the package, and is also available here.

HG-U133A: hgu133a.ncbirefseq.rda, hgu133a.ncbirefseq.unique.rda
HG-U95Av2: hgu95av2.ncbirefseq.rda

A long-term goal is to have a repository of curated alternative mappings for all types of GeneChip arrays.

Experimental data:

The experimental dataset is available as an instance of class AffyBatch (to be used within the R package affy): abatch.pancreas.rda.



Other supplementary material:

SDEGs:

The lists of the probe sets we found significantly differentially expressed:

official mapping: signif.affy.txt
alternative mapping (set Alt1 in the manuscript): signif.alt.noalu.txt
alternative mapping (set Alt2 in the manuscript): signif.alt.unique.txt

Log-ratios:

The log-ratios obtained with the expression values in the sets Affy, Alt1 and Alt2 show similar distributions (figure).

However, an interesting pattern can be observed when plotting the number of probes per probe set against the log-ratios for the set Alt1 (figure, figure and figure; hexagonal binning with different minimum counts per cell to show the pattern). The largest spread for log-ratios is observed for probe sets made of 11 probes (as in the official mapping), then 22 probes per probe set and 33 probes per probe set. These multiples of the number of probes in official probe sets suggest that quite a few probe sets in the official mapping are redundant (several probe sets matching the same reference sequence). A similar pattern can be observed when plotting the number of probes per probe set against the p-values obtained with the t-test (figure).
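To make the "multiples of 11" observation concrete, the sketch below counts probes per probe set and flags sizes that are multiples of 11. It works on a toy environment built on the spot, assuming the usual CDF-environment layout (one matrix per probe set, one row per probe, columns pm and mm). The probe set names and sizes are invented for illustration, and the helper probesPerSet is hypothetical; this is not the code used for the manuscript.

```r
# Toy stand-in for a CDF environment (made-up probe set names and sizes).
sizes <- c(setA = 11, setB = 22, setC = 16, setD = 33)
toy <- new.env()
for (id in names(sizes)) {
  assign(id, matrix(NA_integer_, nrow = sizes[[id]], ncol = 2,
                    dimnames = list(NULL, c("pm", "mm"))), envir = toy)
}

# Hypothetical helper: number of probes per probe set = rows of each matrix
probesPerSet <- function(env) {
  ids <- ls(env)
  sapply(ids, function(id) nrow(get(id, envir = env)))
}

n.probes <- probesPerSet(toy)
# Probe sets larger than 11 probes whose size is a multiple of 11:
# candidates for several official probe sets merged into one
merged <- n.probes[n.probes %% 11 == 0 & n.probes > 11]
```

On real data, the same tabulation could be applied to the environment holding an alternative mapping, provided it follows the layout assumed here.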


Laurent - Aug 2004