Collapses a dataset from probes to gene symbols.

collapseDataset(exprsVals, platform = NULL, mapVector = NULL, oper = max,
  prefer = c("none", "up", "down"), singleProbeset = FALSE,
  returnProbes = FALSE, deProbes = NULL)



exprsVals: a matrix or data.frame of numeric values with rownames denoting the identifiers.


platform: the microarray platform the data come from, used to extract the gene symbols.


mapVector: a named character vector whose names specify the current identifiers (probes matching the rownames of exprsVals) and whose values specify the gene symbols (or other identifiers to collapse to).


oper: the operation used to choose a probe when multiple probes map to the same gene. Default is max, which takes the maximum of the averages (see singleProbeset).


prefer: one of "none", "up", or "down"; can be abbreviated. Only relevant when deProbes is supplied.


singleProbeset: if TRUE, the operation applies to the average over all conditions, and all values for a gene come from one probeset. If FALSE, the operation applies to the probesets over all conditions, and the values for a gene may come from different probesets. Default is FALSE for compatibility reasons, but TRUE is recommended.


returnProbes: if TRUE, a list containing both the collapsed expression matrix and the chosen probes is returned (see the return value).


deProbes: a list with named vectors "up" and "down" giving the names of up- and downregulated probes.
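
The shapes expected for mapVector and deProbes can be sketched in base R. This is only an illustration of the structures described above: the annotation table `annot` and all probe and gene names are hypothetical and do not come from any real platform.

```r
# Hypothetical annotation table: one row per probe (not from a real platform)
annot <- data.frame(probe  = c("probe_1", "probe_2", "probe_3"),
                    symbol = c("Gene_A", "Gene_A", "Gene_B"),
                    stringsAsFactors = FALSE)

# mapVector: values are the gene symbols, names are the probe identifiers
mv <- setNames(annot$symbol, annot$probe)

# deProbes: a list with character vectors named "up" and "down"
deProbes <- list(up = "probe_1", down = "probe_3")
```

The names of `mv` must match the rownames of exprsVals for the mapping to be usable.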


If returnProbes is TRUE, a list containing the collapsed dataset in $exprsVals and the probes chosen in $probeSets. Otherwise, if returnProbes is FALSE, only the expression matrix is returned.


This function is designed to work for microarray data but can work for any sort of numeric matrix for which multiple rows need to be collapsed.

If singleProbeset is FALSE (the default, kept for compatibility reasons but untested and not recommended), the value for each sample is the maximum across all probes that map to that gene. This means that a gene's expression values may be a composite of values from different probes rather than from a single probe.

Most users will not need the `prefer` argument. If prefer is "up" and multiple deProbes map to the same gene, the upregulated probe is chosen; similarly for "down". The default is "none", in which case the probe selected by `oper` (default max) is chosen.
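
The difference between the two singleProbeset settings can be illustrated without the package. The following is a minimal base-R sketch of the behaviour as described here, assuming oper = max; it is not the package's implementation, and the matrix `expr` and the variable names are hypothetical.

```r
# Hypothetical data: two probes mapping to the same gene, three samples
expr <- matrix(c(5, 9, 6,
                 8, 4, 7),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("probe_1", "probe_2"),
                               c("s1", "s2", "s3")))

# As described for singleProbeset = TRUE: pick the single probe whose
# average over all samples is largest, and use its values throughout
best   <- names(which.max(rowMeans(expr)))
single <- expr[best, ]

# As described for singleProbeset = FALSE: take the maximum per sample,
# so the values for the gene may come from different probes
mixed <- apply(expr, 2, max)
```

Here `single` is an actual row of the input, whereas `mixed` combines values from both probes and does not correspond to any measured probe.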


## Trivial Example showing basic functionality
fakeExpr <- matrix(rnorm(50, mean=8, sd=1), ncol=5, nrow=10,
                   dimnames=list(probes=paste("probe", 1:10, sep='_'),
                     samples=paste("sample", LETTERS[1:5], sep='_')))
mv <- rep(paste("Gene", LETTERS[1:5], sep='_'), each=2) # mapVector
names(mv) <- rownames(fakeExpr)
res <- collapseDataset(fakeExpr, mapVector=mv, oper=max,
                       singleProbeset=TRUE, # recommend setting singleProbeset to TRUE
                       returnProbes=TRUE)   # return a list so $exprsVals is available
## for each gene, one of its two probes is chosen:
## probe_1 or probe_2 for Gene_A, probe_3 or probe_4 for Gene_B, etc.
## (the data are random, so the chosen probes vary between runs)

res$exprsVals                           # collapsed expression values

## only difference is in rownames, numbers are identical
all.equal(res$exprsVals, fakeExpr[res$probeSets,])
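
Before collapsing, it can help to check which probes compete for each gene. The sketch below uses only base R and rebuilds the same kind of mapVector as in the example above, so it does not depend on the package; `probesByGene` is a hypothetical name.

```r
# Same mapping as in the example: two probes per gene
mv <- rep(paste("Gene", LETTERS[1:5], sep = "_"), each = 2)
names(mv) <- paste("probe", 1:10, sep = "_")

# Group probe identifiers by the gene they map to
probesByGene <- split(names(mv), mv)

probesByGene$Gene_A     # the probes mapping to Gene_A
lengths(probesByGene)   # number of probes per gene
```

Genes with more than one probe are the ones affected by the choice of oper and singleProbeset.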