Collapses a dataset from probes to gene symbols.
collapseDataset(exprsVals, platform = NULL, mapVector = NULL, oper = max, prefer = c("none", "up", "down"), singleProbeset = FALSE, returnProbes = FALSE, deProbes = NULL)
a matrix or data.frame of numeric values with rownames denoting the identifiers.
the microarray platform the data comes from for extracting the gene symbols
a named character vector whose names are the current identifiers (probes matching the rownames of exprsVals) and whose values are the gene symbols (or other identifiers to collapse to).
the operation used to choose which probe to keep when multiple probes map to the same gene. The default, max, takes the maximum of the per-probe averages.
one of "none", "up", or "down"; may be abbreviated.
a list with named vectors "up" and "down" giving the names of up and downregulated probes
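As an illustration of the mapVector layout described above, the following is a minimal sketch that builds one from an annotation table. The data frame `annot` and its column names are hypothetical and only stand in for whatever probe annotation is available.

```r
# Hypothetical annotation table: one row per probe.
annot <- data.frame(probe  = c("probe_1", "probe_2", "probe_3"),
                    symbol = c("Gene_A", "Gene_A", "Gene_B"),
                    stringsAsFactors = FALSE)

# Names are the current identifiers (probes); values are the gene symbols.
mv <- setNames(annot$symbol, annot$probe)

# Probes that share a symbol are the ones that would be collapsed together.
split(names(mv), mv)
```

The names of `mv` should match the rownames of the expression matrix passed as exprsVals.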
If returnProbes is TRUE, a list containing the collapsed dataset in $exprsVals and the probes chosen in $probeSets. Otherwise, if returnProbes is FALSE, only the expression matrix is returned.
This function is designed to work for microarray data but can work for any sort of numeric matrix for which multiple rows need to be collapsed.
If singleProbeset is set to FALSE (the default, kept for compatibility reasons but untested and not recommended), the value for each sample is the maximum across all probes that map to that gene. This means that a gene's expression values may be a composite of values from different probes rather than from a single probe.
Most users will not need the `prefer` argument. If prefer is "up", when multiple deProbes match the same gene, the upregulated probe will be chosen; similarly for "down". The default is "none", in which case the probe selected by `oper` (default max) is chosen.
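The difference between the two singleProbeset behaviors can be sketched in base R on toy data. This is not the package's implementation: it only mirrors the description above, and it assumes that choosing a single probe with max means picking the probe with the highest mean across samples. All object names are hypothetical.

```r
# Toy data: 4 probes, 3 samples, two probes per gene (hypothetical names).
expr <- matrix(c(5, 9, 4,
                 8, 6, 7,
                 3, 3, 3,
                 2, 4, 1),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("probe_", 1:4),
                               paste0("sample_", LETTERS[1:3])))
genes <- c(probe_1 = "Gene_A", probe_2 = "Gene_A",
           probe_3 = "Gene_B", probe_4 = "Gene_B")
probesByGene <- split(names(genes), genes)

# Like singleProbeset = TRUE: per gene, keep the one probe with the
# highest mean across samples (assumed meaning of "max" here).
probeMeans <- rowMeans(expr)
chosen <- vapply(probesByGene,
                 function(p) p[which.max(probeMeans[p])],
                 character(1))
single <- expr[chosen, , drop = FALSE]
rownames(single) <- names(chosen)

# Like singleProbeset = FALSE: per gene and per sample, take the maximum
# across all of that gene's probes, so one row can mix several probes.
mixed <- t(vapply(probesByGene,
                  function(p) apply(expr[p, , drop = FALSE], 2, max),
                  numeric(ncol(expr))))

single  # Gene_A row comes entirely from probe_2
mixed   # Gene_A row combines values from probe_1 and probe_2
```

In this toy case the Gene_A row of `single` is exactly probe_2, while the Gene_A row of `mixed` takes sample_B from probe_1 and the other samples from probe_2, which is the composition effect the paragraph above warns about.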
## Trivial example showing basic functionality
fakeExpr <- matrix(rnorm(50, mean=8, sd=1), ncol=5, nrow=10,
                   dimnames=list(probes=paste("probe", 1:10, sep='_'),
                                 samples=paste("sample", LETTERS[1:5], sep='_')))
mv <- rep(paste("Gene", LETTERS[1:5], sep='_'), each=2)  # mapVector
names(mv) <- rownames(fakeExpr)
res <- collapseDataset(fakeExpr, mapVector=mv, oper=max,
                       singleProbeset=TRUE,  # recommend setting singleProbeset to TRUE
                       returnProbes=TRUE)
res$probes
## between probe_1 and probe_2, probe_2 was chosen for Gene_A
## between probe_3 and probe_4, probe_4 was chosen for Gene_B
## etc.
res$exprsVals  # collapsed expression values
## only difference is in rownames, numbers are identical
all.equal(res$exprsVals, fakeExpr[res$probes,])