roast {limma} | R Documentation |
Rotation gene set testing for linear models.
roast(iset=NULL, y, design, contrast=ncol(design), set.statistic="mean", gene.weights=NULL, array.weights=NULL, block=NULL, correlation, var.prior=NULL, df.prior=NULL, trend.var=FALSE, nrot=999) mroast(iset=NULL, y, design, contrast=ncol(design), set.statistic="mean", gene.weights=NULL, array.weights=NULL, block=NULL, correlation, var.prior=NULL, df.prior=NULL, trend.var=FALSE, nrot=999, adjust.method="BH", midp=TRUE)
iset |
index vector specifying which rows (probes) of |
y |
numeric matrix giving log-expression or log-ratio values for a series of microarrays, or any object that can coerced to a matrix including |
design |
design matrix |
contrast |
contrast for which the test is required. Can be an integer specifying a column of |
set.statistic |
summary set statistic. Possibilities are |
gene.weights |
optional numeric vector of weights for genes in the set. Can be positive or negative. For |
array.weights |
optional numeric vector of array weights. |
block |
optional vector of blocks. |
correlation |
correlation between blocks. |
var.prior |
prior value for residual variances. If not provided, this is estimated from all the data using |
df.prior |
prior degrees of freedom for residual variances. If not provided, this is estimated using |
trend.var |
logical, should a trend be estimated for |
nrot |
number of rotations used to estimate the p-values. |
adjust.method |
method used to adjust the p-values for multiple testing. See |
midp |
logical, should mid-p-values be used in instead of ordinary p-values when adjusting for multiple testing? |
This function implements the ROAST gene set test from Wu et al (2010).
It tests whether any of the genes in the set are differentially expressed.
The function can be used for any microarray experiment which can be represented by a linear model.
The design matrix for the experiment is specified as for the lmFit
function, and the contrast of interest is specified as for the contrasts.fit
function.
This allows users to focus on differential expression for any coefficient or contrast in a linear model.
If contrast
is not specified, the last coefficient in the linear model will be tested.
The arguments array.weights
, block
and correlation
have the same meaning as they for for the lmFit
function.
The arguments df.prior
and var.prior
have the same meaning as in the output of the eBayes
function.
If these arguments are not supplied, they are estimated exactly as is done by eBayes
.
The argument gene.weights
allows directions or weights to be set for individual genes in the set.
The gene set statistics "mean"
, "floormean"
, "mean50"
and msq
are defined by Wu et al (2010).
The different gene set statistics have different sensitivities to small number of genes.
If set.statistic="mean"
then the set will be statistically significantly only when the majority of the genes are differentially expressed.
"floormean"
and "mean50"
will detect as few as 25% differentially expressed.
"msq"
is sensitive to even smaller proportions of differentially expressed genes, if the effects are reasonably large.
The output gives p-values three possible alternative hypotheses,
"Up"
to test whether the genes in the set tend to be up-regulated, with positive t-statistics,
"Down"
to test whether the genes in the set tend to be down-regulated, with negative t-statistics,
and "Mixed"
to test whether the genes in the set tend to be differentially expressed, without regard for direction.
roast
estimates p-values by simulation, specifically by random rotations of the orthogonalized residuals (Langsrud, 2005), so p-values will vary slightly from run to run.
To get more precise p-values, increase the number of rotations nrot
.
The p-value is computed as (b+1)/(nrot+1)
where b
is the number of rotations giving a more extreme statistic than that observed (Phipson and Smyth, 2010).
This means that the smallest possible p-value is 1/(nrot+1)
.
mroast
does roast tests for multiple sets, including adjustment for multiple testing.
By default, mroast
reports ordinary p-values but uses mid-p-values at the multiple testing stage.
Mid-p-values are probably a good choice when using false discovery rates (adjust.method="BH"
) but not when controlling the family-wise type I error rate (adjust.method="holm"
).
roast
produces an object of class "Roast"
.
This consists of a list with the following components:
p.value |
data.frame with columns |
var.prior |
prior value for residual variances. |
df.prior |
prior degrees of freedom for residual variances. |
mroast
produces a list of three matrices, each with a row for each set:
P.Value |
unadjusted p-values for the mixed, up and down alternative hypotheses |
Adj.P.Value |
adjusted p-values for each set and each hypothesis |
Active.Proportion |
proportion of active genes for each set and each hypothesis |
The default setting for the set statistic was changed in limma 3.5.9 (3 June 2010) from "msq"
to "mean"
.
Gordon Smyth and Di Wu
Goeman, JJ, and Buhlmann, P (2007). Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics 23, 980-987.
Langsrud, O (2005). Rotation tests. Statistics and Computing 15, 53-60.
Phipson B, and Smyth GK (2010). Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Statistical Applications in Genetics and Molecular Biology, Volume 9, Article 39.
Routledge, RD (1994). Practicing safe statistics with the mid-p. Canadian Journal of Statistics 22, 103-110.
Wu, D, Lim, E, Francois Vaillant, F, Asselin-Labat, M-L, Visvader, JE, and Smyth, GK (2010). ROAST: rotation gene set tests for complex microarray experiments. Bioinformatics 26, 2176-2182. http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btq401?
roast
performs a self-contained test in the sense defined by Goeman and Buhlmann (2007).
For a competitive gene set test, see wilcoxGST
.
For a competitive gene set enrichment analysis using a database of gene sets, see romer.
An overview of tests in limma is given in 08.Tests.
y <- matrix(rnorm(100*4),100,4) design <- cbind(Intercept=1,Group=c(0,0,1,1)) # First set of 5 genes contains 3 that are genuinely differentially expressed iset1 <- 1:5 y[iset1,3:4] <- y[iset1,3:4]+3 # Second set of 5 genes contains none that are DE iset2 <- 6:10 roast(iset1,y,design,contrast=2) mroast(list(set1=iset1,set2=iset2),y,design,contrast=2)