voom {limma}R Documentation

Transform RNA-Seq Data Ready for Linear Modelling

Description

Transform count data to log2-counts per million, estimate the mean-variance relationship and use this to compute appropriate observational-level weights. The data are then ready for linear modelling.

Usage

voom(counts, design = NULL, lib.size = NULL, normalize.method = "none", plot = FALSE, ...)

Arguments

counts

either a numeric matrix containing raw counts or a DGEList object.

design

design matrix with rows corresponding to samples and columns to coefficients to be estimated. Defaults to the unit vector meaning that samples are treated as replicates.

lib.size

numeric vector containing total librazy sizes for each sample. If NULL, library sizes are calculated as column sums of counts.

normalize.method

normalization method to be applied to the log2-counts-per-million. Choices are as for the method argument of normalizeBetweenArrays when the data is single-channel.

plot

logical value indicating whether a plot of mean-variance trend is displayed.

...

other arguments are passed to lmFit.

Details

This function is intended to process RNA-Seq or ChIP-Seq data prior to linear modelling in limma.

voom is an acronym for mean-variance modelling at the observational level. The key concern is to estimate the mean-variance relationship in the data, then use this to compute appropriate weights for each observation. Count data almost show non-trivial mean-variance relationships. Raw counts show increasing variance with increasing count size, while log-counts typically show a decreasing mean-variance trend. This function estimates the mean-variance trend for log-counts, then assigns a weight to each observation based on its predicted variance. The weights are then used in the linear modelling process to adjust for heteroscedasticity.

In an experiment, a count value is observed for each tag in each sample. A tag-wise mean-variance trend is computed using lowess. The tag-wise mean is the mean log2 count with an offset of 0.5, across samples for a given tag. The tag-wise variance is the quarter-root-variance of normalized log2 counts per million values with an offset of 0.5, across samples for a given tag. Tags with zero counts across all samples are not included in the lowess fit. Optional normalization is performed using normalizeBetweenArrays. Using fitted values of log2 counts from a linear model fit by lmFit, variances from the mean-variance trend were interpolated for each observation. This was carried out by approxfun. Inverse variance weights can be used to correct for mean-variance trend in the count data.

Value

An EList object with the following components:

E

numeric matrix of normalized expression values on the log2 scale

weights

numeric matrix of inverse variance weights

design

numeric matrix of experimental design

lib.size

numeric vector of total library sizes

genes

dataframe of gene annotation, only if counts was a DGEList object

Author(s)

Charity Law and Gordon Smyth

See Also

normalizeBetweenArrays


[Package limma version 3.10.2 Index]