glmFit {edgeR} | R Documentation |
Fit a negative binomial generalized linear model (GLM) for each transcript (tag) with the unadjusted counts provided, a value for the dispersion parameter and, optionally, offsets and weights for different libraries or transcripts.
## S3 method for class 'DGEList'
glmFit(y, design=NULL, dispersion=NULL, offset=NULL, weights=NULL, lib.size=NULL, start=NULL, method="auto", ...)

glmLRT(y, glmfit, coef=ncol(glmfit$design), contrast=NULL)
y |
an object that contains the raw counts for each library (the measure of expression level); alternatively, a matrix of counts, or a |
design |
numeric matrix giving the design matrix for the GLM that is to be fit. Must be of full column rank. Defaults to a single column of ones, equivalent to treating the columns as replicate libraries. |
dispersion |
numeric scalar or vector providing the value for the dispersion parameter that is used in fitting the GLM for each transcript. Can be a common value for all tags, or a vector of values can provide a unique dispersion value for each tag. If |
offset |
numeric scalar, vector or matrix giving the offset that is to be included in the NB GLM for the transcripts. Only one of |
weights |
optional numeric matrix giving prior weights for the observations (for each library and transcript) to be used in the GLM calculations. Not supported by methods |
lib.size |
optional numeric vector providing the (effective) library size for each library (must have length equal to the number of columns, or libraries, in the matrix of counts). If |
start |
optional numeric matrix of initial estimates for the fitted coefficients |
method |
which fitting algorithm to use. Possible values are |
... |
other arguments are passed to lower-level functions, for example to |
glmfit |
a |
coef |
scalar or vector indicating the column(s) of |
contrast |
contrast vector for which the test is required, of length equal to the number of columns of |
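The requirements on design and contrast can be checked before fitting. The following is a minimal base R sketch, not part of edgeR: the variable names (design.bad, design.group, contrast) and the six-library grouping are hypothetical, and it assumes that the contrast vector is matched against the columns of the design matrix.

```r
# Trend design from a numeric covariate (as in the examples for this page)
x <- 0:2
design <- model.matrix(~x)

# A design matrix has full column rank when its rank equals its column count
full.rank <- qr(design)$rank == ncol(design)

# The documented default: a single column of ones (replicate libraries)
design.default <- matrix(1, nrow = length(x), ncol = 1)

# A rank-deficient design: the third column is a multiple of the second
design.bad <- cbind(design, twice.x = 2 * x)
bad.rank <- qr(design.bad)$rank   # smaller than ncol(design.bad)

# Hypothetical three-group design with one coefficient per group
group <- factor(c("A", "A", "B", "B", "C", "C"))
design.group <- model.matrix(~0 + group)

# Contrast comparing group B with group A; one entry per design column
# (assumption: the contrast is applied to the design's coefficients)
contrast <- c(-1, 1, 0)
stopifnot(length(contrast) == ncol(design.group))
```

A design such as design.bad would need a redundant column removed before it could be used for fitting.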
Given a fixed value for the dispersion parameter, a negative binomial model can be fitted to the counts for each tag/transcript in a dataset. The function glmFit calls the in-built function glm.fit to fit the NB GLM for each tag. Once we have a fit for a given design matrix, glmLRT can be run with a given coefficient or contrast specified and evidence for differential expression assessed using a likelihood ratio test. Tags can be ranked in order of evidence for differential expression, based on the p-value computed for each tag.
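To illustrate the idea for a single tag, the sketch below fits a full and a null NB GLM with glm.fit at a fixed dispersion and compares them with a likelihood ratio test. It is an illustration under stated assumptions rather than edgeR's own code: the NB family comes from MASS::negative.binomial (a stand-in chosen here), the counts are simulated, and the library sizes and all variable names are hypothetical.

```r
library(MASS)  # provides negative.binomial(), an NB family with fixed theta

set.seed(1)
dispersion <- 0.1                    # fixed NB dispersion (theta = 1/dispersion)
x <- rep(0:2, each = 4)              # hypothetical covariate for 12 libraries
lib.size <- rep(1e4, length(x))      # hypothetical library sizes

# Simulate counts for one tag whose expression doubles per unit of x
mu <- lib.size * 1e-3 * 2^x
y <- rnbinom(length(x), mu = mu, size = 1 / dispersion)

# Full model (intercept + x) and null model (intercept only)
design.full <- model.matrix(~x)
design.null <- design.full[, 1, drop = FALSE]

fam <- negative.binomial(theta = 1 / dispersion)
fit.full <- glm.fit(design.full, y, offset = log(lib.size), family = fam)
fit.null <- glm.fit(design.null, y, offset = log(lib.size), family = fam)

# With the dispersion held fixed, the LR statistic is the drop in deviance
lr <- fit.null$deviance - fit.full$deviance
df <- ncol(design.full) - ncol(design.null)
p.value <- pchisq(lr, df = df, lower.tail = FALSE)
```

Repeating this for every tag and sorting by p.value gives the kind of ranking described above.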
glmFit produces an object of class DGEGLM with the following components:
coefficients |
matrix of estimated coefficients from the NB model |
df.residual |
vector giving the residual degrees of freedom for each tag. In theory it can be different for different tags (if there are missing values), but in practice these will usually be identical for each tag. |
deviance |
vector giving the deviance from the NB model fit for each tag. |
design |
design matrix used in the NB model fit for each tag. |
offset |
scalar, vector or matrix giving the offset to use in the NB model for each tag. |
samples |
data frame providing information about the samples (libraries) in the experiment; taken from the object |
genes |
vector or data frame providing gene information for each tag; taken from the object |
dispersion |
scalar or vector giving the value of the dispersion parameter used in each tag's NB model fit. |
lib.size |
vector of library sizes used in the model fit. |
weights |
matrix of final weights used in the NB model fits for each tag. |
fitted.values |
matrix of fitted values from the NB model for each tag. |
abundance |
vector of gene/tag abundances (expression levels), on the log2 scale, computed from the mean count for each gene/tag after scaling the counts by the normalized library size. |
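One literal reading of the abundance description can be sketched in base R as follows. This is an assumption about the calculation, not edgeR's actual code, which may differ in detail (for example in how zero counts are handled); the counts, library sizes, and normalization factors are hypothetical.

```r
# Hypothetical counts: 3 tags (rows) by 3 libraries (columns)
counts <- matrix(c( 10, 20,  40,
                     5,  5,   5,
                   100, 90, 110),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("Gene", 1:3), c("x0", "x1", "x2")))

lib.size <- c(1e4, 2e4, 1.5e4)   # hypothetical raw library sizes
norm.factors <- c(1, 1, 1)       # hypothetical normalization factors
eff.lib.size <- lib.size * norm.factors

# Scale each column by its effective library size, average per tag, take log2
scaled <- sweep(counts, 2, eff.lib.size, "/")
abundance <- log2(rowMeans(scaled))
```

A tag with zero counts in every library would give -Inf under this simple version.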
glmLRT produces an object of class DGELRT with the following components:
table |
data frame (table) containing the abundance of each tag (log-concentration, |
coefficients |
matrix of coefficients for the full model defined by the |
dispersion.used |
scalar or vector of the dispersion value(s) used in the GLM fits and LR test. |
The DGELRT object also contains all the elements of y except for the table of counts (raw data) and the table of pseudo-counts (if applicable).
Davis McCarthy and Gordon Smyth
estimateGLMCommonDisp, estimateGLMTrendedDisp or estimateGLMTagwiseDisp for estimating the negative binomial dispersion.
topTags for displaying results from glmLRT.
nlibs <- 3
ntags <- 100
dispersion.true <- 0.1

# Make first transcript respond to covariate x
x <- 0:2
design <- model.matrix(~x)
beta.true <- cbind(Beta1=2,Beta2=c(2,rep(0,ntags-1)))
mu.true <- 2^(beta.true %*% t(design))

# Generate count data
y <- rnbinom(ntags*nlibs,mu=mu.true,size=1/dispersion.true)
y <- matrix(y,ntags,nlibs)
colnames(y) <- c("x0","x1","x2")
rownames(y) <- paste("Gene",1:ntags,sep="")
d <- DGEList(y)

# Normalize
d <- calcNormFactors(d)

# Fit the NB GLMs
fit <- glmFit(d, design, dispersion=dispersion.true)

# Likelihood ratio tests for trend
results <- glmLRT(d, fit, coef=2)
topTags(results)

# Estimate the dispersion (may be unreliable with so few tags)
d <- estimateGLMCommonDisp(d, design)
d$common.dispersion