gam {mgcv}R Documentation

Generalized Additive Models using penalized regression splines and GCV

Description

Fits the specified generalized additive model to data. The GAM is represented using one dimensional penalized regression splines with smoothing parameters selected by GCV (and/or ordinary regression splines with fixed degrees of freedom).

Usage

 gam(formula,family=gaussian(),data=list(),weights=NULL,control=gam.control,scale=0)

Arguments

formula A GAM formula. This is exactly like the formula for a glm except that smooth terms can be added to the right hand side of the formula (and a formula of the form y ~ . is not allowed). Smooth terms are specified by expressions of the form: s(var,knots) where var is the covariate which the smooth is a function of and knots is the number of knots for the spline representing this smooth. knots must be a number, and not a variable (i.e 10, 45, 13 are all ok, n is not). If the number of knots is not specified then 10 are used.
The formula may also include terms like s(x,12|f), which specifies a regression spline which is not to be penalized and has 12 knots. Such regression splines obviously have a fixed number of degrees of freedom (11 in this example).
family This is a family object specifying the distribution and link to use on fitting etc. See glm and family for more details.
data A data frame containing the model response variable and covariates required by the formula. If this is missing then the frame from which gam was called is searched for the variables specified in the formula.
weights prior weights on the data.
control A list as returned by gam.control, with three user controllable elements: maxit controls maximum iterations, convergence tolerance is controlled by epsilon and the third item is trace.
scale If this is zero then GCV is used for all distributions except Poisson and binomial where UBRE is used with scale parameter assumed to be 1. If this is greater than 1 it is assumed to be the scale parameter/variance and UBRE is used. If scale is negative GCV is always used (for binomial models in particular, it is probably worth comparing UBRE and GCV results; for ``over-dispersed Poisson'' GCV is probably more appropriate than UBRE.)

Details

Each smooth model term is represented using a cubic penalized regression spline, or optionally an unpenalized regression spline. Knots of the spline are placed evenly throughout the covariate values to which the term refers: For example, if fitting 101 data with an 11 knot spline of x then there would be a knot at every 10th (ordered) x value. The use of penalized regression splines turns the gam fitting problem into a penalized glm fitting problem, which can be fitted using a slight modification of glm.fit : gam.fit. The penalized glm approach also allows smoothing parameters for all smooth terms to be selected simultaneously by GCV or UBRE. This is achieved as part of fitting by calling mgcv within gam.fit.

The parameterization used represents the spline in terms of its values at the knots. Connection of these values at neighbouring knots by sections of cubic polynomial constrainted to join at the knots so as to be continuous up to and including second derivative yields a natural cubic spline through the values at the knots (given two extra conditions specifying that the second derivative of the curve should be zero at the two end knots). Other parameterizations, such as b-splines or the basis that arises naturally from r.k.h.s. representation of the spline smoothing problem are equivalent, but the basis used here has the advantage that the parameters of the each spline term are easily interpretable.

Details of the GCV/UBRE minimization method are given in Wood (2000).

Value

The function returns an object of class "gam" which has the following elements:

coefficients the coefficients of the fitted model. Parametric coefficients are first, followed by coefficients for each spline term in turn.
residuals the deviance residuals for the fitted model.
fitted.values fitted model predictions of expected value for each datum.
family family object specifying distribution and link used.
linear.predictor fitted model prediction of link function of expected value for each datum.
deviance (unpenalized)
null.deviance deviance for single parameter model.
df.null null degrees of freedom
iter number of iterations of IRLS taken to get convergence.
weights final weights used in IRLS iteration.
prior.weights prior weights on observations.
y response data.
converged indicates whether or not the iterative fitting method converged.
sig2 estimated or supplied variance/scale parameter.
edf estimated degrees of freedom for each smooth.
boundary did parameters end up at boundary of parameter space?
sp smoothing parameter for each smooth.
df number of knots for each smooth (one more than maximum degrees of freedom).
nsdf number of parametric, non-smooth, model terms excluding the intercept.
Vp estimated covariance matrix for parameters.
xp knot locations for each smooth. xp[i,] are the locations for the ith smooth.
formula the model formula.
x parametric design matrix columns (including intercept) followed by the data that form arguments of the smooths.
call a mode call object containing the call to gam() that produced this gam object (useful for constructing model frames).

WARNING

The code does not check for rank defficiency of the model matrix -it will likely just fail instead!

Automatic smoothing parameter selection is not likely to work well when fitting models to very few response data.

Author(s)

Simon N. Wood snw@st-and.ac.uk

References

Gu and Wahba (1991) Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM J. Sci. Statist. Comput. 12:383-398

Wood (2000) Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties. JRSSB 62(2):413-428

http://www.ruwpa.st-and.ac.uk/simon.html

See Also

predict.gam plot.gam

Examples

library(mgcv)
n<-200
sig2<-4
x0 <- runif(n, 0, 1)
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)
pi <- asin(1) * 2
f <- 2 * sin(pi * x0)
f <- f + exp(2 * x1) - 3.75887
f <- f + 0.2 * x2^11 * (10 * (1 - x2))^6 + 10 * (10 * x2)^3 * (1 - x2)^10 - 1.396
e <- rnorm(n, 0, sqrt(abs(sig2)))
y <- f + e
b<-gam(y~s(x0)+s(x1)+s(x2)+s(x3))
plot(b) 
# now fit GAM with 3df regression spline term and two penalized terms
b1<-gam(y~s(x0,4|f)+s(x1)+s(x2,15))
plot(b1)
# now simulate poisson data
g<-exp(f/5)
for (i in  1:length(f)) y[i]<-rpois(1,f[i])
b2<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson)
plot(b2)