ssanova {gss}R Documentation

Fitting Smoothing Spline ANOVA Models

Description

Fit smoothing spline ANOVA models with cubic spline, linear spline, or thin-plate spline marginals for numerical variables. Factors are also accepted. The symbolic model specification via formula follows the same rules as in lm.

Usage

ssanova(formula, type="cubic", data=list(), weights, subset,
        offset, na.action=na.omit, partial=NULL, method="v",
        varht=1, prec=1e-7, maxiter=30, ext=.05, order=2)

Arguments

formula Symbolic description of the model to be fit.
type Type of numerical marginals to be used. Supported are type="cubic" for cubic spline marginals, type="linear" for linear spline marginals, and type="tp" for thin-plate spline marginals.
data Optional data frame containing the variables in the model.
weights Optional vector of weights to be used in the fitting process.
subset Optional vector specifying a subset of observations to be used in the fitting process.
offset Optional offset term with known parameter 1.
na.action Function which indicates what should happen when the data contain NAs.
partial Optional extra fixed effect terms in partial spline models.
method Method for smoothing parameter selection. Supported are method="v" for GCV, method="m" for GML (REML), and method="u" for Mallow's CL.
varht External variance estimate needed for method="u". Ignored when method="v" or method="m" are specified.
prec Precision requirement in the iteration for multiple smoothing parameter selection. Ignored when only one smoothing parameter is involved.
maxiter Maximum number of iterations allowed for multiple smoothing parameter selection. Ignored when only one smoothing parameter is involved.
ext For cubic spline and linear spline marginals, this option specifies how far to extend the domain beyond the minimum and the maximum as a percentage of the range. The default ext=.05 specifies marginal domains of lengths 110 percent of their respective ranges. Prediction outside of the domain will result in an error. Ignored if type="tp" is specified.
order For thin-plate spline marginals, this option specifies the order of the marginal penalties. Ignored if type="cubic" or type="linear" are specified.

Details

ssanova and the affiliated methods provide a front end to RKPACK, a collection of RATFOR routines for structural multivariate nonparametric regression via the penalized least squares method. The algorithms implemented in RKPACK are of the orders O(n^3) in execution time and O(n^2) in memory requirement. The constants in front of the orders vary with the complexity of the model to be fit.

The model specification via formula is intuitive. For example, y~x1*x2 yields a model of the form

y = c + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e

with the terms denoted by "1", "x1", "x2", and "x1:x2". Through the specifications of the side conditions, these terms are uniquely defined. In the current implementation, f_{1} and f_{12} integrate to 0 on the x1 domain for cubic spline and linear spline marginals, and add to 0 over the x1 (marginal) sampling points for thin-plate spline marginals.

The penalized least squares problem is equivalent to a certain empirical Bayes model or a mixed effect model, and the model terms themselves are generally sums of finer terms of two types, the unpenalized fixed effects and the penalized random effects. Attached to every random effect there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.

The method predict can be used to evaluate the sum of selected or all model terms at arbitrary points within the domain, along with standard errors derived from a certain Bayesian calculation. The method summary has a flag to request diagnostics for the practical identifiability and significance of the model terms.

Value

ssanova returns a list object of class "ssanova".
The method summary is used to obtain summaries of the fits. The method predict can be used to evaluate the fits at arbitrary points, along with the standard errors to be used in Bayesian confidence intervals. The methods residuals and fitted.values extract the respective traits from the fits.

Factors

Factors are accepted as predictors. When a factor has 3 or more levels, all terms involving it are treated as random effects, with the "level means" being shrunk towards each other. The shrinking is done differently for nominal and ordinal factors; see mkrk.factor for details.

Note

The independent variables appearing in formula can be multivariate themselves. In particular, ssanova(y~x,"tp",order=order) can be used to fit ordinary thin-plate splines in any dimension, of any order permissible, and with standard errors available for Bayesian confidence intervals. Note that thin-plate splines reduce to polynomial splines in one dimension.

For univariate marginals, the additive models using type="cubic" and type="tp" yield identical fit through different internal makes. For example, ssanova(y~x1+x2,"cubic") and ssanova(y~x1+x2,"tp") yield the same fit. The same is not true for models with interactions, however.

Mathematically, the domain (through ext for type="cubic") or the order (through order for type="tp") could be specified individually for each of the variables. Such flexibility is not provided in our implementation, however, as it would be more a source for confusion than a practical utility.

Author(s)

Chong Gu, chong@stat.purdue.edu

References

Wahba, G. (1990), Spline Models for Observational Data. Philadelphia: SIAM.

Gu, C. (2000), Multivariate spline regression. In M.~G. Schimek (Ed.), Smoothing and Regression: Approaches, Computation and Application. New York: Wiley.

See Also

Methods predict.ssanova, summary.ssanova, and fitted.ssanova.

Examples

## Fit a cubic spline
x <- runif(100); y <- 5 + 3*sin(2*pi*x) + rnorm(x)
cubic.fit <- ssanova(y~x,method="m")
## The same fit with different internal makes
tp.fit <- ssanova(y~x,"tp",method="m")
## Obtain estimates and standard errors on a grid
new <- data.frame(x=seq(min(x),max(x),len=50))
est <- predict(cubic.fit,new,se=TRUE)
## Plot the fit and the Bayesian confidence intervals
plot(x,y,col=1); lines(new$x,est$fit,col=2)
lines(new$x,est$fit+1.96*est$se,col=3)
lines(new$x,est$fit-1.96*est$se,col=3)

data(nox)
## Fit a tensor product cubic spline
nox.fit <- ssanova(log10(nox)~comp*equi,data=nox)
## Fit a spline with cubic and nominal marginals
nox$comp<-as.factor(nox$comp)
nox.fit.n <- ssanova(log10(nox)~comp*equi,data=nox)
## Fit a spline with cubic and ordinal marginals
nox$comp<-as.ordered(nox$comp)
nox.fit.o <- ssanova(log10(nox)~comp*equi,data=nox)