ssanova {gss} | R Documentation |
Fit smoothing spline ANOVA models with cubic spline, linear spline,
or thin-plate spline marginals for numerical variables. Factors are
also accepted. The symbolic model specification via formula
follows the same rules as in lm
.
ssanova(formula, type="cubic", data=list(), weights, subset, offset, na.action=na.omit, partial=NULL, method="v", varht=1, prec=1e-7, maxiter=30, ext=.05, order=2)
formula |
Symbolic description of the model to be fit. |
type |
Type of numerical marginals to be used. Supported are
type="cubic" for cubic spline marginals,
type="linear" for linear spline marginals, and
type="tp" for thin-plate spline marginals. |
data |
Optional data frame containing the variables in the model. |
weights |
Optional vector of weights to be used in the fitting process. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
offset |
Optional offset term with known parameter 1. |
na.action |
Function which indicates what should happen when the data contain NAs. |
partial |
Optional extra fixed effect terms in partial spline models. |
method |
Method for smoothing parameter selection. Supported
are method="v" for GCV, method="m" for GML (REML),
and method="u" for Mallow's CL. |
varht |
External variance estimate needed for
method="u" . Ignored when method="v" or
method="m" are specified. |
prec |
Precision requirement in the iteration for multiple smoothing parameter selection. Ignored when only one smoothing parameter is involved. |
maxiter |
Maximum number of iterations allowed for multiple smoothing parameter selection. Ignored when only one smoothing parameter is involved. |
ext |
For cubic spline and linear spline marginals, this option
specifies how far to extend the domain beyond the minimum and
the maximum as a percentage of the range. The default
ext=.05 specifies marginal domains of lengths 110 percent
of their respective ranges. Prediction outside of the domain
will result in an error. Ignored if type="tp" is
specified. |
order |
For thin-plate spline marginals, this option specifies
the order of the marginal penalties. Ignored if
type="cubic" or type="linear" are specified. |
ssanova
and the affiliated methods
provide a
front end to RKPACK, a collection of RATFOR routines for structural
multivariate nonparametric regression via the penalized least
squares method. The algorithms implemented in RKPACK are of the
orders O(n^3) in execution time and O(n^2) in memory
requirement. The constants in front of the orders vary with the
complexity of the model to be fit.
The model specification via formula
is intuitive. For
example, y~x1*x2
yields a model of the form
y = c + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e
with the terms denoted by "1"
, "x1"
, "x2"
, and
"x1:x2"
. Through the specifications of the side conditions,
these terms are uniquely defined. In the current implementation,
f_{1} and f_{12} integrate to 0 on the x1
domain for cubic spline and linear spline marginals, and add to
0 over the x1
(marginal) sampling points for thin-plate
spline marginals.
The penalized least squares problem is equivalent to a certain empirical Bayes model or a mixed effect model, and the model terms themselves are generally sums of finer terms of two types, the unpenalized fixed effects and the penalized random effects. Attached to every random effect there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.
The method predict
can be used to evaluate the sum of
selected or all model terms at arbitrary points within the domain,
along with standard errors derived from a certain Bayesian
calculation. The method summary
has a flag to request
diagnostics for the practical identifiability and significance of
the model terms.
ssanova
returns a list object of class "ssanova"
.
The method summary
is used to obtain summaries of the
fits. The method predict
can be used to evaluate the
fits at arbitrary points, along with the standard errors to be used
in Bayesian confidence intervals. The methods
residuals
and fitted.values
extract the
respective traits from the fits.
Factors are accepted as predictors. When a factor has 3 or more
levels, all terms involving it are treated as random effects, with
the "level means" being shrunk towards each other. The shrinking is
done differently for nominal and ordinal factors; see
mkrk.factor
for details.
The independent variables appearing in formula
can be
multivariate themselves. In particular,
ssanova(y~x,"tp",order=order)
can be used to fit ordinary
thin-plate splines in any dimension, of any order permissible, and
with standard errors available for Bayesian confidence intervals.
Note that thin-plate splines reduce to polynomial splines in one
dimension.
For univariate marginals, the additive models using
type="cubic"
and type="tp"
yield identical fit through
different internal makes. For example,
ssanova(y~x1+x2,"cubic")
and ssanova(y~x1+x2,"tp")
yield the same fit. The same is not true for models with
interactions, however.
Mathematically, the domain (through ext
for
type="cubic"
) or the order (through order
for
type="tp"
) could be specified individually for each of the
variables. Such flexibility is not provided in our implementation,
however, as it would be more a source for confusion than a practical
utility.
Chong Gu, chong@stat.purdue.edu
Wahba, G. (1990), Spline Models for Observational Data. Philadelphia: SIAM.
Gu, C. (2000), Multivariate spline regression. In M.~G. Schimek (Ed.), Smoothing and Regression: Approaches, Computation and Application. New York: Wiley.
Methods predict.ssanova
,
summary.ssanova
, and fitted.ssanova
.
## Fit a cubic spline x <- runif(100); y <- 5 + 3*sin(2*pi*x) + rnorm(x) cubic.fit <- ssanova(y~x,method="m") ## The same fit with different internal makes tp.fit <- ssanova(y~x,"tp",method="m") ## Obtain estimates and standard errors on a grid new <- data.frame(x=seq(min(x),max(x),len=50)) est <- predict(cubic.fit,new,se=TRUE) ## Plot the fit and the Bayesian confidence intervals plot(x,y,col=1); lines(new$x,est$fit,col=2) lines(new$x,est$fit+1.96*est$se,col=3) lines(new$x,est$fit-1.96*est$se,col=3) data(nox) ## Fit a tensor product cubic spline nox.fit <- ssanova(log10(nox)~comp*equi,data=nox) ## Fit a spline with cubic and nominal marginals nox$comp<-as.factor(nox$comp) nox.fit.n <- ssanova(log10(nox)~comp*equi,data=nox) ## Fit a spline with cubic and ordinal marginals nox$comp<-as.ordered(nox$comp) nox.fit.o <- ssanova(log10(nox)~comp*equi,data=nox)