mboost-package {mboost} | R Documentation |
Functional gradient descent algorithm (boosting) for optimizing general risk functions utilizing component-wise (penalised) least squares estimates or regression trees as base-learners for fitting generalized linear, additive and interaction models to potentially high-dimensional data.
Package: | mboost |
Type: | Package |
Version: | 2.1-1 |
Date: | 2011-11-28 |
License: | GPL-2 |
LazyLoad: | yes |
LazyData: | yes |
This package is intended for modern regression modelling and stands
in between classical generalized linear and additive models, as for example
implemented by lm, glm, or gam, and machine-learning approaches for
complex interaction models, most prominently represented by gbm and
randomForest.
All functionality in this package is based on the generic
implementation of the optimization algorithm (function mboost_fit)
that allows for fitting linear, additive, and interaction models
(and mixtures of those) in low and high dimensions. The response
may be numeric, binary, ordered, censored or count data.
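The idea behind the generic optimizer can be sketched in a few lines of base R. The following is a minimal illustration of component-wise boosting for squared-error loss with simple linear base-learners; it is not the code used inside mboost_fit, and the simulated data and the names n, X, y, nu, mstop and beta are assumptions made for this example only.

```r
## Component-wise L2 boosting with simple linear base-learners (base R only).
## Illustrative sketch; not the implementation used by mboost_fit.
set.seed(1)
n <- 200
X <- scale(matrix(rnorm(n * 5), nrow = n))     # 5 centred and scaled predictors
colnames(X) <- paste0("x", 1:5)
y <- 2 * X[, 1] - X[, 3] + rnorm(n)            # only x1 and x3 carry signal

nu    <- 0.1                                   # step length (shrinkage)
mstop <- 100                                   # number of boosting iterations
beta  <- setNames(numeric(ncol(X)), colnames(X))
fit   <- rep(mean(y), n)                       # start from the offset

for (m in seq_len(mstop)) {
  u   <- y - fit                               # negative gradient of squared-error loss
  b   <- drop(crossprod(X, u)) / colSums(X^2)  # least-squares slope per predictor
  rss <- sapply(seq_along(b), function(j) sum((u - X[, j] * b[j])^2))
  j   <- which.min(rss)                        # best-fitting base-learner
  beta[j] <- beta[j] + nu * b[j]               # update the selected component only
  fit <- fit + nu * b[j] * X[, j]
}
round(beta, 2)
```

Because only one coefficient is updated per iteration, predictors that are never selected keep a coefficient of exactly zero, which is what makes the approach usable for variable selection in high dimensions.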
Both theory and applications are discussed by Buehlmann and Hothorn (2007).
UseRs without a basic knowledge of boosting methods are asked
to read this introduction before analyzing data using this package.
The examples presented in that paper are available as the package
vignette mboost_illustrations.
Note that the model fitting procedures in this package DO NOT automatically determine an appropriate model complexity. This task is the responsibility of the data analyst.
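One way to choose the number of boosting iterations is cross-validation of the empirical risk. The sketch below assumes the cars data from base R and a 10-fold scheme set up with cv(); the object names fit, folds and cvm are illustrative, and other resampling schemes or grids may be more appropriate for real data.

```r
library("mboost")
set.seed(290875)

## deliberately generous initial number of iterations
fit <- glmboost(dist ~ speed, data = cars,
                control = boost_control(mstop = 200))

## 10-fold cross-validation of the empirical risk
folds <- cv(model.weights(fit), type = "kfold", B = 10)
cvm <- cvrisk(fit, folds = folds, papply = lapply)

mstop(cvm)                        # iteration with the smallest cross-validated risk
fit[mstop(cvm), return = FALSE]   # restrict the model accordingly
```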
Version 2.0 comes with new features and is faster and more accurate
in some aspects. In addition, some changes to the user interface
were necessary: subsetting an mboost object now changes the object itself.
At any time, a model is associated with a number of boosting iterations,
which can be changed (increased or decreased) using the subset operator.
The center argument in bols was renamed to intercept. The argument z
was renamed to by.
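The in-place behaviour of the subset operator can be seen in a small example. This is a sketch assuming the cars data from base R; the object name fit is illustrative.

```r
library("mboost")

fit <- glmboost(dist ~ speed, data = cars,
                control = boost_control(mstop = 100))
mstop(fit)                 # iterations currently associated with the model

fit[20, return = FALSE]    # decrease: `fit` itself is modified, no copy is made
mstop(fit)

fit[150, return = FALSE]   # increase: the additional iterations are computed
mstop(fit)
```

Since no assignment is needed, code written for earlier versions that relied on subsetting returning a modified copy should be reviewed.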
The base-learners bns and bss are deprecated and replaced by bbs
(which results in qualitatively the same models but is
computationally much more attractive).
New features include new families (for example for ordinal regression)
and the which argument to the coef and predict methods for selecting
interesting base-learners. Predict methods are much faster now.
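As a brief illustration of the which argument, the following sketch fits an additive model to the mtcars data from base R and extracts the parts belonging to single base-learners. The choice of data, variables and the object names fit and p are assumptions made for this example.

```r
library("mboost")

fit <- gamboost(mpg ~ bols(wt) + bbs(hp) + bbs(disp), data = mtcars,
                control = boost_control(mstop = 100))

## coefficients of the base-learner(s) whose name matches "wt"
coef(fit, which = "wt")

## partial contribution of the base-learner for hp to the predictions
p <- predict(fit, which = "hp")
head(p)
```

Character values of which are matched against the names of the base-learners, so a distinctive part of the variable name is sufficient.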
The memory consumption could be reduced considerably,
thanks to sparse matrix technology in package Matrix.
Resampling procedures run automatically in parallel
if package multicore is available.
The most important advancement is a generic implementation
of the optimizer in function mboost_fit.
Torsten Hothorn Torsten.Hothorn@R-project.org,
Peter Buehlmann, Thomas Kneib, Matthias Schmid and
Benjamin Hofner
Peter Buehlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477–505.
Torsten Hothorn, Peter Buehlmann, Thomas Kneib, Matthias Schmid and Benjamin Hofner (2010), Model-based Boosting 2.0. Journal of Machine Learning Research, 11, 2109–2113.
The main fitting functions include: gamboost for boosted (generalized)
additive models, glmboost for boosted linear models and blackboost
for boosted trees. See their help pages for more details and further links.
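For orientation, the three fitting functions can be called with the same formula interface. The sketch below uses the mtcars data from base R with default settings throughout; the data, the variables and the object names are assumptions, and the in-sample errors it prints are not a basis for choosing between the models.

```r
library("mboost")

lin  <- glmboost(mpg ~ wt + hp, data = mtcars)     # boosted linear model
add  <- gamboost(mpg ~ wt + hp, data = mtcars)     # boosted additive model
tree <- blackboost(mpg ~ wt + hp, data = mtcars)   # boosted regression trees

## in-sample mean squared error of each fit (illustration only)
models <- list(glmboost = lin, gamboost = add, blackboost = tree)
mse <- sapply(models, function(m) mean((mtcars$mpg - predict(m))^2))
mse
```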
data("bodyfat")
set.seed(290875)

### model conditional expectation of DEXfat given
model <- mboost(DEXfat ~
    bols(age) +                 ### a linear function of age
    btree(hipcirc, waistcirc) + ### a non-linear interaction of
                                ### hip and waist circumference
    bbs(kneebreadth),           ### a smooth function of kneebreadth
    data = bodyfat, control = boost_control(mstop = 100))

### bootstrap for assessing `optimal' number of boosting iterations
cvm <- cvrisk(model, papply = lapply)

### restrict model to mstop(cvm)
model[mstop(cvm), return = FALSE]
mstop(model)

### plot age and kneebreadth
layout(matrix(1:2, nc = 2))
plot(model, which = c("age", "kneebreadth"))

### plot interaction of hip and waist circumference
attach(bodyfat)
nd <- expand.grid(hipcirc = h <- seq(from = min(hipcirc),
                                     to = max(hipcirc),
                                     length = 100),
                  waistcirc = w <- seq(from = min(waistcirc),
                                       to = max(waistcirc),
                                       length = 100))
plot(model, which = 2, newdata = nd)
detach(bodyfat)

### customized plot
layout(1)
pr <- predict(model, which = "hip", newdata = nd)
persp(x = h, y = w, z = matrix(pr, nrow = 100, ncol = 100))