find.interaction {randomSurvivalForest}    R Documentation

Find Interactions Between Pairs of Variables

Description

Find pairwise interactions between variables.

Usage

   find.interaction(object, predictorNames = NULL,
          method = c("maxsubtree", "vimp")[1], sorted = TRUE, 
          npred = NULL, subset = NULL, nrep = 1, rough = FALSE,
          importance = c("randomsplit", "permute")[1],
          seed = NULL, do.trace = FALSE, ...)

Arguments

object

An object of class (rsf, grow) or (rsf, forest).

predictorNames

Character vector of names of target x-variables. Default is to use all variables.

method

Method of analysis: "maxsubtree" (maximal subtree) or "vimp" (joint VIMP). See Details below.

sorted

Should variables be sorted?

npred

Use the first npred ordered variables. Default is to use all variables.

subset

Indices indicating which rows of the predictor matrix are to be used (note: this refers to the predictor matrix stored in the object, predictors). Default is to use all rows.

nrep

Number of Monte Carlo replicates. Applies only when method="vimp".

rough

Should a fast approximation be used? Applies only when method="vimp".

importance

Type of variable importance (VIMP). Applies only when method="vimp".

seed

Seed (a negative integer) for the random number generator.

do.trace

Logical. Should trace output be enabled? Integer values can also be passed: a positive value causes output to be printed every do.trace iterations. Applies only when method="vimp".

...

Further arguments passed to or from other methods.

Details

Using a previously grown forest, find.interaction identifies pairwise interactions for all pairs of variables in a specified list. Two distinct approaches are available, selected by the method option.

If method="maxsubtree", a maximal subtree analysis is used. In this case a matrix is returned whose diagonal entry [i][i] is the normalized minimal depth of variable [i] relative to the root node (normalized with respect to the size of the tree), and whose off-diagonal entry [i][j] is the normalized minimal depth of variable [j] with respect to the maximal subtree for variable [i] (normalized with respect to the size of [i]'s maximal subtree). Small [i][i] entries indicate predictive variables. A small [i][j] entry in a row whose [i][i] entry is also small is a sign of an interaction between variables i and j (note: scan rows, not columns, for small entries). See Ishwaran et al. (2010) for more details.
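As a rough illustration of the row-wise scan described above, the following base-R sketch flags candidate interactions in a maximal-subtree matrix. The helper flag_interactions, its cutoffs diag_cut and off_cut, and the toy matrix are all hypothetical: they are not part of randomSurvivalForest, and the package does not prescribe any particular threshold.

```r
# Hypothetical helper, NOT part of randomSurvivalForest.
# Assumes `m` is a square matrix with matching row/column names where
#   m[i, i] = normalized minimal depth of variable i (root node), and
#   m[i, j] = normalized minimal depth of variable j within i's maximal subtree.
# The cutoffs are illustrative assumptions, not package defaults.
flag_interactions <- function(m, diag_cut = 0.3, off_cut = 0.3) {
  stopifnot(is.matrix(m), nrow(m) == ncol(m))
  hits <- character(0)
  for (i in seq_len(nrow(m))) {
    # Scan rows: only consider rows whose own variable is predictive.
    if (m[i, i] > diag_cut) next
    for (j in seq_len(ncol(m))) {
      # A small off-diagonal entry in such a row suggests an interaction.
      if (j != i && m[i, j] <= off_cut) {
        hits <- c(hits, paste(rownames(m)[i], colnames(m)[j], sep = ":"))
      }
    }
  }
  hits
}

# Toy matrix with made-up values, for illustration only.
vars <- c("x1", "x2", "x3")
m <- matrix(c(0.10, 0.20, 0.60,
              0.50, 0.25, 0.40,
              0.70, 0.55, 0.80),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))
flag_interactions(m)  # returns "x1:x2"
```

Row x3 is skipped because its diagonal entry is large, even though the scan never looks down columns; this mirrors the advice to read the matrix by rows.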

If method="vimp", a joint-VIMP approach is used. Two variables are paired and their joint VIMP is calculated (referred to as Paired importance). The VIMP of each variable separately is also calculated, and the sum of these two values is referred to as Additive importance. A large positive or negative difference between Paired and Additive importance indicates an association worth pursuing, provided the individual VIMPs of both variables are reasonably large. See Ishwaran (2007) for more details.
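The Paired versus Additive comparison reduces to simple arithmetic. The sketch below is hypothetical (vimp_interaction is not a package function, its output labels are assumptions, and the VIMP values are made up); it only shows how Additive importance and the Paired minus Additive difference are formed for one pair of variables.

```r
# Hypothetical helper, NOT part of randomSurvivalForest.
# vimp_i, vimp_j: VIMP of each variable on its own.
# paired:         joint VIMP when both variables are perturbed together.
vimp_interaction <- function(vimp_i, vimp_j, paired) {
  additive <- vimp_i + vimp_j              # Additive importance
  c(Var1 = vimp_i, Var2 = vimp_j,
    Paired = paired, Additive = additive,
    Difference = paired - additive)        # large |Difference| is of interest
}

# Made-up VIMP values, for illustration only.
res <- vimp_interaction(vimp_i = 0.030, vimp_j = 0.020, paired = 0.080)
round(res, 3)
```

A difference near zero suggests the two variables contribute roughly additively; as the text notes, a large difference is only worth pursuing when both individual VIMPs are reasonably large.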

Computations can be slow for large data sets or large forests. In such cases, consider setting npred to a smaller number, or using rough=TRUE if method="vimp". If method="maxsubtree", consider growing the original forest with fewer trees.

If nrep is greater than 1, the analysis is repeated nrep times and results averaged over the replications (applies only when method="vimp").
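Averaging over replicates can be pictured as an element-wise mean of nrep tables. This is a minimal base-R sketch under the assumption that each replicate yields a numeric table of the same shape; average_over_reps and noisy_table are hypothetical stand-ins, not the package's internal code.

```r
# Hypothetical sketch, NOT the package's internal implementation.
# one_rep: a function returning one numeric table (e.g. one joint-VIMP pass).
average_over_reps <- function(one_rep, nrep = 1) {
  stopifnot(nrep >= 1)
  tables <- lapply(seq_len(nrep), function(r) one_rep())
  Reduce(`+`, tables) / nrep   # element-wise mean across replicates
}

# Toy replicate with random noise (R's own RNG, for illustration only).
set.seed(1)
noisy_table <- function() matrix(0.05 + rnorm(4, sd = 0.01), nrow = 2)
avg <- average_over_reps(noisy_table, nrep = 3)
avg
```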

For competing risk data, maximal subtree analyses correspond to unconditional values (i.e., they are non-event specific). Setting method="vimp", however, yields pairwise interactions for both event and non-event specific settings.

Value

Invisibly returns the interaction table (a list for competing risk data) when method="vimp", or the maximal subtree matrix when method="maxsubtree".

Author(s)

Hemant Ishwaran hemant.ishwaran@gmail.com

Udaya B. Kogalur kogalurshear@gmail.com

References

Ishwaran H. (2007). Variable importance in binary regression trees and forests. Electronic J. Statist., 1:519-537.

Ishwaran H., Kogalur U.B., Gorodeski E.Z., Minn A.J. and Lauer M.S. (2010). High-dimensional variable selection for survival data. J. Amer. Statist. Assoc., 105:205-217.

See Also

max.subtree, vimp.

Examples

## Not run: 
#------------------------------------------------------------------------
# Maximal subtree approach, top 8 predictors (PBC data).

library(randomSurvivalForest)
data(pbc, package = "randomSurvivalForest")
pbc.out <- rsf(Surv(days, status) ~ ., pbc, nsplit = 10)
find.interaction(pbc.out, npred = 8)

#------------------------------------------------------------------------
# VIMP approach (PBC data). 
# Use fast approximation to speed up computations.

data(pbc, package = "randomSurvivalForest")
pbc.out <- rsf(Surv(days, status) ~ ., pbc, nsplit = 10)
find.interaction(pbc.out, method = "vimp", nrep = 3, rough = TRUE)

#------------------------------------------------------------------------
# Competing risks (WIHS data).

data(wihs, package = "randomSurvivalForest")
wihs.out <- rsf(Surv(time, status) ~ ., wihs, nsplit = 3, ntree = 200)
find.interaction(wihs.out, method = "vimp")

## End(Not run)

[Package randomSurvivalForest version 3.6.3]