rsf2pmml {randomSurvivalForest} | R Documentation |
rsf2pmml implements the Predictive Model Markup Language specification for a randomSurvivalForest forest object. In particular, this function gives the user the ability to save the geometry of a forest as a PMML XML document.
rsf2pmml(object, ...)
object |
An object of class |
... |
Further arguments passed to or from other methods. |
The Predictive Model Markup Language is an XML based language which provides a way for applications to define statistical and data mining models and to share models between PMML compliant applications. More information about PMML and the Data Mining Group can be found at http://www.dmg.org.
Use of PMML and rsf2pmml requires the XML package. Be aware that XML is a very verbose data format. Reasonably sized trees and data sets can lead to extremely large text files. XML, while achieving interoperability, is not an efficient data storage mechanism in this case.
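The verbosity noted above can be seen with a rough sketch that uses only the XML package. The element name `Node`, its attributes, and the container name `Forest` are hypothetical stand-ins chosen for illustration; they are not the element names that rsf2pmml writes.

```r
library(XML)

# Hypothetical payload: 100 integer values, one per tree node.
values <- 1:100

# Wrap each value in its own element (illustrative names, not real PMML).
kids <- lapply(values, function(i) {
  xmlNode("Node", attrs = c(id = as.character(i), score = as.character(i)))
})
forest.node <- xmlNode("Forest", .children = kids)

# Write the XML to a temporary file and measure its size in bytes.
xml.path <- tempfile(fileext = ".xml")
saveXML(forest.node, file = xml.path)
xml.bytes <- file.size(xml.path)

# Size of the same values written as plain space-separated text.
raw.bytes <- nchar(paste(values, collapse = " "))

cat("XML bytes:", xml.bytes, " plain-text bytes:", raw.bytes, "\n")
unlink(xml.path)
```

Even for this tiny stand-in, the XML file is several times larger than the bare values, so a real forest can be expected to produce a much larger document.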
It is anticipated that rsf2pmml will be used to export the geometry of the forest to other PMML compliant applications, including graphics packages that are capable of printing binary trees. In addition, the user may wish to save the geometry of the forest for later retrieval and prediction on new data sets using rsf2pmml together with pmml2rsf.
An object of class XMLNode as that defined by the XML package. This represents the top level, or root node, of the XML document and is of type PMML.
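As a minimal sketch of what a root node of this kind looks like to the XML package, the code below builds a small stand-in by hand and inspects it. The child element names are assumptions made for illustration; a document produced by rsf2pmml contains far more structure.

```r
library(XML)

# Hand-built stand-in for the returned value: a root node named "PMML".
# The two children are illustrative placeholders only.
pmml.root <- xmlNode("PMML",
                     xmlNode("Header"),
                     xmlNode("DataDictionary"))

# The node is an XMLNode, and its name identifies the document type.
is.xml.node <- inherits(pmml.root, "XMLNode")
root.name <- xmlName(pmml.root)

# Number of immediate children under the root.
n.children <- xmlSize(pmml.root)

print(root.name)
```

The same accessors (`xmlName`, `xmlSize`, `xmlChildren`) should apply to the node returned by rsf2pmml, since it is described as an XMLNode.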
One cautionary note is in order. The PMML representation of the randomSurvivalForest forest object is incomplete, in that the object needs to be massaged in order for prediction to be possible. This will be clear in the examples. This deficiency will be addressed in future releases of this package. However, it was felt that the current functionality was important enough and mature enough to warrant release in this version of the product.
Hemant Ishwaran hemant.ishwaran@gmail.com
Udaya B. Kogalur kogalurshear@gmail.com
http://www.dmg.org
xmlTreeParse, xmlRoot, saveXML, pmml2rsf.
## Not run: 
# Example 1: Growing a forest, saving it as a PMML document,
# restoring the forest from the PMML document, and using this forest to
# perform prediction.

library("XML")
library("randomSurvivalForest")

data(veteran, package = "randomSurvivalForest")
veteran.out <- rsf(Surv(time, status)~., data = veteran, ntree = 5)
veteran.forest <- veteran.out$forest
veteran.pmml <- rsf2pmml(veteran.forest)

# Save the document to disk.
userFile <- file("veteran.forest.xml")
saveXML(veteran.pmml, userFile)
close(userFile)

# Read the just written document.
veteran.pmml <- xmlRoot(xmlTreeParse("veteran.forest.xml"))
partial.forest <- pmml2rsf(veteran.pmml)

# The PMML forest object must be massaged before it can be used
# for prediction as follows:
veteran.restored.forest <- list(
  nativeArray=partial.forest$nativeArray,
  nativeFactorArray=partial.forest$nativeFactorArray,
  timeInterest=partial.forest$timeInterest,
  predictorNames=partial.forest$predictorNames,
  seed=partial.forest$seed,
  formula=partial.forest$formula,
  predictors=veteran.forest$predictors,
  time=veteran.forest$time,
  cens=veteran.forest$cens)

# The actual time, censoring and prediction values of the data set
# used to grow the forest are not contained in the PMML
# representation of the forest. If the user has access to the original
# datafile that was used to grow the forest, this information can be
# easily recovered. The names corresponding to the time, censoring and
# prediction data are all retained in the PMML representation of the forest.
class(veteran.restored.forest) <- c("rsf", "forest")
veteran.restored.out <- predict.rsf(veteran.restored.forest, test=veteran)

## End(Not run)