Say I have
library(rpart)
head(kyphosis)
inTrain <- sample(1:nrow(kyphosis), 45, replace = FALSE)
TRAIN_KYPHOSIS <- kyphosis[inTrain, ]
TEST_KYPHOSIS <- kyphosis[-inTrain, ]
(kyph_tree <- rpart(Number ~ ., data = TRAIN_KYPHOSIS))
How do I get the terminal node from the fitted object for each observation in TEST_KYPHOSIS?
How do I get a summary, such as the deviance and the predicted value from the terminal node which each test observation maps to?
One option is to convert the rpart object to an object of class party from the partykit package, which provides a general toolkit for dealing with recursive partitions. The conversion is simple: calling as.party() on kyph_tree gives a party object, called kyph_party here. (For exact reproducibility, run the code from your question with set.seed(1) prior to running my code.)

For objects of this class there are somewhat more flexible methods for plot(), predict(), fitted(), etc. For example, plot(kyph_party) yields a more informative display than the default plot(kyph_tree). The fitted() method extracts a two-column data.frame with the fitted node numbers and the observed responses on the training data. With this you can easily compute any quantity you are interested in, e.g., the means, medians, or residual sums of squares within each node.
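The steps described so far can be sketched roughly as follows. This is a minimal reconstruction rather than the original code: the names kyph_fit, node, resp, and the node_* summaries are illustrative choices, and the column names "(fitted)" and "(response)" are what partykit uses for a converted rpart tree as far as I know.

```r
library(rpart)     # rpart() and the kyphosis data
library(partykit)  # as.party() and the methods for party objects

# Setup from the question, with a fixed seed for reproducibility
set.seed(1)
inTrain <- sample(1:nrow(kyphosis), 45, replace = FALSE)
TRAIN_KYPHOSIS <- kyphosis[inTrain, ]
TEST_KYPHOSIS <- kyphosis[-inTrain, ]
kyph_tree <- rpart(Number ~ ., data = TRAIN_KYPHOSIS)

# Convert the rpart fit to a party object
kyph_party <- as.party(kyph_tree)

# Two columns: fitted terminal node ids and observed responses (training data)
kyph_fit <- fitted(kyph_party)
node <- kyph_fit[["(fitted)"]]
resp <- kyph_fit[["(response)"]]

# Grouped statistics per terminal node
node_mean   <- tapply(resp, node, mean)
node_median <- tapply(resp, node, median)
node_rss    <- tapply(resp, node, function(y) sum((y - mean(y))^2))
cbind(mean = node_mean, median = node_median, rss = node_rss)
```

For a regression tree like this one, the within-node residual sum of squares plays the role of the node deviance asked about in the question.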
Instead of the simple tapply(), you can use any other function of your choice to compute the tables of grouped statistics.

Now, to learn which observation from the test data TEST_KYPHOSIS falls into which node of the tree, you can simply use the predict(..., type = "node") method.

rpart actually has this functionality, but it's not exposed (strangely enough, since it's a rather obvious requirement). And then:
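To make the test-set step concrete, here is a hedged sketch of both routes. The first uses the documented predict() method for party objects. The second relies on the unexported rpart helpers pred.rpart() and rpart.matrix(), accessed with `:::`. That part is an assumption about rpart internals, which may change between versions, and predict_nodes() is a hypothetical helper name, not part of rpart.

```r
library(rpart)
library(partykit)

# Setup from the question, with a fixed seed for reproducibility
set.seed(1)
inTrain <- sample(1:nrow(kyphosis), 45, replace = FALSE)
TRAIN_KYPHOSIS <- kyphosis[inTrain, ]
TEST_KYPHOSIS <- kyphosis[-inTrain, ]
kyph_tree <- rpart(Number ~ ., data = TRAIN_KYPHOSIS)
kyph_party <- as.party(kyph_tree)

# Route 1: terminal node id (partykit numbering) for each test observation
test_node <- predict(kyph_party, newdata = TEST_KYPHOSIS, type = "node")

# Per-node summaries from the training data, looked up for each test row
kyph_fit <- fitted(kyph_party)
node <- kyph_fit[["(fitted)"]]
resp <- kyph_fit[["(response)"]]
node_mean <- tapply(resp, node, mean)
node_rss  <- tapply(resp, node, function(y) sum((y - mean(y))^2))
key <- as.character(test_node)
test_summary <- data.frame(
  node      = as.vector(test_node),
  predicted = as.vector(node_mean[key]),
  deviance  = as.vector(node_rss[key]),
  row.names = rownames(TEST_KYPHOSIS)
)
head(test_summary)

# Route 2 (ASSUMPTION: unexported rpart internals, not a stable API)
predict_nodes <- function(object, newdata, na.action = na.pass) {
  Terms <- delete.response(object$terms)
  mf <- model.frame(Terms, newdata, na.action = na.action,
                    xlev = attr(object, "xlevels"))
  # Row index into object$frame for each observation
  where <- rpart:::pred.rpart(object, rpart:::rpart.matrix(mf))
  # Translate row indices into rpart's own node numbers
  as.integer(row.names(object$frame))[where]
}
rpart_node <- predict_nodes(kyph_tree, TEST_KYPHOSIS)
table(rpart_node)
```

Note that rpart and partykit label nodes differently, so the ids from the two routes should be compared within each scheme rather than against each other.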