The getTree function in the randomForest package in R displays the structure of a particular tree in the random forest.
Here is an example on the iris dataset:
library(randomForest)
data(iris)
rf <- randomForest(Species ~ ., iris)
getTree(rf, 1)
This shows the output of tree #1 of 500:
   left daughter right daughter split var split point status prediction
1              2              3         3        2.50      1          0
2              0              0         0        0.00     -1          1
3              4              5         4        1.65      1          0
4              6              7         4        1.35      1          0
5              8              9         3        4.85      1          0
6              0              0         0        0.00     -1          2
7             10             11         2        3.10      1          0
8             12             13         4        1.55      1          0
9              0              0         0        0.00     -1          3
10             0              0         0        0.00     -1          3
11             0              0         0        0.00     -1          2
12            14             15         2        2.55      1          0
13             0              0         0        0.00     -1          2
14            16             17         2        2.35      1          0
15             0              0         0        0.00     -1          3
16             0              0         0        0.00     -1          3
17             0              0         0        0.00     -1          2
The leaves are the nodes with 0 as both left daughter and right daughter (status -1).
Is there a way to find out which instances (rows of the iris dataset) end up in each of those leaves?
For example, node 2 is a leaf and might contain instances 2, 3, and 4 of the iris dataset, all classified as 1.
Any help will be much appreciated.
Building on this answer:
Extract a subset of tree from random forest model for prediction
There may be an access function I'm not aware of, but the following manual approach seems to work:
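As a rough sketch of what a manual approach could look like: walk each row down the tree using the getTree matrix, sending a row to the left daughter when its value is less than or equal to the split point. The helper name `leaf_for_row` is hypothetical (not part of randomForest), and the code assumes all predictors are numeric, as in iris, and that `split var` indexes the predictor columns in their original order.

```r
library(randomForest)

set.seed(42)
data(iris)
rf <- randomForest(Species ~ ., iris)

# Hypothetical helper: follow one observation (a numeric vector of
# predictor values) down a tree and return the row number of the
# terminal node it lands in. Assumes numeric predictors only.
leaf_for_row <- function(x, tree) {
  node <- 1
  while (tree[node, "status"] != -1) {          # -1 marks a terminal node
    v <- tree[node, "split var"]
    if (x[v] <= tree[node, "split point"]) {
      node <- tree[node, "left daughter"]       # <= split point goes left
    } else {
      node <- tree[node, "right daughter"]
    }
  }
  unname(node)
}

tree1 <- getTree(rf, 1)
X <- as.matrix(iris[, 1:4])                     # predictors in model order

# Terminal node of tree 1 for every row of iris
leaves <- apply(X, 1, leaf_for_row, tree = tree1)

# Row indices of iris grouped by the leaf they fall into
rows_by_leaf <- split(seq_len(nrow(iris)), leaves)
rows_by_leaf
```

Note that this sends every row of iris down the tree, including rows that were out-of-bag for that tree, so a leaf can contain rows the tree was not trained on. The result can be cross-checked against `attr(predict(rf, iris, nodes = TRUE), "nodes")[, 1]`, which holds the terminal node indicators for the first tree.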