-->

How do I interpret rpart splits on a factor variable?

Posted 2020-07-09 09:05

Question:

If the factor variable is Climate, with 4 possible values: Tropical, Arid, Temperate, Snow, and a node in my rpart tree is labeled as "Climate:ab", what is the split?

Answer 1:

I assume you are plotting the tree in the standard way, which is

plot(f)
text(f)

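As the help page for text.rpart explains, the pretty argument defaults to NULL, in which case factor levels are shown as letters: a stands for levels(Climate)[1], b for levels(Climate)[2], and so on. A label such as "Climate:ab" therefore means that observations whose Climate is one of the first two levels go to the left node, and all other observations go to the right.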
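To make the letter coding concrete, here is a minimal sketch that decodes a label such as "ab". It assumes Climate is a factor with R's default alphabetical level order, and decode_split is a hypothetical helper written for illustration, not part of rpart.

```r
# Hypothetical stand-in for the asker's factor (default alphabetical level order)
Climate <- factor(c("Tropical", "Arid", "Temperate", "Snow"))

# Letters in split labels follow the order of levels(): a = first level, b = second, ...
key <- setNames(levels(Climate), letters[seq_along(levels(Climate))])
print(key)

# Hypothetical helper: turn a label such as "ab" into the levels it stands for
decode_split <- function(code, key) unname(key[strsplit(code, "")[[1]]])
decode_split("ab", key)
```

Under that assumption, "Climate:ab" sends Arid and Snow to the left and Temperate and Tropical to the right. If the factor was created with a different level order, check levels(Climate) first.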

You can print the full level names instead using

plot(f)
text(f, pretty=1)

but I recommend using draw.tree from the maptree package:

require(maptree)
draw.tree(f)

I used fake data to make the plots:

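library(rpart)

# Fake data: y is fully determined by Climate
X <- data.frame(
    y = rep(1:4, 25),
    # explicit factor(): since R 4.0, data.frame() no longer converts strings to factors
    Climate = factor(rep(c("Tropical", "Arid", "Temperate", "Snow"), 25))
)
f <- rpart(y ~ Climate, X)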
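If you would rather read the split off the fitted object than from the plot, the sketch below looks at the csplit matrix and the stored factor levels. It rebuilds the same fake data so it can run on its own, and it assumes the first row of csplit corresponds to the split of interest, which holds here because Climate is the only predictor.

```r
library(rpart)

# Same fake data as in the answer, with Climate made an explicit factor
X <- data.frame(
    y = rep(1:4, 25),
    Climate = factor(rep(c("Tropical", "Arid", "Temperate", "Snow"), 25))
)
f <- rpart(y ~ Climate, X)

# Factor levels as stored in the fit, in the order the letters refer to
lev <- attr(f, "xlevels")$Climate

# csplit has one row per categorical split:
# 1 = level goes left, 3 = level goes right, 2 = level not present at that node
first <- f$csplit[1, ]
list(left = lev[first == 1], right = lev[first == 3])
```

Calling print(f) also lists every split with the level letters, which you can match against lev.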