I was playing around with Weka when I noticed a minNum field in the RandomTree configuration. Its description says "The minimum total weight of the instances in a leaf", but I couldn't really understand what that means.
I experimented with that number and noticed that when I increase it, the generated tree gets smaller. I can't see why this happens.
Any help/references will be appreciated.
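This has to do with the minimum number of instances in a leaf node (which is often 2 by default in decision trees, e.g. J48's -M option). The description says "total weight" because Weka instances can carry weights; with the default weight of 1 per instance, the total weight of a leaf is simply the number of instances it contains. Roughly speaking, the learner will not make a split that would leave a child with less than this weight, so raising the value rules out more splits, growth stops earlier, and the tree gets smaller. The higher you set this parameter, the more general the tree will be, since many leaves holding only a few instances yield an overly granular tree structure.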
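To make the mechanism concrete, here is a toy sketch in Python. It is not Weka's RandomTree source: it grows a binary tree on a single numeric attribute, picks splits greedily by weighted Gini impurity, and rejects any split that would leave a child with total weight below min_num. The data set, the function names and the exact stopping rule (a node with less than 2 × min_num weight becomes a leaf) are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy data: (x, class label, instance weight).
# Labels come in blocks of two (A A B B A A B B), so an unrestricted tree needs several splits.
DATA = [(x, "AABBAABB"[x - 1], 1.0) for x in range(1, 9)]


def weight(rows):
    """Total weight of the instances in rows (equals their count when every weight is 1)."""
    return sum(w for _, _, w in rows)


def gini(rows):
    """Gini impurity of the class labels in rows, using instance weights."""
    tw = weight(rows)
    per_label = defaultdict(float)
    for _, label, w in rows:
        per_label[label] += w
    return 1.0 - sum((v / tw) ** 2 for v in per_label.values())


def tree_size(rows, min_num):
    """Grow a toy binary tree on x and return its number of nodes."""
    tw = weight(rows)
    # Leaf if the node is pure, or too light to give both children at least min_num.
    if gini(rows) == 0.0 or tw < 2 * min_num:
        return 1
    best = None
    xs = sorted({x for x, _, _ in rows})
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [r for r in rows if r[0] <= t]
        right = [r for r in rows if r[0] > t]
        wl, wr = weight(left), weight(right)
        if wl < min_num or wr < min_num:
            continue  # split rejected: one child leaf would be too light
        score = (wl * gini(left) + wr * gini(right)) / tw
        if best is None or score < best[0]:
            best = (score, left, right)
    if best is None:  # no admissible split left, so this node stays a leaf
        return 1
    return 1 + tree_size(best[1], min_num) + tree_size(best[2], min_num)


for m in (1, 2, 3, 5):
    print(m, tree_size(DATA, m))  # prints: 1 7, 2 7, 3 3, 5 1 (one pair per line)
```

On this data the tree has 7 nodes for min_num 1 or 2, 3 nodes for min_num 3, and is a single leaf for min_num 5. Because the check is on weights rather than counts, doubling every instance weight has the same effect as halving min_num.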
Here are two examples on the iris dataset, which show how the -M option might affect the size of the resulting tree:

As a side note, RandomTree itself does not use bagging: its randomness comes from considering only K randomly chosen attributes when choosing the split at each node (RandomForest is what adds bagging on top, training each random tree on a bootstrap sample). Contrary to REPTree, however, there is no pruning (as in RandomForest), so you may end up with very noisy trees.
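The per-node attribute sampling can be sketched as follows. Again this is a hypothetical Python illustration, not Weka's code: it assumes nominal attributes only, scores a multiway split by Gini impurity, and evaluates just k attributes drawn at random instead of all of them.

```python
import random
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(rows, attr):
    """Weighted Gini impurity after a multiway split on nominal attribute attr."""
    groups = defaultdict(list)
    for features, label in rows:
        groups[features[attr]].append(label)
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


def pick_attribute(rows, attributes, k, rng):
    """Score only k randomly drawn attributes and return the best of them."""
    candidates = rng.sample(attributes, min(k, len(attributes)))
    return min(candidates, key=lambda a: split_impurity(rows, a))


# Hypothetical data: "a" predicts the class perfectly, "b" and "c" are noise.
ROWS = [
    ({"a": "x", "b": "p", "c": "u"}, "yes"),
    ({"a": "x", "b": "q", "c": "v"}, "yes"),
    ({"a": "y", "b": "p", "c": "v"}, "no"),
    ({"a": "y", "b": "q", "c": "u"}, "no"),
]
ATTRS = ["a", "b", "c"]

rng = random.Random(0)
print(pick_attribute(ROWS, ATTRS, 3, rng))  # always "a"
print(pick_attribute(ROWS, ATTRS, 1, rng))  # depends on the random draw
```

With k equal to the number of attributes the sketch always finds the predictive attribute a; with k = 1 it may have to split on noise attributes b or c, which is one source of the variability between random trees.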