Here is an example showing that MATLAB's fitctree takes the order of the features into account. Why?
load ionosphere            % contains the predictor matrix X and the labels Y
Mdl = fitctree(X,Y)        % tree on the original column order
view(Mdl,'mode','graph');
X1 = fliplr(X);            % reverse the column order of the predictors
Mdl1 = fitctree(X1,Y)      % tree on the flipped column order
view(Mdl1,'mode','graph');
The result is not the same model, and thus not the same classification accuracy, despite using the same features. Why?
In your example, X contains 34 predictors. The predictors have no names, so fitctree just refers to them by their column numbers: x1, x2, ..., x34. If you flip the table, the column numbers change and therefore so do the names: x1 -> x34, x2 -> x33, etc.

For most nodes this does not matter, because CART always splits a node on the predictor that maximises the impurity gain between the two child nodes. But sometimes there are multiple predictors that result in the same impurity gain. In that case it just picks the one with the lowest column number. And since the column numbers changed when you reordered the predictors, you can end up with a different predictor at that node.
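To see the renaming concretely, here is a minimal check using the X and X1 from the question: with 34 columns, flipped column k holds original column 35-k, so the same physical predictor carries a different default name in the two models.

N = size(X,2);                         % 34 predictors in the ionosphere data
k = 6;                                 % e.g. x6 in the flipped model ...
origCol = N + 1 - k                    % ... is column 29, i.e. x29 in the original X
isequal(X1(:,k), X(:,origCol))         % true: same physical predictor, different name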
For example, let's look at the marked split:

Original order (Mdl): [tree view screenshot]

Flipped order (Mdl1): [tree view screenshot]

Up to this point the same predictors and cut values have been chosen in both trees; only the names changed because of the reordering, e.g. x5 in the old data = x30 in the new model. But x3 and x6 are actually different predictors: x6 in the flipped order is x29 in the original order. A scatter plot of those two predictors shows how this could happen:

[scatter plot of x3 vs x29, with the two competing splits marked]
In that plot, the blue and cyan lines mark the splits performed by Mdl and Mdl1, respectively, at that node. As you can see, both splits yield child nodes with the same number of elements per label. Therefore CART can choose either of the two predictors; both cause the same impurity gain.

In that case it seems to just pick the one with the lower column number. In the non-flipped table, x3 is chosen instead of x29 because 3 < 29. But if you flip the table, x3 becomes x32 and x29 becomes x6. Since 6 < 32, you now end up with x6, i.e. the original x29.

Ultimately this does not matter: the decision tree of the flipped table is neither better nor worse. The effect only appears in the lower nodes, where the tree starts to overfit, so you really don't have to care about it.
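As a minimal sketch of that tie-breaking (assuming fitctree resolves exact ties by taking the predictor with the lowest column number, as described above), consider a toy dataset with two predictors that allow exactly the same perfect split. Which predictor name shows up at the root then depends only on the column order:

a = [1 2 3 4 5 6]';
b = [9 8 7 3 2 1]';                    % different values, but the same best split
Xtoy = [a b];
Ytoy = {'g';'g';'g';'b';'b';'b'};
T  = fitctree(Xtoy, Ytoy);
T.CutPredictor(1)                      % 'x1': the lower column number wins the tie
Tf = fitctree(fliplr(Xtoy), Ytoy);
Tf.CutPredictor(1)                     % 'x1' again, but that column now holds the other predictor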
Appendix:
Code for scatter plot generation:
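A minimal sketch (the two cut values below are placeholders; read the actual values off the tree views of Mdl and Mdl1):

load ionosphere                          % X and Y as in the question
gscatter(X(:,3), X(:,29), Y);            % x3 vs x29 (= x6 in the flipped order)
xlabel('x3'); ylabel('x29');
hold on
xline(0.5, 'b');                         % cut value chosen by Mdl  (placeholder)
yline(0.5, 'c');                         % cut value chosen by Mdl1 (placeholder)
hold off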