While working with the DecisionTreeClassifier I visualized it using graphviz, and to my astonishment it seems to take categorical data and treat it as continuous data.
All my features are categorical, and for example you can see the following tree (please note that the first feature, X[0], has 6 possible values: 0, 1, 2, 3, 4, 5). From what I found here, the class is built on a tree class that is a binary tree, so this appears to be a limitation in sklearn.
Does anyone know a way that I am missing to make the tree treat the features as categorical? (I know it is not better for the task, but since I need categories, I am currently using one-hot vectors on the data.)
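To make the one-hot workaround concrete, here is a minimal plain-Python sketch of what it does to a single column. The helper name `one_hot` and the sample values are hypothetical; in practice something like sklearn's `OneHotEncoder` or `pandas.get_dummies` would do this step.

```python
def one_hot(column):
    """Return (categories, rows); each row is a 0/1 indicator list."""
    categories = sorted(set(column))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in column:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one indicator is set per sample
        rows.append(row)
    return categories, rows


# Hypothetical integer-coded feature with 6 possible values, like X[0].
x0 = [0, 3, 5, 1, 3, 2, 4]
categories, encoded = one_hot(x0)

print(categories)
for row in encoded:
    print(row)
```

With this layout, a binary split on one indicator column (for example "column 3 <= 0.5") amounts to asking "is the category 3 or not", so the tree no longer relies on the order of the integer codes.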
EDIT: a sample of the original data looks like this:
   f1  f2  f3  f4  f5  f6  f7  f8  f9  f10  c1  c2  c3
0   C   S   O   1   2   1   1   2   1    2   0   0   0
1   D   S   O   1   3   1   1   2   1    2   0   0   0
2   C   S   O   1   3   1   1   2   1    1   0   0   0
3   D   S   O   1   3   1   1   2   1    2   0   0   0
4   D   A   O   1   3   1   1   2   1    2   0   0   0
5   D   A   O   1   2   1   1   2   1    2   0   0   0
6   D   A   O   1   2   1   1   2   1    1   0   0   0
7   D   A   O   1   2   1   1   2   1    2   0   0   0
8   D   K   O   1   3   1   1   2   1    2   0   0   0
9   C   R   O   1   3   1   1   2   1    1   0   0   0
where X[0] = f1, and I encoded the strings as integers because sklearn does not accept strings.
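As a rough illustration of why the integer encoding ends up being treated as ordered, the sketch below encodes the f2 values from the sample and lists every grouping a single "code <= threshold" split can produce. It is plain Python, not sklearn's internals. The helper names are hypothetical, and assigning codes in sorted order is an assumption (my actual mapping may differ).

```python
def label_encode(column):
    """Map each distinct string to an integer code (assumed: sorted order)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return codes, [codes[value] for value in column]


def threshold_partitions(codes):
    """All left/right category groupings one 'code <= t' split can yield."""
    ordered = sorted(codes, key=codes.get)
    return [(ordered[:i], ordered[i:]) for i in range(1, len(ordered))]


# f2 column from the sample rows above.
f2 = ["S", "S", "S", "S", "A", "A", "A", "A", "K", "R"]
codes, encoded = label_encode(f2)

print(codes)
print(encoded)
for left, right in threshold_partitions(codes):
    print(left, "|", right)
```

With four categories, a threshold split can only separate them at three points along the code order, so a grouping such as {A, R} versus {K, S} cannot be expressed in one split. That is the behavior I would like to avoid.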