Proximity Matrix in sklearn.ensemble.RandomForestC

I'm trying to perform clustering in Python using Random Forests. In the R implementation of Random Forests, there is a flag you can set to get the proximity matrix. I can't find anything similar in the scikit-learn implementation of Random Forest. Does anyone know if there is an equivalent calculation for the Python version?

Tags: python scikit-learn random-forest

3 Answers

Animai°情兽

Reply #2 · 2019-03-11 07:42

We don't implement proximity matrix in Scikit-Learn (yet).

However, this can be done with the apply function provided in our implementation of decision trees. That is, for every pair of samples in your dataset, iterate over the decision trees in the forest (through forest.estimators_) and count the number of times the two samples fall in the same leaf, i.e., the number of times apply gives the same node id for both samples in the pair.
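As a rough illustration of the counting described above, here is a minimal sketch. The function name proximity_counts, the iris dataset, and the forest settings are assumptions chosen only to make the example runnable; they are not part of scikit-learn or of the original answer.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def proximity_counts(forest, X):
    """Hypothetical helper: for each pair of samples, count the trees
    in which both samples end up in the same leaf."""
    n_samples = X.shape[0]
    counts = np.zeros((n_samples, n_samples), dtype=np.int64)
    for tree in forest.estimators_:
        # apply() returns the node id of the leaf reached by each sample
        leaves = tree.apply(X)
        # Pairwise comparison of leaf ids; True where a pair shares a leaf
        counts += leaves[:, None] == leaves[None, :]
    return counts


# Illustrative data and settings (assumptions, not from the answer)
X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
counts = proximity_counts(forest, X)
```

Dividing counts by len(forest.estimators_) would give the fraction of trees in which each pair shares a leaf.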

Hope this helps.


ゆ、 Hurt°

Reply #3 · 2019-03-11 07:43

There is nothing currently implemented for this in Python. I took a first try at it here. It would be great if somebody were interested in adding these methods to scikit-learn.


▲ chillily

Reply #4 · 2019-03-11 07:46

Based on Gilles Louppe's answer, I have written a function. I don't know if it is efficient, but it works. Best regards.

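import numpy as np

def proximityMatrix(model, X, normalize=True):
    # Leaf index of every sample in every tree: shape (n_samples, n_trees)
    terminals = model.apply(X)
    nTrees = terminals.shape[1]

    # For each pair of samples, count the trees in which they share a leaf
    a = terminals[:, 0]
    proxMat = 1 * np.equal.outer(a, a)

    for i in range(1, nTrees):
        a = terminals[:, i]
        proxMat += 1 * np.equal.outer(a, a)

    # Optionally turn the counts into the fraction of trees
    if normalize:
        proxMat = proxMat / nTrees

    return proxMat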

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
train = load_breast_cancer()

model = RandomForestClassifier(n_estimators=500, max_features=2, min_samples_leaf=40)
model.fit(train.data, train.target)
proximityMatrix(model, train.data, normalize=True)
## array([[ 1.   ,  0.414,  0.77 , ...,  0.146,  0.79 ,  0.002],
##        [ 0.414,  1.   ,  0.362, ...,  0.334,  0.296,  0.008],
##        [ 0.77 ,  0.362,  1.   , ...,  0.218,  0.856,  0.   ],
##        ..., 
##        [ 0.146,  0.334,  0.218, ...,  1.   ,  0.21 ,  0.028],
##        [ 0.79 ,  0.296,  0.856, ...,  0.21 ,  1.   ,  0.   ],
##        [ 0.002,  0.008,  0.   , ...,  0.028,  0.   ,  1.   ]])
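
Since the original question was about clustering, one possible way to use such a matrix is sketched below. Treating 1 - proximity as a distance, the choice of average-linkage hierarchical clustering from SciPy, the forest settings, and the cut at two clusters are all assumptions made for illustration, not something the answers above prescribe.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=100, min_samples_leaf=40, random_state=0)
forest.fit(data.data, data.target)

# Normalized proximity, computed the same way as in the function above
leaves = forest.apply(data.data)  # shape (n_samples, n_trees)
prox = np.zeros((leaves.shape[0], leaves.shape[0]))
for t in range(leaves.shape[1]):
    prox += np.equal.outer(leaves[:, t], leaves[:, t])
prox /= leaves.shape[1]

# Assumption: use 1 - proximity as a dissimilarity between samples
dist = 1.0 - prox
np.fill_diagonal(dist, 0.0)

# linkage() expects the condensed (upper-triangle) form of the distances
Z = linkage(squareform(dist, checks=False), method="average")
# Cut the tree into at most two clusters (an arbitrary choice here)
labels = fcluster(Z, t=2, criterion="maxclust")
```

Any clustering method that accepts a precomputed distance matrix could be substituted for the hierarchical step.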
