Meanshift in scikit learn (python) doesn't und

I have a dataset with 7265 samples and 132 features. I want to use the mean shift algorithm from scikit-learn, but I ran into this error:

Traceback (most recent call last):
  File "C:\Users\OJ\Dropbox\Dt\Code\visual\facetest\facetracker_video.py", line 130, in <module>
    labels, centers = getClusters(data,clusters)
  File "C:\Users\OJ\Dropbox\Dt\Code\visual\facetest\facetracker_video.py", line 34, in getClusters
    ms.fit(np.array(dataarray))
  File "C:\python2.7\lib\site-packages\sklearn\cluster\mean_shift_.py", line 280, in fit
    cluster_all=self.cluster_all)
  File "C:\python2.7\lib\site-packages\sklearn\cluster\mean_shift_.py", line 137, in mean_shift
    nbrs = NearestNeighbors(radius=bandwidth).fit(sorted_centers)
  File "C:\python2.7\lib\site-packages\sklearn\neighbors\base.py", line 642, in fit
    return self._fit(X)
  File "C:\python2.7\lib\site-packages\sklearn\neighbors\base.py", line 180, in _fit
    raise ValueError("data type not understood")
ValueError: data type not understood

My code:

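import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

dataarray = np.array(data)
# Estimate the kernel bandwidth from all samples
bandwidth = estimate_bandwidth(dataarray, quantile=0.2, n_samples=len(dataarray))
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
ms.fit(dataarray)  # this call raises the ValueError
labels = ms.labels_
cluster_centers = ms.cluster_centers_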

If I check the type of the data variable I see:

>>> print isinstance(dataarray, np.ndarray)
True

The bandwidth is 0.925538333061 and dataarray.dtype is float64.

I'm using scikit-learn 0.14.1.

I can cluster the data with other algorithms in scikit-learn (I tried k-means and DBSCAN). What am I doing wrong?

EDIT:

The data can be found here (pickle format): http://ojtwist.be/datatocluster.p and here: http://ojtwist.be/datatocluster.npz

Tags: python machine-learning scikit-learn

1 answer

可以哭但决不认输i

Reply #2 · 2019-08-03 00:22

That's a bug in scikit-learn. It is documented here.

There is a float-to-int cast during the fitting process that can crash in some cases (it places the seed points at the corners of the bins instead of at their centers). The linked report includes code that fixes the problem.

If you don't want to modify the scikit-learn code yourself (and want to keep your code compatible with other machines), I suggest you normalize your data before passing it to MeanShift.
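To make the corner-versus-center point concrete, here is a minimal sketch of what truncating versus rounding does when points are snapped to a grid. This is not scikit-learn's actual code; `bin_size` and `points` are made-up values for illustration only.

```python
import numpy as np

bin_size = 1.0  # hypothetical bin width (plays the role of the bandwidth)
points = np.array([0.2, 0.9, 1.6, -0.7])  # made-up 1-D coordinates

# Casting to int truncates toward zero, so each point snaps to a bin corner.
truncated = (points / bin_size).astype(np.int32) * bin_size

# Rounding snaps each point to the nearest grid position instead.
rounded = np.round(points / bin_size) * bin_size

print(truncated)
print(rounded)
```

Note how 0.9 ends up at 0.0 with truncation but at 1.0 with rounding, so the two schemes can produce quite different seed positions.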

Try this:

>>> from sklearn import preprocessing
>>> data2 = preprocessing.scale(dataarray)

Then use data2 in your code. It worked for me.

If you don't want to do either, this is a great opportunity to contribute to the project by making a pull request with the fix :)

Edit: You probably want to retain the information needed to "descale" the results of MeanShift. So use a StandardScaler object instead of the scale function.
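A rough sketch of the StandardScaler approach, assuming a reasonably recent scikit-learn. The random `dataarray` below is only a stand-in for the real data, and `centers_original` is a name I made up:

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: two well-separated blobs.
rng = np.random.RandomState(0)
dataarray = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 5)),
    rng.normal(loc=20.0, scale=1.0, size=(100, 5)),
])

# The scaler remembers the per-feature mean and standard deviation.
scaler = StandardScaler()
data2 = scaler.fit_transform(dataarray)

bandwidth = estimate_bandwidth(data2, quantile=0.2)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
ms.fit(data2)

labels = ms.labels_
# Map the cluster centers back to the original feature scale.
centers_original = scaler.inverse_transform(ms.cluster_centers_)
```

The labels need no conversion, since they are just cluster indices; only the centers live in the scaled space and need inverse_transform.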

Good luck!
