-->

K-means: finding the elbow when the elbow plot is a smooth curve

Posted 2019-09-17 22:49

I am trying to plot the k-means elbow using the code below:

load CSDmat %mydata
for k = 2:20
    opts = statset('MaxIter', 500, 'Display', 'off');
    [IDX1,C1,sumd1,D1] = kmeans(CSDmat,k,'Replicates',5,'options',opts,'distance','correlation');% kmeans matlab
    [yy,ii] = min(D1');      %% assign points to nearest center

    distort = 0;
    distort_across = 0;
    clear clusts;
    for nn=1:k
        I = find(ii==nn);       %% indices of points in cluster nn
        J = find(ii~=nn);       %% indices of points not in cluster nn
        clusts{nn} = I;         %% save into clusts cell array
        if (length(I)>0)
            mu(nn,:) = mean(CSDmat(I,:));               %% update mean
            %% Compute within class distortion
            muB = repmat(mu(nn,:),length(I),1);
            distort = distort+sum(sum((CSDmat(I,:)-muB).^2));
            %% Compute across class distortion
            muB = repmat(mu(nn,:),length(J),1);
            distort_across = distort_across + sum(sum((CSDmat(J,:)-muB).^2));
        end
    end
    %% Set distortion as the ratio between the within
    %% class scatter and the across class scatter
    distort = distort/(distort_across+eps);

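    bestD(k) = distort;     %% distortion ratio for this k
    bestC = clusts;         %% clusters from the most recent k
end
figure; plot(2:20, bestD(2:20));   %% bestD(1) is never set, so plot k = 2..20 only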

The values of bestD (within-cluster variance / between-cluster variance) are:

[
0.401970132754914
0.193697163350293
0.119427184084282
0.0872681777446508
0.0687948264457301
0.0566215549396577
0.0481117619129058
0.0420491551659459
0.0361696583755145
0.0320384092689509
0.0288948343304147
0.0262373245283877
0.0239462330460614
0.0218350896369853
0.0201506779033703
0.0186757121130685
0.0176258625858971
0.0163239661159014
0.0154933431470081
]

The code is adapted from Lihi Zelnik-Manor, March 2005, Caltech.

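The plot of the ratio of within-cluster variance to between-cluster variance (the bestD data above) is a smooth curve, so its knee is smooth as well rather than a sharp corner. How do we find the knee of such a plot?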

Answer 1:

I think it is better to use only the "within class distortion" as the optimization parameter:

%% Compute within class distortion
muB = repmat(mu(nn,:),length(I),1);
distort = distort+sum(sum((CSDmat(I,:)-muB).^2));

Use this value without dividing it by "distort_across". Then compute its "derivative":

unexplained_error = within_class_distortion;
derivative = diff(unexplained_error);
plot(derivative)
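The snippet above assumes a vector within_class_distortion with one entry per k. A minimal sketch of one way to collect it follows, using the third output of kmeans (sumd, the within-cluster sums of point-to-centroid distances). The synthetic matrix X stands in for CSDmat, the default squared Euclidean distance replaces 'correlation', and the names ks and j are chosen here for illustration only.

```matlab
% Sketch: collect the within-class distortion for each k.
% Assumptions: synthetic X replaces CSDmat; default (squared Euclidean) distance.
rng(0);                                              % reproducible synthetic data
X = [randn(50,2); randn(50,2) + 5; randn(50,2) + repmat([10 0], 50, 1)];

ks = 1:10;                                           % candidate cluster counts
within_class_distortion = zeros(size(ks));
opts = statset('MaxIter', 500, 'Display', 'off');
for j = 1:numel(ks)
    % sumd(c) is the summed point-to-centroid distance inside cluster c
    [~, ~, sumd] = kmeans(X, ks(j), 'Replicates', 5, 'Options', opts);
    within_class_distortion(j) = sum(sumd);
end

unexplained_error = within_class_distortion;
derivative = diff(unexplained_error);                % derivative(j): change from ks(j) to ks(j+1)
figure; plot(ks(2:end), derivative, '-o');
xlabel('k'); ylabel('change in within-class distortion');
```

Because derivative(j) is the change when going from ks(j) to ks(j+1), its values are negative as long as adding a cluster still reduces the error.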

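derivative(k) tells you how much the unexplained error decreased by adding a new cluster. I suggest you stop adding clusters once this decrease falls below one tenth of the first decrease you obtained.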

for i = 1:length(derivative)
    % derivative is negative while the error falls, so compare magnitudes
    if abs(derivative(i)) < abs(derivative(1))/10
        break
    end
end
k_opt = i+1;

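In practice, the way to choose the optimal number of clusters depends on the application, but I think this suggestion will give you a good value of k.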
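As a rough illustration of the stopping rule, the sketch below applies it to the bestD values posted in the question. This is for demonstration only: the answer recommends the raw within-class distortion, which was not posted, so the ratio stands in as example data. The names ks, err, drop and idx are hypothetical, and the magnitudes of the differences are compared because the differences themselves are negative.

```matlab
% Sketch: "one tenth of the first decrease" rule on the posted bestD values.
% Assumption: the ratio bestD is used only as example data (k = 2..20).
ks  = 2:20;
err = [0.401970132754914 0.193697163350293 0.119427184084282 ...
       0.0872681777446508 0.0687948264457301 0.0566215549396577 ...
       0.0481117619129058 0.0420491551659459 0.0361696583755145 ...
       0.0320384092689509 0.0288948343304147 0.0262373245283877 ...
       0.0239462330460614 0.0218350896369853 0.0201506779033703 ...
       0.0186757121130685 0.0176258625858971 0.0163239661159014 ...
       0.0154933431470081];

drop = abs(diff(err));                  % drop(i): decrease from ks(i) to ks(i+1)
idx  = find(drop < drop(1)/10, 1);      % first step gaining under a tenth of the first
if isempty(idx)
    k_opt = ks(end);                    % the rule never triggered in the tested range
else
    k_opt = ks(idx);                    % last k before the small decrease
end
fprintf('k_opt = %d\n', k_opt);
```

For these values the first decrease is about 0.208, and the step from k = 5 to k = 6 (about 0.018) is the first to fall below a tenth of it, so the sketch reports k_opt = 5.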



Source: K means finding elbow when the elbow plot is a smooth curve