I am using kmeans to cluster my data, and I want to post-process the result:
I would like to relabel the samples so that the cluster labels follow the order of the sorted centres. Consider the following example:
a = c("a","b","c","d","e","F","i","j","k","l","m","n")
b = c(1,2,3,20,21,21,40,41,42,4,23,50)
mydata = data.frame(id=a,amount=b)
clus = kmeans(mydata$amount,3,nstart=10)
Here is the result :
clus$cluster
2 2 2 3 3 3 1 1 1 2 3 1
clus$centers
1 43.25
2 2.50
3 21.25
mydata = data.frame(mydata,label=clus$cluster)
mydata
id amount label
1 a 1 2
2 b 2 2
3 c 3 2
4 d 20 3
5 e 21 3
6 F 21 3
7 i 40 1
8 j 41 1
9 k 42 1
10 l 4 2
11 m 23 3
12 n 50 1
What I am looking for is to sort the centres and relabel the clusters accordingly, so that the centres become:
1 2.50
2 21.25
3 43.25
and the sample labels become:
1 1 1 2 2 2 3 3 3 1 2 3
and the result should be :
id amount label
1 a 1 1
2 b 2 1
3 c 3 1
4 d 20 2
5 e 21 2
6 F 21 2
7 i 40 3
8 j 41 3
9 k 42 3
10 l 4 1
11 m 23 2
12 n 50 3
I think it can be done by ordering the centres and then, for each sample, taking the index of the nearest ordered centre (minimum distance) as the label of that sample.
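A minimal sketch of that manual approach, for reference. The name sorted_centres and the fixed seed are my own additions for illustration; it also assumes kmeans has converged, so that each sample's nearest centre is the centre of its own cluster:

```r
# Sketch only: relabel k-means clusters so that label 1 has the smallest centre.
a <- c("a","b","c","d","e","F","i","j","k","l","m","n")
b <- c(1,2,3,20,21,21,40,41,42,4,23,50)
mydata <- data.frame(id = a, amount = b)

set.seed(42)  # assumption: fixed seed, only to make the run repeatable
clus <- kmeans(mydata$amount, 3, nstart = 10)

# Sort the centres in ascending order (as.vector drops the row names).
sorted_centres <- sort(as.vector(clus$centers))

# For each sample, the new label is the index of the nearest sorted centre.
mydata$label <- vapply(
  mydata$amount,
  function(x) which.min(abs(x - sorted_centres)),
  integer(1)
)

mydata
```

This recomputes the distances by hand instead of reusing clus$cluster, which is why I am asking whether there is something more direct.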
Is there another way to have R do this automatically?