How can I get the cluster number corresponding to each data point using k-means clustering in R?

Posted 2019-02-07 13:33

I clustered my data with the k-means method. How can I get the cluster number corresponding to each record using k-means clustering in R, so that I know which cluster each record belongs to?

Example: 12 32 13 => cluster 1: 12, 13; cluster 2: 32
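For reference, here is a minimal sketch of that toy example with base R's `kmeans()` (from the stats package, which is loaded by default). It assumes two clusters and uses `nstart = 10` so the result does not hinge on a single random start. The label numbers themselves are arbitrary, so 12 and 13 may be reported as cluster 2 rather than cluster 1.

```r
# Toy data from the question: three values, two clusters expected
x <- c(12, 32, 13)

set.seed(1)                               # kmeans() picks random starting centres
cl <- kmeans(x, centers = 2, nstart = 10)

# Cluster number for each original value, in input order
cl$cluster

# Group the original values by their cluster number
split(x, cl$cluster)
```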

3 answers
我只想做你的唯一
#2 · 2019-02-07 13:44

@Java questioner

You can access the cluster assignments as follows:

> data_clustered <- kmeans(data, centers = 3)   # centers (k, or a matrix of starting centres) is required
> data_clustered$cluster

data_clustered$cluster is an integer vector whose length equals the number of records (rows) in data. Each entry is the cluster number of the corresponding row.
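A self-contained sketch of the same idea, assuming the numeric columns of the built-in `iris` data stand in for `data` and picking `centers = 3` purely for illustration:

```r
# Numeric columns of the built-in iris data as example input
data <- iris[, 1:4]

set.seed(42)
data_clustered <- kmeans(data, centers = 3, nstart = 10)

# One cluster label per row of the input
length(data_clustered$cluster) == nrow(data)   # TRUE

# Labels for the first few rows, in the same order as the rows of data
head(data_clustered$cluster)
```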

To get all the records belonging to cluster 1:

> data$cluster <- data_clustered$cluster 
> data_clus_1 <- data[data$cluster == 1,]
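If you want every cluster at once rather than only cluster 1, `split()` gives a list with one data frame per cluster. A sketch, again assuming `iris[, 1:4]` as example data and k = 3:

```r
set.seed(42)
data <- iris[, 1:4]
km <- kmeans(data, centers = 3, nstart = 10)

# Attach the labels, then split the rows into one data frame per cluster
data$cluster <- km$cluster
by_cluster <- split(data, data$cluster)

names(by_cluster)          # the list names are the cluster numbers
sapply(by_cluster, nrow)   # number of rows in each cluster
```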

Number of clusters:

> max(data$cluster)
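`max()` works here because the labels run from 1 to k. A sketch of a few equivalent ways to read k (and the cluster sizes) off the fitted object, assuming the same `iris` example:

```r
set.seed(42)
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

max(km$cluster)     # equals k, since labels run from 1 to k
length(km$size)     # k, from the per-cluster size vector
nrow(km$centers)    # k, one row of centres per cluster

table(km$cluster)   # how many records fall in each cluster
```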

Good luck with your clustering!

何必那么认真
#3 · 2019-02-07 13:53

We like reproducible examples here on Stack Overflow. Otherwise we're just guessing.

I'll guess that you are using kmeans from the stats package.

I'll further guess that you haven't read the documentation (help(kmeans)), which says:

Value:

  an object of class 'kmeans' which is a list with components:

   cluster: A vector of integers indicating the cluster to which each point is allocated.

There's an example in the help that shows you exactly how that works.
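Besides `$cluster`, the stats package defines a `fitted()` method for kmeans objects; with `method = "classes"` it returns the same vector. A short sketch using data shaped like the help-page example (the seed is arbitrary):

```r
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
km <- kmeans(x, centers = 2)

# fitted() with method = "classes" returns the cluster vector
classes <- fitted(km, method = "classes")
identical(classes, km$cluster)

# method = "centers" (the default) returns each point's cluster centre instead
head(fitted(km, method = "centers"))
```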

手持菜刀,她持情操
#4 · 2019-02-07 13:55

It sounds like you are trying to access the cluster vector returned by kmeans(). From the kmeans help page, under cluster:

A vector of integers (from 1:k) indicating the cluster to which each 
point is allocated.

Using the example on the help page:

x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
(cl <- kmeans(x, 2))

#Access the cluster vector
cl$cluster

> cl$cluster
  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [45] 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [89] 1 1 1 1 1 1 1 1 1 1 1 1

To address the question in the comments:

You can "map" the cluster number to the original data by doing something like this:

out <- cbind(x, clusterNum = cl$cluster)
head(out)

               x          y clusterNum
[1,] -0.42480483 -0.2168085          2
[2,] -0.06272004  0.3641157          2
[3,]  0.08207316  0.2215622          2
[4,] -0.19539844  0.1306106          2
[5,] -0.26429056 -0.3249288          2
[6,]  0.09096253 -0.2158603          2

cbind binds columns together; there is also an rbind function that binds rows. See their help pages (?cbind and ?rbind) for details.
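When the input is a data frame, or you want the label stored as a factor, `data.frame()` is an alternative to `cbind()`. A sketch that rebuilds `x` and `cl` from the example above with an arbitrary seed, then summarises each cluster with `aggregate()`:

```r
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
cl <- kmeans(x, 2)

# data.frame() instead of cbind(): a data frame can hold columns of
# different types, e.g. a factor for the cluster label
out <- data.frame(x, clusterNum = factor(cl$cluster))
head(out)

# Per-cluster means of the original variables
aggregate(cbind(x, y) ~ clusterNum, data = out, FUN = mean)
```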
