How can I get the cluster number corresponding to each data point using k-means clustering in R?

Posted 2019-02-07 13:19

Question:

I clustered my data with the k-means method in R. How can I get the cluster number that corresponds to each record, so that I can tell which cluster every record belongs to?

Example: for the values 12, 32, 13, the result would be cluster 1: 12, 13 and cluster 2: 32.

Answer 1:

It sounds like you are trying to access the cluster vector that is returned by kmeans(). The help page describes the cluster component as:

A vector of integers (from 1:k) indicating the cluster to which each 
point is allocated.

Using the example on the help page:

x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
(cl <- kmeans(x, 2))

#Access the cluster vector
cl$cluster

> cl$cluster
  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [45] 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [89] 1 1 1 1 1 1 1 1 1 1 1 1

To address the question in the comments:

You can "map" the cluster number to the original data by doing something like this:

out <- cbind(x, clusterNum = cl$cluster)
head(out)

               x          y clusterNum
[1,] -0.42480483 -0.2168085          2
[2,] -0.06272004  0.3641157          2
[3,]  0.08207316  0.2215622          2
[4,] -0.19539844  0.1306106          2
[5,] -0.26429056 -0.3249288          2
[6,]  0.09096253 -0.2158603          2

cbind is the function for column binding; there is also an rbind function for rows. See their help pages, ?cbind and ?rbind, for more details.
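As a small sketch tied to the example in the question, the values 12, 32 and 13 can be clustered and then grouped by their cluster number. The variable names (values, fit, assigned, groups), the seed, and the choice of nstart = 10 are illustrative assumptions, not part of the original answer. The cluster labels themselves are arbitrary, so the group containing 12 and 13 may be labelled 1 or 2.

```r
# Illustrative sketch; names, seed and nstart are assumptions.
set.seed(42)                      # make the random starts reproducible

values <- c(12, 32, 13)           # the three values from the question
fit <- kmeans(values, centers = 2, nstart = 10)

# One row per record, with the cluster it was assigned to
assigned <- data.frame(value = values, cluster = fit$cluster)
print(assigned)

# The same information grouped by cluster: a list with one element per cluster
groups <- split(values, fit$cluster)
print(groups)
```

split() is used here because it returns the records of every cluster at once, which is close to the "1. 12,13 2. 32" layout the question asks for.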



Answer 2:

@ Java questioner

You can access the cluster data as follows:

> data_clustered <- kmeans(data, centers = 2)  # centers is required; 2 is only an example value
> data_clustered$cluster

data_clustered$cluster is a vector with one entry per record in data. Entry i is the cluster number assigned to row i.

To get all the records belonging to cluster 1:

> data$cluster <- data_clustered$cluster 
> data_clus_1 <- data[data$cluster == 1,]

Number of clusters:

> max(data$cluster)
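A runnable sketch of the steps above, assuming the built-in iris measurements as stand-in data and three clusters (both are assumptions chosen only for illustration):

```r
# Illustrative sketch; the iris data, the seed and centers = 3 are assumptions.
set.seed(1)

data <- iris[, 1:4]                                  # numeric columns only
data_clustered <- kmeans(data, centers = 3, nstart = 20)

# Attach the cluster number to each record (data must be a data frame for $<-)
data$cluster <- data_clustered$cluster

# All records belonging to cluster 1
data_clus_1 <- data[data$cluster == 1, ]

# Number of records per cluster, and the number of clusters
cluster_counts <- table(data$cluster)
n_clusters <- nrow(data_clustered$centers)
print(cluster_counts)
print(n_clusters)
```

Taking the number of clusters from the rows of data_clustered$centers does not depend on the cluster column having been added to data.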

Good luck with your clustering



Answer 3:

We like reproducible examples here on Stack Overflow. Otherwise we're just guessing.

I'll guess that you are using kmeans in the stats package.

I'll further guess you haven't read the documentation, help(kmeans), which says:

Value:

  an object of class 'kmeans' which is a list with components:

   cluster: A vector of integers indicating the cluster to which each point is allocated.

There's an example in the help that shows you exactly how that works.