I clustered my data with the k-means method in R. How can I get the cluster number that corresponds to each record, so that I can tell which cluster each record belongs to?
Example:
12 32 13 => cluster 1: 12, 13; cluster 2: 32
It sounds like you are trying to access the cluster vector that is returned by kmeans(). From the help page, under cluster:
A vector of integers (from 1:k) indicating the cluster to which each
point is allocated.
Using the example on the help page:
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
(cl <- kmeans(x, 2))
# Access the cluster vector
cl$cluster
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[45] 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[89] 1 1 1 1 1 1 1 1 1 1 1 1
To address the question in the comments: you can "map" the cluster number to the original data by doing something like this:
out <- cbind(x, clusterNum = cl$cluster)
head(out)
x y clusterNum
[1,] -0.42480483 -0.2168085 2
[2,] -0.06272004 0.3641157 2
[3,] 0.08207316 0.2215622 2
[4,] -0.19539844 0.1306106 2
[5,] -0.26429056 -0.3249288 2
[6,] 0.09096253 -0.2158603 2
cbind is the function for column binding; there is also an rbind function for rows. See their help pages, ?cbind and ?rbind respectively, for more details.
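Applied to the small example from the question (the values 12, 32, 13), the same idea can be sketched as follows. The names vals, fit and groups are made up for illustration, and nstart = 5 is only there to make the result less dependent on the random starting centers. Note that the cluster labels themselves are arbitrary: the pair 12, 13 may come out as cluster 1 or as cluster 2.

```r
set.seed(1)                      # kmeans() starts from random centers
vals <- c(12, 32, 13)            # the values from the question
fit  <- kmeans(vals, centers = 2, nstart = 5)

fit$cluster                      # one cluster number per input value, in input order

# Attach the cluster number to each value
data.frame(value = vals, cluster = fit$cluster)

# Or list the values that fell into each cluster
groups <- split(vals, fit$cluster)
groups
```

split() returns a list with one element per cluster, which is close to the "cluster 1: 12, 13; cluster 2: 32" layout asked for.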
@ Java questioner
You can access the cluster data as follows:
> data_clustered <- kmeans(data, centers = k)  # k = number of clusters you want; kmeans() requires it
> data_clustered$cluster
data_clustered$cluster is a vector whose length equals the number of records in data. Each entry is the cluster number of the corresponding row.
To get all the records belonging to cluster 1:
> data$cluster <- data_clustered$cluster
> data_clus_1 <- data[data$cluster == 1,]
Number of clusters:
> max(data$cluster)
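A self-contained sketch of these steps, assuming data is a data frame with only numeric columns. The height and weight values below are invented purely for illustration, and k, nstart and by_cluster are likewise assumptions, not part of the original answer.

```r
set.seed(42)

# Hypothetical data frame with two numeric columns
data <- data.frame(height = c(150, 152, 180, 182, 151, 181),
                   weight = c(50, 52, 80, 82, 51, 81))

k <- 2                                        # number of clusters to look for
data_clustered <- kmeans(data, centers = k, nstart = 5)

# Add the cluster number only after clustering, so that it is not
# itself treated as a feature by kmeans()
data$cluster <- data_clustered$cluster

data_clus_1 <- data[data$cluster == 1, ]      # all records in cluster 1
table(data$cluster)                           # number of records per cluster
by_cluster <- split(data, data$cluster)       # list of data frames, one per cluster
```

table() is a quick check on how the records are distributed, and split() gives every cluster's records at once rather than one subset at a time.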
Good luck with your clustering
We like reproducible examples here on Stack Overflow. Otherwise we're just guessing.
I'll guess that you are using kmeans in the stats package.
I'll further guess you haven't read the documentation help(kmeans) which says:
Value:
an object of class 'kmeans' which is a list with components:
cluster: A vector of integers indicating the cluster to which each point is allocated.
There's an example in the help that shows you exactly how that works.
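For what it's worth, a minimal sketch of inspecting the returned list, reusing the simulated data from the help-page example (set.seed(123) is an arbitrary choice, added only so the run is repeatable):

```r
set.seed(123)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
cl <- kmeans(x, 2)

names(cl)            # components of the list, including "cluster", "centers" and "size"
length(cl$cluster)   # one entry per row of x
cl$size              # number of points in each cluster
table(cl$cluster)    # the same counts, derived from the cluster vector
cl$centers           # coordinates of the cluster centers
```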