Is it possible to get the same kmeans clusters on every execution for a particular data set? For random values we can use a fixed seed. Is it possible to remove the randomness from clustering in the same way?
Answer 1:
Yes. Use set.seed to set the seed of the random number generator before doing the clustering.
Using the example from the kmeans help page:
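# Simulate two groups of 50 two-dimensional points
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")

# Reset the seed before each call so both runs draw the same starting centers
set.seed(2)
XX <- kmeans(x, 2)
set.seed(2)
YY <- kmeans(x, 2)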
Test for equality:
identical(XX, YY)
[1] TRUE
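As an aside that is not part of the original answer: the only random step in kmeans is the choice of starting centers, so another way to get repeatable results is to pass explicit starting centers as a matrix instead of a count. The sketch below assumes rows 1 and 51 of the data are acceptable starting points. That choice is arbitrary and purely illustrative.

```r
# Same simulated data as in the example above
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")

# Assumption: use one row from each simulated group as the initial centers.
# With a matrix of centers, kmeans() does not draw random starts.
init <- x[c(1, 51), ]

fit1 <- kmeans(x, centers = init)
fit2 <- kmeans(x, centers = init)

# Both runs start from the same centers, so the assignments match
identical(fit1$cluster, fit2$cluster)
```

The result still depends on which starting rows are chosen, so this trades randomness for a fixed but arbitrary initialization.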
Answer 2:
Yes, calling set.seed(foo) immediately prior to running kmeans(....) will give the same random start and hence the same clustering each time. foo is a seed, like 42 or some other numeric value.
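To make the "seed immediately before clustering" pattern harder to forget, the two calls can be wrapped together. The following is a minimal sketch: kmeans_seeded is a hypothetical helper name, not a function from any package, and the default seed of 42 is arbitrary.

```r
# Hypothetical helper: reset the RNG seed, then run kmeans().
kmeans_seeded <- function(data, centers, seed = 42, ...) {
  set.seed(seed)              # same seed gives the same random starting centers
  kmeans(data, centers, ...)  # extra arguments are passed through unchanged
}

# Same simulated data as in Answer 1
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

a <- kmeans_seeded(x, 2)
b <- kmeans_seeded(x, 2)

# Repeated calls with the same seed give the same cluster assignments
identical(a$cluster, b$cluster)
```

Note that the helper resets the global RNG state on every call, which also affects any random draws made afterwards in the same session.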