This refers to the question "Clustering: how to extract most distinguishing features?", which was answered by @holzben.
Using the skmeans package, I managed to get the clusters, but I can't figure out why the word frequencies in every cluster are so small. That doesn't make sense to me, as I have about 10,000 tweets in my dataset. What am I doing wrong?
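For illustration, per-cluster word frequencies of this kind can be tallied by summing each term's counts over the documents assigned to a cluster. The following is only a minimal base-R sketch on made-up toy data, not the script or dataset from this question. The matrix `dtm`, the vector `cluster`, and all terms and counts are hypothetical.

```r
# Toy document-term matrix: 5 documents (rows) x 4 terms (columns).
# All names and values are made up for illustration.
dtm <- matrix(
  c(2, 0, 1, 0,
    1, 0, 0, 0,
    0, 3, 0, 1,
    0, 1, 0, 2,
    1, 0, 2, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("doc", 1:5),
                  c("airline", "flight", "ticket", "crew"))
)

# Hypothetical cluster assignment, one label per document
cluster <- c(1, 1, 2, 2, 1)

# rowsum() adds up the rows of dtm that share a cluster label,
# giving one row of term totals per cluster.
freq_by_cluster <- rowsum(dtm, group = cluster)

# Terms of each cluster sorted by total count, highest first
top_terms <- lapply(rownames(freq_by_cluster), function(k) {
  sort(freq_by_cluster[k, ], decreasing = TRUE)
})
names(top_terms) <- rownames(freq_by_cluster)

print(freq_by_cluster)
```

The totals depend entirely on which matrix is summed, so raw counts and weighted values would give different magnitudes.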
My dataset is available at
> class(myCorpus)
[1] "VCorpus" "Corpus" "list"
srilanka   warrior    airtickets avionics   ayf        citizens
4          4          3          3          3          3

higher    jumpa     ec        bodoh     komentari batch
12        11        9         8         8         7

liong        ryanair      yi           airlinescrew aksi         berjaya
5            4            4            3            3            3
Below is the script I used to get the clusters above:
