I made a sample data frame and am trying to make a word cloud from the Project column.
Hours<-c(2,3,4,2,1,1,3)
Project<-c("a","b","b","a","c","c","c")
Period<-c("2014-11-22","2014-11-23","2014-11-24","2014-11-22", "2014-11-23", "2014-11-23", "2014-11-24")
cd=data.frame(Project,Hours,Period)
Here is my code:
cd$Project<-as.character(cd$Project)
wordcloud(cd$Project,min.freq=1)
but I get the following error:
Error in strwidth(words[i], cex = size[i], ...) : invalid 'cex' value
In addition: Warning messages:
1: In max(freq) : no non-missing arguments to max; returning -Inf
2: In max(freq) : no non-missing arguments to max; returning -Inf
What am I doing wrong?
I think you are missing the freq argument. You want to create a column indicating how often each project occurs. I therefore transformed your data using count from the dplyr package.
library(dplyr)
library(wordcloud)
cd <- data.frame(Hours = c(2,3,4,2,1,1,3),
                 Project = c("a","b","b","a","c","c","c"),
                 Period = c("2014-11-22","2014-11-23","2014-11-24",
                            "2014-11-22", "2014-11-23", "2014-11-23",
                            "2014-11-24"),
                 stringsAsFactors = FALSE)
cd2 <- count(cd, Project)
#  Project n
#1       a 2
#2       b 2
#3       c 3
wordcloud(words = cd2$Project, freq = cd2$n, min.freq = 1)
If you specify a character column, then the function creates a corpus and a term-document matrix for you behind the scenes. The problem is that the default behavior of the TermDocumentMatrix function from the tm package is to track only words that are at least three characters long (wordcloud also removes "stop words", so values like "a" would be removed). So if you changed your sample to
Project<-c("aaa","bbb","bbb","aaa","ccc","ccc","ccc")
it would work just fine. There appears to be no way to change the control options that wordcloud passes to TermDocumentMatrix. If you want to calculate the frequencies yourself in the same way that the default wordcloud function does, you can do
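library(tm)  # provides Corpus, tm_map, TermDocumentMatrix
corpus <- Corpus(VectorSource(cd$Project))
corpus <- tm_map(corpus, removePunctuation)
# corpus <- tm_map(corpus, function(x) removeWords(x, stopwords()))
# wordLengths = c(1, Inf) keeps one-letter terms such as "a"
tdm <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))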
freq <- slam::row_sums(tdm)
words <- names(freq)
wordcloud(words, freq, min.freq=1)
However, for simple cases, you can just count the frequencies with table():
tbl <- table(cd$Project)
wordcloud(names(tbl), tbl, min.freq=1)
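To see the word-length filter described above for yourself, here is a minimal sketch that builds the term-document matrix twice, once with the defaults and once with wordLengths = c(1, Inf), and prints the terms that survive. The names projects, tdm_default and tdm_all are just illustrative, and I use VCorpus explicitly here (an assumption on my part; the code above uses Corpus).

```r
library(tm)

# Same one-letter project labels as in the question
projects <- c("a", "b", "b", "a", "c", "c", "c")
corpus <- VCorpus(VectorSource(projects))

# Default control: terms shorter than three characters are dropped
tdm_default <- TermDocumentMatrix(corpus)

# Explicitly allow terms of any length
tdm_all <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))

# Compare which terms each matrix kept
print(Terms(tdm_default))
print(Terms(tdm_all))
```

If the first call prints no terms, that is the empty freq vector that makes wordcloud fail with the max(freq) warnings and the invalid 'cex' error.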