I have a vector of words and a vector of comments:
word.list <- c("very", "experience", "glad")
comments <- c("very good experience. first time I have been and I would definitely come back.",
"glad I scheduled an appointment.",
"the staff have become more cordial.",
"the experience i had was not good at all.",
"i am very glad")
I would like to create a data frame that looks like
df <- data.frame(comments = c("very good experience. first time I have been and I would definitely come back.",
"glad I scheduled an appointment.",
"the staff have become more cordial.",
"the experience i had was not good at all.",
"i am very glad"),
very = c(1,0,0,0,1),
glad = c(0,1,0,0,1),
experience = c(1,0,0,1,0))
I have 12,000+ comments and 20 words I would like to do this with. How do I go about doing this efficiently? For loops? Any other method?
Using base R, you can loop through the list of words and each comment, check whether each word occurs among the tokens of the split comment (splitting on spaces and punctuation marks), and then recombine the results as a data frame.
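The original code for this approach is not shown here, so the following is only a minimal sketch of the idea, assuming lower-casing the comments before splitting is acceptable. The names `tokens` and `flags` are made up for illustration.

```r
word.list <- c("very", "experience", "glad")
comments <- c("very good experience. first time I have been and I would definitely come back.",
              "glad I scheduled an appointment.",
              "the staff have become more cordial.",
              "the experience i had was not good at all.",
              "i am very glad")

# Split each comment into lower-case tokens on runs of spaces and punctuation.
tokens <- strsplit(tolower(comments), "[[:space:][:punct:]]+")

# For each word, flag (1/0) the comments whose tokens contain that exact word.
flags <- sapply(word.list, function(w) {
  vapply(tokens, function(tk) as.integer(w %in% tk), integer(1))
})

# Recombine the comments and the indicator columns into one data frame.
df <- data.frame(comments = comments, flags, stringsAsFactors = FALSE)
```

Because the comparison is done on whole tokens, a word such as "very" would not be flagged for a comment that only contains "veryX".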
Loop through word.list and use grepl:
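The code for this step is missing from the post, so here is a rough sketch of what such a loop might look like. The name `res` is an assumption, and `sapply` stands in for an explicit `for` loop.

```r
word.list <- c("very", "experience", "glad")
comments <- c("very good experience. first time I have been and I would definitely come back.",
              "glad I scheduled an appointment.",
              "the staff have become more cordial.",
              "the experience i had was not good at all.",
              "i am very glad")

# One grepl() call per word; each call is vectorised over all comments.
# The result is a logical matrix: one row per comment, one column per word.
res <- sapply(word.list, function(w) grepl(w, comments))
```

With 20 words this is only 20 vectorised calls, so 12,000+ comments should not be a problem.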
To get prettier output, convert the result to a data frame:
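The conversion code is also missing, so this is one possible way to do it, assuming the match results are held in a logical matrix called `res` (a name chosen here for illustration). Adding `0L` turns TRUE/FALSE into 1/0 while keeping the matrix shape.

```r
word.list <- c("very", "experience", "glad")
comments <- c("very good experience. first time I have been and I would definitely come back.",
              "glad I scheduled an appointment.",
              "the staff have become more cordial.",
              "the experience i had was not good at all.",
              "i am very glad")

# Logical matrix of matches: rows are comments, columns are words.
res <- sapply(word.list, function(w) grepl(w, comments))

# Convert TRUE/FALSE to 1/0 and put the comments alongside the indicators.
df <- data.frame(comments = comments, res + 0L, stringsAsFactors = FALSE)
```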
Note: grepl matches substrings, so "very" will also match "veryX". If that is not desired, use whole-word matching, for example by wrapping each word in word-boundary anchors (\b).
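To illustrate the difference, here is a small sketch with a made-up input vector `x`. It wraps the word in `\\b` anchors and uses `perl = TRUE`; this is one way to get whole-word matching, not necessarily the only one.

```r
# Hypothetical inputs: the second string contains "very" only as part of "veryX".
x <- c("i am very glad", "veryX good")

# Plain substring matching flags both strings.
substring_match <- grepl("very", x)

# Word-boundary anchors restrict the match to the whole word "very".
whole_word_match <- grepl("\\bvery\\b", x, perl = TRUE)
```

The same pattern can be built per word with `paste0("\\b", w, "\\b")` when looping over `word.list`.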
One way is a combination of the stringi and qdapTools packages. You can then use cbind or data.frame to bind the result to the comments.
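The code from this answer did not survive, so the following is a guess at what was meant, assuming `stri_extract_all_regex` from stringi and `mtabulate` from qdapTools, with both packages installed. The name `counts` is made up for illustration.

```r
library(stringi)
library(qdapTools)

word.list <- c("very", "experience", "glad")
comments <- c("very good experience. first time I have been and I would definitely come back.",
              "glad I scheduled an appointment.",
              "the staff have become more cordial.",
              "the experience i had was not good at all.",
              "i am very glad")

# Extract every occurrence of any listed word from each comment (list output;
# a comment with no match yields NA).
hits <- stri_extract_all_regex(comments, paste(word.list, collapse = "|"))

# Tabulate the extracted words per comment: one row per comment, one column per word.
counts <- mtabulate(hits)

# Bind the counts to the original comments.
df <- data.frame(comments = comments, counts, stringsAsFactors = FALSE)
```

Two things to keep in mind with this sketch: the values are counts rather than 0/1 flags, so a word that appears twice in a comment gives 2, and a word that never appears in any comment gets no column at all.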