As shown in the picture above, I have a column, genres, that lists the genres each movie belongs to. There are 19 unique genres in total. I'd like to know whether I can append 19 columns to the data set, one per genre, and fill each cell with 0 or 1 to indicate whether the movie belongs to that genre.
It should look something like the picture below.
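We can do this after splitting the 'genres' column:
library(qdapTools)
# Split each genre string on commas, then tabulate into one column per genre
d1 <- mtabulate(strsplit(as.character(df1$genres), ","))
# Use the title, without the trailing "(year)" part, as the row name
row.names(d1) <- sub("\\s*\\(.*", "", df1$title)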
Another option is to create a matrix of zeros whose column names are the genres, and then compare those column names against each split string:
m1 <- matrix(0,
             dimnames = list(sub("\\s*\\(.*", "", df1$title),
                             c("Adventure", "Animation", "Children",
                               "Comedy", "Fantasy", "Romance",
                               "Action", "Crime", "Thriller")),
             ncol = 9, nrow = nrow(df1))
m1 + t(sapply(strsplit(as.character(df1$genres), ","),
              function(x) colnames(m1) %in% x))
#           Adventure Animation Children Comedy Fantasy Romance Action Crime Thriller
# Toy Story         1         1        1      1       1       0      0     0        0
# Jumanji           1         0        1      0       1       0      0     0        0
# Heat              0         0        0      0       0       0      1     1        1
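For completeness, here is a minimal base R sketch that derives the genre columns from the data itself and appends them to the data frame, which is what the question asks for. The sample `df1` below is hypothetical (the original data is not shown in the post), and the names `genre_list`, `all_genres`, `ind` and `df2` are made up for illustration. It also assumes the genres are separated by a bare comma with no surrounding spaces.

```r
# Hypothetical sample data standing in for the real df1
df1 <- data.frame(
  title  = c("Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"),
  genres = c("Adventure,Animation,Children,Comedy,Fantasy",
             "Adventure,Children,Fantasy",
             "Action,Crime,Thriller"),
  stringsAsFactors = FALSE
)

# One character vector of genres per movie
genre_list <- strsplit(as.character(df1$genres), ",", fixed = TRUE)

# All distinct genres that occur in the data
all_genres <- sort(unique(unlist(genre_list)))

# One row per movie, one 0/1 column per genre
ind <- do.call(rbind, lapply(genre_list, function(x) as.integer(all_genres %in% x)))
colnames(ind) <- all_genres

# Append the indicator columns to the original data frame
df2 <- cbind(df1, ind)
```

With the real data, `all_genres` should contain the 19 genres mentioned in the question, so no genre names need to be typed by hand. If the separators include spaces, trimming each piece with `trimws()` before matching would be needed.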