I'm trying to create multiple dummy variables based on one column called 'Tags' in my df (14 rows, 2 columns: Score and Tags). My problem is that each cell can contain any number of chr values (up to about 30).
When I ask for:
str(df$Tags)
R returns:
chr [1:14] "\"biologische gerechten\", \"certificaat van uitmuntendheid tripadvisor 2016\", \"gebruik streekproducten\", \"lactose intolera"| __truncated__ ...
And when I ask for:
df$Tags[1]
R returns:
[1] "\"biologische gerechten\", \"certificaat van uitmuntendheid tripadvisor 2016\", \"gebruik streekproducten\", \"lactose intolerantie\", \"met familie\", \"met vrienden\", \"noten allergie\", \"pinda allergie\", \"vegetarische gerechten\", chinees, gastronomisch, glutenvrij, kindvriendelijk, romantisch, traditioneel, trendy, verjaardag, zakelijk"
It seems that the values within the first cell (the values between the commas) are not all formatted the same way: some are wrapped in quotes and some are not.
What I want is to create a dummy variable for each possible value that occurs in any cell. So the first new dummy should be called "biologische gerechten" (or something similar) and should show, for each case, whether the corresponding value is present in the 'Tags' column (1) or not (0).
I tried several things with 'dplyr', like:
df = mutate(df, biologisch = ifelse(Tags == "biologische gerechten", 1, 0))
R does create a new column 'biologisch', but it only contains zeros. Is there another way to separate all the values and then create dummy variables for every possible value? I hope someone can help me, thank you!
Here's one solution:
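The comparison Tags == "biologische gerechten" tests whether the whole cell equals that string, which is never true when a cell holds many tags, hence the zeros. A minimal sketch of one way around this in base R: split each cell on commas, strip the embedded quotes and whitespace, then build one 0/1 column per distinct tag. The toy data frame below is hypothetical and only mimics the structure shown in the question, and the sketch assumes that no tag contains a comma itself.

```r
# Hypothetical toy data mimicking the question: some tags are wrapped
# in escaped quotes, others are bare.
df <- data.frame(
  Score = c(8.5, 7.0, 9.1),
  Tags = c(
    "\"biologische gerechten\", \"met familie\", chinees, trendy",
    "\"met familie\", glutenvrij",
    "chinees, \"biologische gerechten\", romantisch"
  ),
  stringsAsFactors = FALSE
)

# 1. Split each cell on commas: one character vector per row.
tag_list <- strsplit(df$Tags, ",", fixed = TRUE)

# 2. Remove the embedded quotes and surrounding whitespace, so that
#    "\"met familie\"" and "met familie" count as the same tag.
#    Empty leftovers (e.g. from a trailing comma) are dropped.
tag_list <- lapply(tag_list, function(x) {
  x <- trimws(gsub("\"", "", x, fixed = TRUE))
  x[nzchar(x)]
})

# 3. Collect every distinct tag across all rows.
all_tags <- sort(unique(unlist(tag_list)))

# 4. One row of 0/1 flags per case: 1 if the row contains the tag.
dummies <- do.call(rbind, lapply(tag_list, function(x) as.integer(all_tags %in% x)))
colnames(dummies) <- all_tags

# 5. Attach the dummies to the original data frame. cbind() on a data
#    frame does not mangle names, so columns such as "met familie"
#    keep their spaces.
result <- cbind(df, dummies)
```

Columns whose names contain spaces can be accessed with result[["met familie"]] or with backticks. If only a single dummy is needed, grepl("biologische gerechten", df$Tags, fixed = TRUE) would also work, although it matches substrings rather than whole tags.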
Voila: