Say I have a dataset where each row is a class that a person took:
attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))
I.e.,
  id   class
1  1    Math
2  1 English
3  1    Math
4  2 Reading
5  2    Math
And I want to create a new dataset where rows are ids and the variables are class names, like this:
class.names <- names(table(attendance$class))
attendance2 <- matrix(nrow = length(table(attendance$id)),
                      ncol = length(class.names))
colnames(attendance2) <- class.names
attendance2 <- as.data.frame(attendance2)
attendance2$id <- unique(attendance$id)
I.e.,
  English Math Reading id
1      NA   NA      NA  1
2      NA   NA      NA  2
I want to fill in the NAs with whether that particular id took that class. The values can be Yes/No, 1/0, or the number of times the id took the class.
I.e.,
  English Math Reading id
1     Yes  Yes      No  1
2      No  Yes     Yes  2
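For comparison with the reshaping answers, here is a minimal base R sketch that fills the question's own skeleton directly, assuming "Yes"/"No" strings are wanted. It rebuilds the data so it runs on its own; the loop variable `cl` and the helper vector `ids_in_class` are illustrative names, not part of the question.

```r
attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))

# Skeleton built as in the question: one row per id, one column per class (all NA)
class.names <- names(table(attendance$class))
attendance2 <- as.data.frame(matrix(nrow = length(unique(attendance$id)),
                                    ncol = length(class.names),
                                    dimnames = list(NULL, class.names)))
attendance2$id <- unique(attendance$id)

# For each class, mark "Yes" where the row's id appears with that class, else "No"
for (cl in class.names) {
  ids_in_class <- attendance$id[attendance$class == cl]
  attendance2[[cl]] <- ifelse(attendance2$id %in% ids_in_class, "Yes", "No")
}
attendance2
```

This works, but it loops over classes by hand; the cross-tabulation approaches below do the same job in one call.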
I'm familiar with dplyr, so a solution that uses it would be easier for me to follow, but that's not necessary. Thank you for your help!
We can do this with base R:
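A sketch of the base R route, assuming it is done with `table()`. `table()` accepts the data frame directly and cross-tabulates `id` against `class`, so each cell is the number of times that id took that class. `as.data.frame.matrix()` is one way to turn the result into an ordinary data frame, with the ids as row names.

```r
attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))

# Cross-tabulate: rows are ids, columns are classes, cells are counts
counts <- table(attendance)
counts

# Optional: convert to a plain data frame (ids become row names)
counts_df <- as.data.frame.matrix(counts)
counts_df
```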
Or with `xtabs`:
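A sketch with `xtabs()`, which takes a formula: the variables on the right of `~` become the dimensions of the table, and with no left-hand side each cell is a count of rows.

```r
attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))

# One dimension per variable on the right of ~ : ids down, classes across
counts <- xtabs(~ id + class, data = attendance)
counts
```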
NOTE: The binary result can easily be converted to 'yes'/'no', but it is better to keep it as either 1/0 or TRUE/FALSE.
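A sketch of the three encodings the note mentions, starting from the `xtabs` counts. `counts > 0` gives TRUE/FALSE, multiplying by `1L` turns that into 1/0, and `ifelse()` keeps the dimensions and labels of its test, so it yields a "yes"/"no" table of the same shape.

```r
attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))
counts <- xtabs(~ id + class, data = attendance)

took   <- counts > 0                     # TRUE/FALSE: did the id take the class at all?
binary <- 1L * took                      # 1/0 integers, same dimensions and labels
yes_no <- ifelse(took, "yes", "no")      # character "yes"/"no", same dimensions and labels
binary
yes_no
```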
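Using the `attendance` data from the question, both the base R and the `xtabs` versions give one row per id and one column per class, holding how many times that id took that class.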
A similar approach with `data.table`:
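A sketch with `data.table::dcast()`, assuming counts are wanted. `value.var` is set explicitly to avoid the "guessing the value column" message, and `fun.aggregate = length` counts the rows for each id/class pair. Pairs that never occur are filled with `length()` of an empty vector, which is 0.

```r
library(data.table)

attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))

# One row per id, one column per class, cells are counts (missing pairs -> 0)
wide <- dcast(as.data.table(attendance), id ~ class,
              value.var = "class", fun.aggregate = length)
wide
```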
Or with `dplyr`/`tidyr`:
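A sketch of the `dplyr`/`tidyr` route. It uses `tidyr::pivot_wider()`, the current replacement for the older `spread()`, so this may differ from the original answer's exact call. `count()` gives one row per id/class pair with a count column `n`, and `pivot_wider()` spreads the classes into columns, filling absent pairs with 0. The last step shows the Yes/No variant from the question.

```r
library(dplyr)
library(tidyr)

attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))

counts <- attendance %>%
  count(id, class) %>%                       # columns: id, class, n
  pivot_wider(names_from = class, values_from = n, values_fill = 0)

# Turn every count column except id into "Yes"/"No"
yes_no <- counts %>%
  mutate(across(-id, ~ if_else(.x > 0, "Yes", "No")))
yes_no
```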
Another, somewhat more convoluted option is to reshape first and then replace the counts with "yes" and "no" (see here for an explanation of the default aggregate function of `dcast`). Reshaping gives one row per id with the count of each class:
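A sketch of the reshape step, assuming `reshape2::dcast()` (the text does not say which package's `dcast` is meant). No `fun.aggregate` is given, and some id/class pairs repeat (id 1 took Math twice), so `dcast()` falls back to `length()` and prints a message saying so. Missing pairs come out as 0.

```r
library(reshape2)

attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))

# Default aggregation: prints "Aggregation function missing: defaulting to length"
wide <- dcast(attendance, id ~ class, value.var = "class")
wide
```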
Now you can replace the counts with:
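A sketch of the replacement step. So that it runs on its own, the counts from the reshape step are typed in by hand for the question's data. Column 1 is `id` and the remaining columns are class counts. Every column except the first is mapped to "yes" or "no".

```r
# Counts for the question's data, as the reshape step produces them (typed in by hand)
wide <- data.frame(id = c(1, 2),
                   English = c(1L, 0L),
                   Math = c(2L, 1L),
                   Reading = c(0L, 1L))

# Replace every count column (all but the first) with "yes"/"no"
wide[-1] <- lapply(wide[-1], function(x) ifelse(x > 0, "yes", "no"))
wide
```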
which finally gives the same layout with "yes" or "no" in each class column.