I don't know if using dcast() is the right way, but I want to reshape the following data.frame:
df <- data.frame(x=c("p1","p1","p2"),y=c("a","b","a"),z=c(14,14,16))
df
x y z
1 p1 a 14
2 p1 b 14
3 p2 a 16
so that it looks like this one:
df2 <- data.frame(x=c("p1","p2"),a=c(1,1),b=c(1,0),z=c(14,16))
x a b z
1 p1 1 1 14
2 p2 1 0 16
The variable y in df should be broken up so that its elements become new variables, each dummy coded. All other variables (in this case just z) are equal for each person (p1, p2, etc.). The only variable where a specific person p has different values is y.
The reason I want this is that I need to merge this dataset with other ones by variable x. The catch is that it needs to be one row per person (p1, p2, etc.).
I'm not sure how much of this you have to do, but if you need a way to automate it, I wrote this little function that might help:
First run dcast:
Load into your R environment:
then run:
and you get:
Not as fast as using _apply(), but there's no hardcoding: you can enter any colnames you want (maybe you want to skip one in the middle?) and you don't create a new instance of your df. Note: I use "=" instead of "<-" because I thought "<-" was being phased out, but they can be swapped if need be.
First check for containment in a vector that summarizes the possible values to create columns of logical values. Then 'dummify' by taking as.numeric of those logical values.
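As a rough sketch of that idea in base R (not necessarily the code this answer originally used): the names lev, per_person, dummies and out are made up for illustration, and it assumes z is constant within each person, as the question states.

```r
# Example data from the question
df <- data.frame(x = c("p1", "p1", "p2"),
                 y = c("a", "b", "a"),
                 z = c(14, 14, 16))

# Vector summarising the possible values of y
lev <- unique(df$y)

# Collect each person's y values, then test containment of every level
per_person <- split(df$y, df$x)
dummies <- do.call(rbind, lapply(per_person, function(v) {
  as.numeric(lev %in% v)  # logical -> 0/1 dummy
}))
colnames(dummies) <- lev

# One row per person; z is taken from the first matching row (assumed constant)
out <- data.frame(x = rownames(dummies),
                  dummies,
                  z = df$z[match(rownames(dummies), df$x)],
                  row.names = NULL)
out
```

With the question's data this should give one row each for p1 and p2, with a and b as 0/1 columns alongside z.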
The following works, but seems cumbersome.
This is almost a duplicate of a previous question, and the same basic answer I used there works again. No need for any external packages either.
To explain this, as it is a bit odd looking, the model.matrix call at its most basic returns a binary indicator variable for each unique value for each row of your data.frame, like so:
If you aggregate that intermediate result by your two id variables (x and z), you are then essentially acting on the initial data.frame of:
So if you take the max value of ya and yb within each combination of x and z, you basically do:
...and repeat that for each unique x/z combination to give the final result:
Things get a bit crazy to generalise this to more columns, but it can be done, courtesy of this question e.g.:
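For the simple single-column case walked through above (not the multi-column generalisation), a minimal sketch of the model.matrix and aggregate steps might look like this. It is a reconstruction under assumptions rather than the answer's exact code; the names mm and res are made up for illustration.

```r
# Example data from the question
df <- data.frame(x = c("p1", "p1", "p2"),
                 y = c("a", "b", "a"),
                 z = c(14, 14, 16))

# Step 1: one 0/1 indicator column per unique value of y (columns ya, yb);
# "- 1" drops the intercept so every level gets its own column
mm <- as.data.frame(model.matrix(~ y - 1, data = df))

# Step 2: take the max of each indicator within every x/z combination
res <- aggregate(mm, by = df[c("x", "z")], FUN = max)
res
```

Taking the max works because an indicator is 1 as soon as any of that person's rows has the value, and 0 otherwise. The result has columns x, z, ya and yb, which can be renamed or reordered before merging.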