I'm new to R and to reorganizing data, and I have hunted around for a solution but can't find exactly what I'd like to do. reshape2's melt/cast doesn't quite seem to work, and I haven't mastered plyr well enough to factor it in here.
Basically, I have a data.frame with the structure outlined below, with a category column in which each element is a variable-length list of categories (this is more compact because the number of columns is much larger, and I actually have multiple category_lists that I'd like to keep separate):
>mydf
ID category_list xval yval
1 ID1 cat1, cat2, cat3 xnum1 ynum1
2 ID2 cat2, cat3 xnum2 ynum2
3 ID3 cat1 xnum3 ynum3
I want to do manipulations with the categories as factors (and the values associated, i.e. columns 3/4), so I think I need something like this in the end, where IDs and x/y/other column values are duplicated according to the length of the category list:
ID category xval yval
1 ID1 cat1 xnum1 ynum1
2 ID1 cat2 xnum1 ynum1
3 ID1 cat3 xnum1 ynum1
4 ID2 cat2 xnum2 ynum2
5 ID2 cat3 xnum2 ynum2
6 ID3 cat1 xnum3 ynum3
If there's a way to factor/facet directly on the category_list, that would be simpler, but I haven't come across methods that support this. For example, the following throws an error:
>ggplot(mydf, aes(x=x, y=y)) + geom_point() + facet_grid(~cat_list)
Error in layout_base(data, cols, drop = drop) : At least one layer must contain all variables used for facetting
Thanks!
A possibility:
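As a minimal sketch of one such possibility, assuming `category_list` is stored as a comma-separated string: split each string with `strsplit`, then repeat the other columns once per category. The sample values below (numeric `xval`/`yval`) are hypothetical stand-ins for the `xnum`/`ynum` placeholders in the question.

```r
# Hypothetical sample data: category_list is a comma-separated string
mydf <- data.frame(
  ID = c("ID1", "ID2", "ID3"),
  category_list = c("cat1, cat2, cat3", "cat2, cat3", "cat1"),
  xval = c(1.5, 2.5, 3.5),
  yval = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Split each string on commas, dropping surrounding whitespace
cats <- strsplit(mydf$category_list, "\\s*,\\s*")

# Number of categories per row
n <- lengths(cats)

# Repeat ID/xval/yval once per category and attach the categories as a factor
long <- data.frame(
  ID = rep(mydf$ID, n),
  category = factor(unlist(cats)),
  xval = rep(mydf$xval, n),
  yval = rep(mydf$yval, n)
)
long
```

With the data in this long form, `category` is an ordinary factor column and can be used for grouping or faceting.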
The answer will depend on the format of category_list. If it is in fact a list for each row, then you can use plyr and merge to create your long-form data, or a non-plyr approach that doesn't require merge.
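A sketch of the list-column case in base R only, under the assumption that `category_list` holds one character vector per row. The sample data is hypothetical, and neither variant is necessarily the code the answer originally showed. The first variant builds an ID/category lookup table and uses `merge`; the second avoids `merge` by indexing rows directly.

```r
# Hypothetical sample data: category_list is a list column (one vector per row)
mydf <- data.frame(
  ID = paste0("ID", 1:3),
  category_list = I(list(c("cat1", "cat2", "cat3"), c("cat2", "cat3"), "cat1")),
  xval = c(1.5, 2.5, 3.5),
  yval = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Number of categories in each row
n <- lengths(mydf$category_list)

# Variant 1: build an ID/category lookup, then merge it back onto the other columns
lookup <- data.frame(
  ID = rep(mydf$ID, n),
  category = unlist(mydf$category_list),
  stringsAsFactors = FALSE
)
long_merge <- merge(mydf[, c("ID", "xval", "yval")], lookup, by = "ID")

# Variant 2: no merge; repeat each row index once per category
idx <- rep(seq_len(nrow(mydf)), n)
long <- mydf[idx, c("ID", "xval", "yval")]
long$category <- factor(unlist(mydf$category_list))
rownames(long) <- NULL
long
```

The second variant keeps the original row order, whereas `merge` sorts by the key column.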
A plodding but seemingly robust solution:
This will be a non-plyr approach:
Note: Original answer deleted as my answer was based on a different data structure than what the OP seems to actually have.
Scenario 1: Column is a list
Using @mnel's sample data:
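A list-column data frame of this kind, and one way to flatten it, might be sketched as follows. The data here is a hypothetical stand-in rather than the exact sample from the other answer, and the call assumes a tidyr version (1.0.0 or later) in which `unnest` takes a `cols` argument.

```r
library(tidyr)

# Hypothetical stand-in data with a list column
mydf <- data.frame(
  ID = paste0("ID", 1:3),
  xval = c(1.5, 2.5, 3.5),
  yval = c(10, 20, 30),
  stringsAsFactors = FALSE
)
mydf$category_list <- list(c("cat1", "cat2", "cat3"), c("cat2", "cat3"), "cat1")

# Expand the list column: one output row per element, other columns repeated
long <- unnest(mydf, cols = category_list)
long
```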
Using listCol_l from my "splitstackshape" package
Using unnest from the "tidyr" package
Scenario 2: Column is a concatenated string
Using @BenBolker's sample data:
Using cSplit from my "splitstackshape" package
Another base R possibility using by:
Result:
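A minimal sketch of how a `by`-based approach could produce the long form, assuming a comma-separated string column and unique IDs. The sample data and the helper logic inside the anonymous function are illustrative assumptions, not the code from the original answer.

```r
# Hypothetical sample data: category_list is a concatenated string
mydf <- data.frame(
  ID = c("ID1", "ID2", "ID3"),
  category_list = c("cat1, cat2, cat3", "cat2, cat3", "cat1"),
  xval = c(1.5, 2.5, 3.5),
  yval = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Process the rows for each ID separately (here, one row per ID):
# split the string and recycle the other columns across the categories
pieces <- by(mydf, mydf$ID, function(d) {
  cats <- trimws(strsplit(d$category_list, ",")[[1]])
  data.frame(ID = d$ID, category = cats, xval = d$xval, yval = d$yval,
             stringsAsFactors = FALSE)
})

# Stack the per-ID pieces into a single long data frame
long <- do.call(rbind, pieces)
rownames(long) <- NULL
long
```

Note that `by` orders the pieces by the levels of `ID`, so the output order follows sorted IDs rather than the original row order.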