This question already has an answer here:
I have a data frame as follow:
+-----+-------+
| V1 | V2 |
+-----+-------+
| 1 | a,b,c |
| 2 | a,c |
| 3 | b,d |
| 4 | e,f |
| . | . |
+-----+-------+
Each of the alphabet is a character separated by comma. I would like to split V2 on each comma and insert the split strings as new rows. For instance, the desired output will be:
+----+----+
| V1 | V2 |
+----+----+
| 1 | a |
| 1 | b |
| 1 | c |
| 2 | a |
| 2 | c |
| 3 | b |
| 3 | d |
| 4 | e |
| 4 | f |
+----+----+
I am trying to use strsplit()
to spit V2 first, then cast the list into a data frame. It didn't work. Any help will be appreciated.
Now you can use tidyr 0.5.0's
separate_rows
is in place ofstrsplit
+unnest
.For example:
Gives:
See reference: https://blog.rstudio.org/2016/06/13/tidyr-0-5-0/
As of Dec 2014, this can be done using the unnest function from Hadley Wickham's tidyr package (see release notes http://blog.rstudio.org/2014/12/08/tidyr-0-2-0/)
Update 2017: note the
separate_rows
function as described by @Tif below.It works so much better, and it allows to "unnest" multiple columns in a single statement:
You can consider
cSplit
withdirection = "long"
from my "splitstackshape" package.Usage would be:
Old answer....
Here is one approach using base R. It assumes we're starting with a
data.frame
named "mydf". It usesread.csv
to read in the second column as a separatedata.frame
, which we combine with the first column from your source data. Finally, you usereshape
to convert the data into a long form.Another fairly direct alternative is:
Here's a
data.table
solution:Another
data.table
solution, which doesn't rely on the existence of any unique fields in the original data.The important thing is
by=seq_len(nrow(DT))
, this is the 'fake' uniqueID that the splitting occurs on. It's tempting to useby=.I
instead, as it should be defined the same, but.I
appears to be a magical thing that changes its value, better to stick withby=seq_len(nrow(DT))
There are three columns in the output. We simply name the two existing columns, and then compute the third as a split
Here is another way of doing it..