I have mydf
data frame below. I want to split any cell that contains comma separated data and put it into rows. I am looking for a data frame similar to y
below. How could i do it efficiently in few steps? Currently i am using cSplit
function on one column at a time.
I tried cSplit(mydf, c("name","new"), ",", direction = "long")
, but that didn`t work
library(splitstackshape)
mydf=data.frame(name = c("AB,BW","x,y,z"), AB = c('A','B'), new=c("1,2,3","4,5,6,7"))
mydf
x=cSplit(mydf, c("name"), ",", direction = "long")
x
y=cSplit(x, c("new"), ",", direction = "long")
y
There are times when a for
loop is totally fine to work with in R. This is one of those times. Try:
library(splitstackshape)
cols <- c("name", "new")
for (i in cols) {
mydf <- cSplit(mydf, i, ",", "long")
}
mydf
## name AB new
## 1: AB A 1
## 2: AB A 2
## 3: AB A 3
## 4: BW A 1
## 5: BW A 2
## 6: BW A 3
## 7: x B 4
## 8: x B 5
## 9: x B 6
## 10: x B 7
## 11: y B 4
## 12: y B 5
## 13: y B 6
## 14: y B 7
## 15: z B 4
## 16: z B 5
## 17: z B 6
## 18: z B 7
Here's a small test using slightly bigger data:
# concat.test = sample data from "splitstackshape"
test <- do.call(rbind, replicate(5000, concat.test, FALSE))
fun1 <- function() {
cols <- c("Likes", "Siblings")
for (i in cols) {
test <- cSplit(test, i, ",", "long")
}
test
}
fun2 <- function() {
test %>%
separate_rows("Likes") %>%
separate_rows("Siblings")
}
system.time(fun1())
# user system elapsed
# 3.205 0.056 3.261
system.time(fun2())
# user system elapsed
# 11.598 0.066 11.662
We can use the separate_rows
function from the tidyr package.
library(tidyr)
mydf2 <- mydf %>%
separate_rows("name") %>%
separate_rows("new")
mydf2
# AB name new
# 1 A AB 1
# 2 A AB 2
# 3 A AB 3
# 4 A BW 1
# 5 A BW 2
# 6 A BW 3
# 7 B x 4
# 8 B x 5
# 9 B x 6
# 10 B x 7
# 11 B y 4
# 12 B y 5
# 13 B y 6
# 14 B y 7
# 15 B z 4
# 16 B z 5
# 17 B z 6
# 18 B z 7
If you don't what to use separate_rows
function more than once, we can further design a function to iteratively apply the separate_rows
function.
expand_fun <- function(df, vars){
while (length(vars) > 0){
df <- df %>% separate_rows(vars[1])
vars <- vars[-1]
}
return(df)
}
The expand_fun
takes two arguments. The first argument, df
, is the original data frame. The second argument, vars
, is a character string with the columns names we want to expand. Here is an example using the function.
mydf3 <- expand_fun(mydf, vars = c("name", "new"))
mydf3
# AB name new
# 1 A AB 1
# 2 A AB 2
# 3 A AB 3
# 4 A BW 1
# 5 A BW 2
# 6 A BW 3
# 7 B x 4
# 8 B x 5
# 9 B x 6
# 10 B x 7
# 11 B y 4
# 12 B y 5
# 13 B y 6
# 14 B y 7
# 15 B z 4
# 16 B z 5
# 17 B z 6
# 18 B z 7