R separate comma separated cells into rows and Car

Posted 2019-03-02 07:10

Question:

I have the data frame mydf below. I want to split every cell that contains comma-separated data and spread the pieces across rows, so that I end up with a data frame like y below. How can I do this efficiently in a few steps? Currently I am using the cSplit function on one column at a time.

I tried cSplit(mydf, c("name","new"), ",", direction = "long"), but that didn't work.

library(splitstackshape)
mydf=data.frame(name = c("AB,BW","x,y,z"), AB = c('A','B'), new=c("1,2,3","4,5,6,7"))
mydf

x=cSplit(mydf, c("name"), ",", direction = "long")
x
y=cSplit(x, c("new"), ",", direction = "long")
y

Answer 1:

There are times when a for loop is totally fine to work with in R. This is one of those times. Try:

library(splitstackshape)
cols <- c("name", "new")
for (i in cols) {
  mydf <- cSplit(mydf, i, ",", "long")
}

mydf
##     name AB new
##  1:   AB  A   1
##  2:   AB  A   2
##  3:   AB  A   3
##  4:   BW  A   1
##  5:   BW  A   2
##  6:   BW  A   3
##  7:    x  B   4
##  8:    x  B   5
##  9:    x  B   6
## 10:    x  B   7
## 11:    y  B   4
## 12:    y  B   5
## 13:    y  B   6
## 14:    y  B   7
## 15:    z  B   4
## 16:    z  B   5
## 17:    z  B   6
## 18:    z  B   7
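To make it clearer what each pass of the loop does, here is a minimal base R sketch of the same idea. It is not how cSplit is implemented; split_long and res are hypothetical names introduced only for illustration, and the sketch assumes a literal comma separator with no surrounding whitespace. Each pass repeats a row once per piece in one column, so running it over several columns in turn produces every combination within a row. Rows whose cell is an empty string are dropped by this sketch.

```r
# Same sample data as in the question, rebuilt so the sketch is self-contained
mydf <- data.frame(
  name = c("AB,BW", "x,y,z"),
  AB   = c("A", "B"),
  new  = c("1,2,3", "4,5,6,7"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one column on `sep` and repeat each row
# once per resulting piece
split_long <- function(df, col, sep = ",") {
  pieces <- strsplit(as.character(df[[col]]), sep, fixed = TRUE)
  out <- df[rep(seq_len(nrow(df)), lengths(pieces)), , drop = FALSE]
  out[[col]] <- unlist(pieces)
  rownames(out) <- NULL
  out
}

# Apply the helper to each column in turn, starting from mydf
res <- Reduce(split_long, c("name", "new"), mydf)
nrow(res)
```

The row count is the sum over the original rows of the product of the piece counts, here 2 * 3 + 3 * 4 = 18, which matches the output above. The split columns stay character in this sketch.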

Here's a small test using slightly bigger data:

# concat.test = sample data from "splitstackshape"
# fun2 below also needs tidyr for separate_rows() and %>%
library(splitstackshape)
library(tidyr)
test <- do.call(rbind, replicate(5000, concat.test, FALSE))

fun1 <- function() {
  cols <- c("Likes", "Siblings")
  for (i in cols) {
    test <- cSplit(test, i, ",", "long")
  }
  test
}

fun2 <- function() {
  test %>%
    separate_rows("Likes") %>%
    separate_rows("Siblings")
}

system.time(fun1())
#   user  system elapsed 
#  3.205   0.056   3.261 
system.time(fun2())
#   user  system elapsed 
# 11.598   0.066  11.662
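A single system.time() call can be noisy, so the numbers above are best read as a rough comparison. As a sketch, assuming only base R, the timing could be repeated and summarised by its median. time_median is a hypothetical helper, not part of either package, and the toy function below only stands in for fun1 and fun2.

```r
# Hypothetical helper: run f() `reps` times and return the median
# elapsed time in seconds
time_median <- function(f, reps = 5) {
  elapsed <- replicate(reps, system.time(f())[["elapsed"]])
  median(elapsed)
}

# Toy stand-in for fun1 / fun2 so the sketch runs on its own
toy <- function() sum(sqrt(seq_len(1e5)))
m <- time_median(toy, reps = 3)
```

With the functions above defined, time_median(fun1) and time_median(fun2) would give a steadier comparison than one run of each.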


Answer 2:

We can use the separate_rows function from the tidyr package.

library(tidyr)

mydf2 <- mydf %>%
  separate_rows("name") %>%
  separate_rows("new")
mydf2

#    AB name new
# 1   A   AB   1
# 2   A   AB   2
# 3   A   AB   3
# 4   A   BW   1
# 5   A   BW   2
# 6   A   BW   3
# 7   B    x   4
# 8   B    x   5
# 9   B    x   6
# 10  B    x   7
# 11  B    y   4
# 12  B    y   5
# 13  B    y   6
# 14  B    y   7
# 15  B    z   4
# 16  B    z   5
# 17  B    z   6
# 18  B    z   7 

If you don't want to call separate_rows more than once, you can write a small function that applies it to each column in turn.

expand_fun <- function(df, vars){
  while (length(vars) > 0){
    df <- df %>% separate_rows(vars[1])
    vars <- vars[-1]
  }
  return(df)
}

expand_fun takes two arguments. The first, df, is the original data frame. The second, vars, is a character vector of the names of the columns we want to expand. Here is an example using the function.

mydf3 <- expand_fun(mydf, vars = c("name", "new"))
mydf3
#    AB name new
# 1   A   AB   1
# 2   A   AB   2
# 3   A   AB   3
# 4   A   BW   1
# 5   A   BW   2
# 6   A   BW   3
# 7   B    x   4
# 8   B    x   5
# 9   B    x   6
# 10  B    x   7
# 11  B    y   4
# 12  B    y   5
# 13  B    y   6
# 14  B    y   7
# 15  B    z   4
# 16  B    z   5
# 17  B    z   6
# 18  B    z   7
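One thing to keep in mind is that the default sep of separate_rows is a regular expression matching any run of non-alphanumeric characters, so values containing spaces or hyphens would be split as well. The following is a sketch of a variant that splits on commas only. expand_fun2 and mydf4 are hypothetical names, and the use of tidyselect::all_of() to pass the column name is my assumption about a tidy way to select by string, not something taken from the answer.

```r
library(tidyr)

# Same sample data as in the question, rebuilt so the sketch is self-contained
mydf <- data.frame(
  name = c("AB,BW", "x,y,z"),
  AB   = c("A", "B"),
  new  = c("1,2,3", "4,5,6,7"),
  stringsAsFactors = FALSE
)

# Hypothetical variant of expand_fun: one separate_rows() pass per column,
# splitting only on commas
expand_fun2 <- function(df, vars, sep = ",") {
  for (v in vars) {
    df <- separate_rows(df, tidyselect::all_of(v), sep = sep)
  }
  df
}

mydf4 <- expand_fun2(mydf, vars = c("name", "new"))
nrow(mydf4)
```

This should give the same 18 rows as above. Passing convert = TRUE to separate_rows would additionally turn the new column into integers.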