Separate Comma Delimited Cells To New Rows

Posted 2020-06-06 01:35

Hi, I have a table with comma-delimited columns and I need to convert the comma-delimited values to new rows. For example, the given table is:

Name     Start      End 
A        1,2,3    4,5,6
B          1,2      4,5
C      1,2,3,4  6,7,8,9   

I need to convert it to:

Name Start End
   A     1   4
   A     2   5  
   A     3   6
   B     1   4
   B     2   5 
   C     1   6
   C     2   7
   C     3   8   
   C     4   9

I can do that using a VB script, but I need to solve it using R. Can anyone solve this?

Tags: r
4 answers
乱世女痞
#2 · 2020-06-06 02:26

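Here's another, just for fun. Take d as the original data frame, with Start and End stored as character columns.

# Split column x of the global data frame d on commas.
# x is passed unquoted; ul = TRUE flattens the result into one vector.
f <- function(x, ul = TRUE)
{
    x <- deparse(substitute(x))
    if(ul) unlist(strsplit(d[[x]], ','))
    else strsplit(d[[x]], ',')
}

> data.frame(Name = rep(d$Name, sapply(f(End, FALSE), length)),
             Start = f(Start), End = f(End))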
#   Name Start End
# 1    A     1   4
# 2    A     2   5
# 3    A     3   6
# 4    B     1   4
# 5    B     2   5
# 6    C     1   6
# 7    C     2   7
# 8    C     3   8
# 9    C     4   9
时光不老,我们不散
#3 · 2020-06-06 02:30

The separate_rows() function in tidyr is the boss for observations with multiple delimited values...

# create data 
library(tidyverse)
d <- tibble(
  Name = c("A", "B", "C"), 
  Start = c("1,2,3", "1,2", "1,2,3,4"), 
  End = c("4,5,6", "4,5", "6,7,8,9")
)
d
# # A tibble: 3 x 3
#    Name   Start     End
#   <chr>   <chr>   <chr>
# 1     A   1,2,3   4,5,6
# 2     B     1,2     4,5
# 3     C 1,2,3,4 6,7,8,9

# tidy data
separate_rows(d, Start, End)
# # A tibble: 9 x 3
#    Name Start   End
#   <chr> <chr> <chr>
# 1     A     1     4
# 2     A     2     5
# 3     A     3     6
# 4     B     1     4
# 5     B     2     5
# 6     C     1     6
# 7     C     2     7
# 8     C     3     8
# 9     C     4     9

# set convert = TRUE to get integer columns
separate_rows(d, Start, End, convert = TRUE)
# # A tibble: 9 x 3
#    Name Start   End
#   <chr> <int> <int>
# 1     A     1     4
# 2     A     2     5
# 3     A     3     6
# 4     B     1     4
# 5     B     2     5
# 6     C     1     6
# 7     C     2     7
# 8     C     3     8
# 9     C     4     9
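
As a side note, recent tidyr releases (1.3.0 and later, as far as I know) mark separate_rows() as superseded in favour of separate_longer_delim(). The sketch below shows the same reshaping with that function, assuming such a tidyr version is installed. It has no convert argument, so the integer conversion is done by hand here; the name long is just a placeholder for this example.

```r
library(tidyr)

d <- data.frame(
  Name  = c("A", "B", "C"),
  Start = c("1,2,3", "1,2", "1,2,3,4"),
  End   = c("4,5,6", "4,5", "6,7,8,9"),
  stringsAsFactors = FALSE
)

# Split Start and End in parallel on a literal comma
# (assumes tidyr >= 1.3.0, where separate_longer_delim() is available)
long <- separate_longer_delim(d, c(Start, End), delim = ",")

# The pieces come back as character, so convert them explicitly
long$Start <- as.integer(long$Start)
long$End   <- as.integer(long$End)
long
```

Unlike the default pattern in separate_rows(), the delimiter here is a fixed string, so a value such as "1.5" or "a-b" is left alone.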
我想做一个坏孩纸
#4 · 2020-06-06 02:32

You might have asked this question on SO, as it has nothing to do with statistics :)

Anyway, I made up a quite complicated and ugly solution which might work for you:

# load your data
x <- structure(list(Name = c("A", "B", "C"), Start = c("1,2,3", "1,2", 
"1,2,3,4"), End = c("4,5,6", "4,5", "6,7,8,9")), .Names = c("Name", 
"Start", "End"), row.names = c(NA, -3L), class = "data.frame")

Which looks like this in R:

> x
  Name   Start     End
1    A   1,2,3   4,5,6
2    B     1,2     4,5
3    C 1,2,3,4 6,7,8,9

Data transformation with the help of strsplit calls:

data <- data.frame(cbind(
    # repeat each name once per value in its Start cell
    rep(x$Name, lengths(strsplit(x$Start, ","))),
    # flatten the split values into one long vector per column
    unlist(strsplit(x$Start, ",")),
    unlist(strsplit(x$End, ","))
    ))

Naming the new data frame:

names(data) <- c("Name", "Start", "End")

Which looks like:

> data
  Name Start End
1    A     1   4
2    A     2   5
3    A     3   6
4    B     1   4
5    B     2   5
6    C     1   6
7    C     2   7
8    C     3   8
9    C     4   9
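
If the same strsplit idea is needed more than once, it could be wrapped in a small base R helper. This is only a sketch: split_to_rows and its arguments are names made up for this example, and it assumes the delimited values are numeric and that every listed column holds the same number of values in each row (it stops with an error otherwise).

```r
x <- data.frame(
  Name  = c("A", "B", "C"),
  Start = c("1,2,3", "1,2", "1,2,3,4"),
  End   = c("4,5,6", "4,5", "6,7,8,9"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: expand the delimited columns `cols` of `df` into rows
split_to_rows <- function(df, cols, sep = ",") {
  # Split every requested column into a list of character vectors
  pieces <- lapply(df[cols], function(col) {
    strsplit(as.character(col), sep, fixed = TRUE)
  })
  # Number of values per original row, taken from the first column
  n <- lengths(pieces[[1]])
  # Every column must hold the same number of values in each row
  same <- vapply(pieces, function(p) identical(lengths(p), n), logical(1))
  if (!all(same)) stop("columns have different numbers of values in some rows")
  # Repeat the untouched columns, then attach the flattened pieces
  out <- df[rep(seq_len(nrow(df)), n), setdiff(names(df), cols), drop = FALSE]
  for (col in cols) {
    # Assumption: the values are numeric; drop as.numeric() for text data
    out[[col]] <- as.numeric(unlist(pieces[[col]]))
  }
  rownames(out) <- NULL
  out
}

res <- split_to_rows(x, c("Start", "End"))
res
```

The length check is the main difference from the one-off version: without it, mismatched cells would make data.frame() fail with a less helpful error or silently misalign Start and End.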
我命由我不由天
#5 · 2020-06-06 02:33

Here's an approach that should work for you. I'm assuming that your three input vectors are in different objects. We are going to create a list of those inputs and write a function that processes each object and returns it as a data.frame, using plyr to combine the results.

The things to take note of here are the splitting of the character vector into its component parts, then using as.numeric to convert the numbers back from the character form they have after the split. Since R fills matrices by column, we define a 2-column matrix and let R fill in the values for us: the first half of the numbers becomes Start and the second half becomes End. We then retrieve the Name value and put it all together in a data.frame. plyr is nice enough to process the list and combine the pieces into a single data.frame for us automatically.
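
To make the column-fill point concrete, here is a tiny illustration using the numbers from row A. The names vals and m are only placeholders for this example.

```r
# Values as they appear after splitting "A,1,2,3,4,5,6" and dropping the name
vals <- c(1, 2, 3, 4, 5, 6)

# matrix() fills column by column, so the first half lands in column 1
m <- matrix(vals, ncol = 2)
m
#      [,1] [,2]
# [1,]    1    4
# [2,]    2    5
# [3,]    3    6

# byrow = TRUE would instead pair neighbouring values: (1,2), (3,4), (5,6)
matrix(vals, ncol = 2, byrow = TRUE)
```

This only works because each input string lists all Start values first and all End values after them.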

library(plyr)

a <- paste("A",1, 2,3,4,5,6, sep = ",", collapse = "")
b <- paste("B",1, 2,4,5, sep = ",", collapse = "")
c <- paste("C",1, 2,3,4,6,7,8,9, sep = ",", collapse = "")

input <- list(a,b,c)

splitter <- function(x) {
    x <- unlist(strsplit(x, ","))
    out <- data.frame(x[1], matrix(as.numeric(x[-1]), ncol = 2))
    colnames(out) <- c("Name", "Start", "End")
    return(out)
}


ldply(input, splitter)

And the output:

> ldply(input, splitter)
  Name Start End
1    A     1   4
2    A     2   5
3    A     3   6
4    B     1   4
5    B     2   5
6    C     1   6
7    C     2   7
8    C     3   8
9    C     4   9