How to merge multiple data.frames and sum and aver

2019-07-12 13:46发布

I have over 20 twenty data.frames with the same columns but differing amount of rows. My goal is to merge the data.frames by the column "Name" (which is a list of five names) and while merging I would like the rows with the same name to sum column A, sum column B, and get the mean of column C.

Here is what I am currently doing.

First I will just merge 2 data.frames at a time.

DF <- merge(x=abc, y=def, by = "Name", all = T)

Merged DF will look like such

Name                              A.x    B.x   C.x   A.y   B.y  C.y
name1,name2,name3,name4,name5      11     24     7    NA    NA   NA
name1,name3,name4,name6,name7       4      8    12     3     4    7
name1,name2,name5,name6,name7      12      4     5    NA    NA   NA
name3,name4,name5,name6,name7      NA     NA    NA    15     3   28

I will then add these ifelse statements to deal with the NAs and non unique rows. For the non unique rows it will add for A add for B and for C it will get an average.

DF$A <- ifelse(is.na(DF$A.x), DF$A.y,
    ifelse(is.na(DF$A.y), DF$A.x,
        ifelse((!is.na(DF$A.x)) & (!is.na(DF$A.y)), DF$A.x + DF$A.y, 1)))
DF$B <- ifelse(is.na(DF$B.x), DF$B.y, 
    ifelse(is.na(DF$B.y), DF$B.x,
        ifelse((!is.na(DF$B.x)) & (!is.na(DF$B.y)), DF$B.x + DF$B.y, 1)))
DF$C <- ifelse(is.na(DF$C.x), DF$C.y, 
    ifelse(is.na(DF$C.y), DF$C.x,
        ifelse((!is.na(DF$C.x)) & (!is.na(DF$C.y)), (DF$C.x  + DF$C.y)/2, 1)))

DF will now look like such

Name                            A.x  B.x  C.x  A.y  B.y  C.y   A   B   C   
name1,name2,name3,name4,name5    11   24    7   NA   NA   NA  11  24   7
name1,name3,name4,name6,name7     4    8   12    3    4    8   7  12  10
name1,name2,name5,name6,name7    12    4    5   NA   NA   NA  12   4   5
name3,name4,name5,name6,name7    NA   NA   NA   15    3   28  15   3  28

I then keep just the Name column and the last three columns

merge1 <- DF[c(1,8,9,10)]

Then I do the same process for the next two data.frames and call it merge2. Then I will merge merge1 and merge 2.

total1 <- merge(x = merge1, y = merge2, by = "Name", all = TRUE)

I will just continue to merge two data frames at a time then merge the Totals data.frames together as well two at a time. I get my end result that I want but it is a timely process and not very efficient.

Another way I think I could do it is may be do a rbind with all the data.frames then if in the Name column any row has the same list of names as another row then make that one row, add column A, add column B and get the mean of column C. But I am not sure how to do that as well.

Here is an example of what I would like with rind

Name                              A     B     C
name1,name2,name3,name4,name5    11    24     7
name1,name3,name4,name6,name7     4     8    12
name1,name2,name5,name6,name7    12     4     5 
name3,name4,name5,name6,name7    15     3    28 
name1,name3,name4,name6,name7     3     4     8

The end result would look like such

Name                              A     B     C
name1,name2,name3,name4,name5    11    24     7
name1,name3,name4,name6,name7     7    12    10
name1,name2,name5,name6,name7    12     4     5 
name3,name4,name5,name6,name7    15     3    28 

Again, I am sure there are more efficient ways to complete what I want than what I am currently doing so any help would be greatly appreciated.

2条回答
爷的心禁止访问
2楼-- · 2019-07-12 14:24

We can use dplyr

library(dplyr)
bind_rows(abc, def, ...) %>%
   group_by(name) %>%
   summarise(A= sum(A, na.rm= TRUE),
            B = sum(B, na.rm= TRUE),
            C = mean(C, na.rm=TRUE))
查看更多
啃猪蹄的小仙女
3楼-- · 2019-07-12 14:29

I think your second approach is the way to go, and you can do that with data.table or dplyr.

Here a few steps using data.table. First, if your data frames are abc, def, ... do:

DF <- do.call(rbind, list(abc,def,...))

now you can transform them into a data.table

DT <- data.table(DF)

and simply do something like

DTres <- DT[,.(A=sum(A, na.rm=T), B=sum(B, na.rm=T), C=mean(C,na.rm=T)),by=name]

double check the data.table vignettes to get a better idea how that package work.

查看更多
登录 后发表回答