Collapsing duplicate rows in R by two variables

Posted 2019-09-06 15:34

Question:

This question already has an answer here:

  • Combining pivoted rows in R by common value 4 answers

My data set contains partially duplicated rows. Each pair of rows matches on two variables, and the remaining variables contain some NAs. If I could combine each pair of partially duplicated rows, I would have a complete case for that row.

How can I combine rows in my data set based on similar values for two variables, thereby replacing the NAs in each separate row, leaving one complete row?

# Example data: rows 1-2 and rows 3-4 share the same values of a and b
a <- c(1, 1, 1, 1)
b <- c(1, 1, 3, 3)
c <- c(NA, 0, NA, NA)
d <- c(0, NA, 0, NA)

y <- data.frame(a, b, c, d)
head(y)

# Desired result: one row per (a, b) pair, with NAs filled in where possible
a1 <- c(1, 1)
b1 <- c(1, 3)
c1 <- c(0, NA)
d1 <- c(0, 0)

z <- data.frame(a1, b1, c1, d1)
head(z)

Answer 1:

We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(y), group by 'a' and 'b', loop through the Subset of Data.table (.SD) with lapply, and keep the non-NA elements of each column.

library(data.table)
setDT(y)[, lapply(.SD, function(x) x[!is.na(x)]) , .(a,b)]
#   a b  c d
#1: 1 1  0 0
#2: 1 3 NA 0
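
Since the question is tagged dplyr, the same grouping idea can be sketched with dplyr as well. This is not part of the original answer: it assumes dplyr 1.0.0 or later (for across() and the .groups argument), and first_non_na is a hypothetical helper name introduced here for illustration. Unlike the data.table code above, it keeps only the first non-NA value per column in each group, so it always returns exactly one row per (a, b) pair.

```r
library(dplyr)

# Same example data as in the question
y <- data.frame(
  a = c(1, 1, 1, 1),
  b = c(1, 1, 3, 3),
  c = c(NA, 0, NA, NA),
  d = c(0, NA, 0, NA)
)

# Hypothetical helper: first non-NA value of x, or NA if all values are missing
# (indexing an empty vector with [1] yields NA of the same type)
first_non_na <- function(x) {
  x[!is.na(x)][1]
}

collapsed <- y %>%
  group_by(a, b) %>%
  # apply the helper to every non-grouping column within each (a, b) group
  summarise(across(everything(), first_non_na), .groups = "drop")

# collapsed has two rows: (a=1, b=1, c=0, d=0) and (a=1, b=3, c=NA, d=0)
print(collapsed)
```

If a group can contain several different non-NA values in the same column, taking the first one silently discards the rest, so it may be worth checking for such conflicts before collapsing.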


Tags: r dplyr