I have a data frame that contains some duplicate rows. Where rows are duplicated, I want to sum two of the columns and then drop the redundant row.
Here is an example of the data:
Year ID Lats Longs N n c_id
2015 200 30.5417 -20.5254 150 30 4142
2015 200 30.5417 -20.5254 90 50 4142
I want to sum columns N and n into one row. The rest of the information, i.e. Year, ID, Lats, Longs and c_id, should remain the same, e.g.:
Year ID Lats Longs N n c_id
2015 200 30.5417 -20.5254 240 80 4142
Solution using data.table:
library(data.table)
df <- structure(list(year = c(2015, 2015), ID = c(200, 200), Lats = c(30.5417,
30.5417), Longs = c(-20.5254, -20.5254), N = c(150, 90), n = c(30,
50), c_id = c(4142, 4142)), .Names = c("year", "ID", "Lats",
"Longs", "N", "n", "c_id"), row.names = c(NA, -2L),
class = "data.frame")
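# Convert to a data.table, group by the identifying columns,
# and sum every remaining column (here N and n) within each group
dt <- data.table(df)
dt[, lapply(.SD, sum), by="c_id,year,ID,Lats,Longs"]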
c_id year ID Lats Longs N n
1: 4142 2015 200 30.5417 -20.5254 240 80
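Note that lapply(.SD, sum) sums every column not listed in by, so an extra non-numeric column would cause an error. A minimal sketch that names the summed columns explicitly through .SDcols, assuming the same example data as above (the name res is only illustrative):

```r
library(data.table)

# Same example data as above, rebuilt so the snippet is self-contained
df <- data.frame(year = c(2015, 2015), ID = c(200, 200),
                 Lats = c(30.5417, 30.5417), Longs = c(-20.5254, -20.5254),
                 N = c(150, 90), n = c(30, 50), c_id = c(4142, 4142))
dt <- as.data.table(df)

# .SDcols restricts .SD to N and n, so only those two columns are summed
res <- dt[, lapply(.SD, sum),
          by = .(c_id, year, ID, Lats, Longs),
          .SDcols = c("N", "n")]
print(res)
```

With the two example rows this should collapse to a single row with N = 240 and n = 80.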
Solution using plyr:
library(plyr)
# Split df by the identifying columns, then sum N and n within each group
ddply(df, .(c_id, year, ID, Lats, Longs), function(x) c(N=sum(x$N), n=sum(x$n)))
c_id year ID Lats Longs N n
1 4142 2015 200 30.5417 -20.5254 240 80
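If you would rather not load a package, the same result can probably be obtained in base R with aggregate(). This is a sketch under the assumption that the identifying columns contain no NA values (the formula interface drops rows with NA by default); the name res is only illustrative:

```r
# Same example data as above, rebuilt so the snippet is self-contained
df <- data.frame(year = c(2015, 2015), ID = c(200, 200),
                 Lats = c(30.5417, 30.5417), Longs = c(-20.5254, -20.5254),
                 N = c(150, 90), n = c(30, 50), c_id = c(4142, 4142))

# Left of ~ : the columns to sum; right of ~ : the columns that define a duplicate
res <- aggregate(cbind(N, n) ~ year + ID + Lats + Longs + c_id,
                 data = df, FUN = sum)
print(res)
```

The grouping columns come first in the result, followed by the summed N and n.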