Conditional sum with output for all rows in r data

2019-08-12 15:22发布

问题:

I have a coding issue what I think should be very easy. I have created a simplified dataset:

DT <- data.table(Bank=rep(c("a","b","c"),4),
                 Type=rep(c("Ass","Liab"),6),
                 Amount=c(100,200,300,400,200,300,400,500,200,100,300,100))
# Bank Type Amount SumLiab
# 1:    a  Ass    100      NA
# 2:    b Liab    200     700
# 3:    c  Ass    300      NA
# 4:    a Liab    400     500
# 5:    b  Ass    200      NA
# 6:    c Liab    300     400
# 7:    a  Ass    400      NA
# 8:    b Liab    500     700
# 9:    c  Ass    200      NA
# 10:    a Liab    100     500
# 11:    b  Ass    300      NA
# 12:    c Liab    100     400

I want to create a variable that is the sum of amount when Type = "Liab" per bank. So this is no problem:

DT[Type=='Liab',SumLiab:=sum(Amount),by=Bank]
# Bank Type Amount SumLiab
# 1:    a  Ass    100      NA
# 2:    b Liab    200     700
# 3:    c  Ass    300      NA
# 4:    a Liab    400     500
# 5:    b  Ass    200      NA
# 6:    c Liab    300     400
# 7:    a  Ass    400      NA
# 8:    b Liab    500     700
# 9:    c  Ass    200      NA
# 10:    a Liab    100     500
# 11:    b  Ass    300      NA
# 12:    c Liab    100     400

But I want this value for all rows, even when Type =='Ass'. I understand that I now get NA due to the DT[Type=='Liab',..] restriction. Is there a clever way of coding to get the value SumLiab for all rows? (So row1 that currently is NA for SumLiab gets the value 500)

Thanks! Tim

回答1:

When we are using Type=='Liab' in 'i', it is inserting the values only to that rows indexed by 'i'. We can subset the 'Amount' based on Type=='Liab' in 'j' and assign (:=) it to be new variable.

 DT[, SumLiab:= sum(Amount[Type=='Liab']), by =Bank]
 DT
 #   Bank Type Amount SumLiab
 #1:    a  Ass    100     500
 #2:    b Liab    200     700
 #3:    c  Ass    300     400
 #4:    a Liab    400     500
 #5:    b  Ass    200     700
 #6:    c Liab    300     400
 #7:    a  Ass    400     500
 #8:    b Liab    500     700
 #9:    c  Ass    200     400
 #10:   a Liab    100     500
 #11:   b  Ass    300     700
 #12:   c Liab    100     400


回答2:

No, I don't think that is correct.

You can try this:

DT[ SumLiab:=sum(Amount), by = list(Bank, Type)][]

Output of the code: