Aggregate sum obs with different ID's in the s

2020-04-30 17:43发布

问题:

My goal is to make another column by summing the observation from the present day and all previous observations from the same ID by using the date (the data set is sorted in date and chr nr(ID). I will need the aggregation to start over when a new "id" is presented.

there might be som NA's, they should be considered as null

"Doseringer_pr_kg_dyr" is the observation.

CHR_NR    DATO_AFSLUT    Doseringer_pr_kg_dyr    brugstid 
10358    2018-08-06    29416.67    31   
10358    2018-09-06    104682.27    36 
10358    2018-10-12    10333.33    26   
10358    2018-11-07    10090.91    27   
10358    2018-12-04    8000.00    NA   
13168    2012-01-23    12042.25    2   
13168    2012-01-25    9000.00    42 
13168    2012-03-07    44450.70    19
13168    2012-03-26    35000.00    37 
13168    2012-05-02    93478.26    70

I expect something ala:

CHR_NR    DATO_AFSLUT    Doseringer_pr_kg_dyr    brugstid     sum
10358  2018-11-07    10090.91    27    [108,6]+[109,3]  
10358  2018-12-04    8000.00    NA    [109,6]+[110,3]
13168  2012-01-23    12042.25    2    [111,3]
13168  2012-01-25    9000.00    42    [111,6]+[112,3]
13168  2012-03-07    44450.70    19    [112,6]+[113,3]

where [row, column] + [row, column] is summed in the new column.

i thought of one of the apply functions or a loop like:

agg<-function(dat) {
  last_row <- 0
  for ( row in dat ) {
    if ( row[1] == last_row[1] ) {
      row[6] <- last_row[6] + row[3]
    } else { 
      row[6] <- row[3]
    }
    last_row <- row
  }
}

From the comment:

# dput(head(a)) 
a <- structure(list(CHR_NR = c(10358, 10358, 10358, 10358, 10358, 10358), 
  DATO_AFSLUT = structure(c(15349, 15387, 15426, 15441, 15455, 15476), 
  Level = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real ), 
  Doseringer_pr_kg_dyr = c(276152.688936271, 161616.695196052, 127549.514333096, 13333.3333333333, 86255.3897180763, 31034.1151385928 ), 
  brugstid = c(38, 39, 15, 14, 21, 15), 
  i = c(7267.17602463871, 4144.01782553979, 8503.30095553976, 952.380952380952, 4107.39951038459, 2068.94100923952)), 
  row.names = 6:11, class = "data.frame")

回答1:

a$Doseringer_pr_kg_dyr[is.na(a$Doseringer_pr_kg_dyr)]<-0 a$x<-ave(a$Doseringer_pr_kg_dyr,a$CHR_NR,FUN = cumsum)

where x is the cummulated and "ave" groups the accumulation in CHR



标签: r apply lapply