Copy column data when function unaggregates a sing

2019-09-20 16:44发布

问题:

I need help in taking an annual total (for each of many initiatives) and breaking that down to each month using a simple division formula. I need to do this for each distinct combination of a few columns while copying down the columns that are broken from annual to each monthly total. The loop will apply the formula to two columns and loop through each distinct group in a vector. I tried to explain in an example below as it's somewhat complex.

What I have :

| Init | Name | Date |Total Savings|Total Costs| 
|  A   | John | 2015 |    TotalD   |   TotalD  |
|  A   | Mike | 2015 |    TotalE   |   TotalE  |
|  A   |  Rob | 2015 |    TotalF   |   TotalF  |
|  B   | John | 2015 |    TotalG   |   TotalG  |
|  B   | Mike | 2015 |    TotalH   |   TotalH  |
......
| Init | Name | Date |Total Savings|Total Costs| 
|  A   | John | 2016 |    TotalI   |   TotalI  |
|  A   | Mike | 2016 |    TotalJ   |   TotalJ  |
|  A   |  Rob | 2016 |    TotalK   |   TotalK  |
|  B   | John | 2016 |    TotalL   |   TotalL  |
|  B   | Mike | 2016 |    TotalM   |   TotalM  |

I'm going to loop a function for the first row to take the "Total Savings" and "Total Costs" and divide by 12 where Date = 2015 and 9 where Date = 2016 (YTD to Sept) and create an individual row for each. I'm essentially breaking out an annual total in a row and creating a row for each month of the year. I need help in running that loop to copy also columns "Init", "Name", until "Init", "Name" combination are not distinct. Also, note the formula for the division based on the year will be different as well. I suppose I could separate the datasets for 2015 and 2016 and use two different functions and merge if that would be easier. Below should be the output:

| Init | Name | Date       |Monthly Savings|Monthly Costs| 
|  A   | John | 01-01-2015 |   TotalD/12*  |   MonthD    |
|  A   | John | 02-01-2015 |    MonthD     |   MonthD    |
|  A   | John | 03-01-2015 |    MonthD     |   MonthD    |
...
|  A   | Mike | 01-01-2016 |    TotalE/9*  |   TotalE    |
|  A   | Mike | 02-01-2016 |    TotalE     |   TotalE    |
|  A   | Mike | 03-01-2016 |    TotalE     |   TotalE    |
...
|  B   | John | 01-01-2015 |   TotalG/12*  |   MonthD    |
|  B   | John | 02-01-2015 |    MonthG     |   MonthD    |
|  B   | John | 03-01-2015 |    MonthG     |   MonthD    |

TotalD/12* = MonthD - this is the formula for 2015
TotalE/9* = MonthE - this is the formula for 2016

Any help would be appreciated...

回答1:

As a start, here are some reproducible data, with the columns described:

myData <-
  data.frame(
    Init = rep(LETTERS[1:3], each = 4)
    , Name = rep(c("John", "Mike"), each = 2)
    , Date = 2015:2016
    , Savings = (1:12)*1200
    , Cost = (1:12)*2400
  )

Next, set the divisor to be used for each year:

toDivide <-
  c("2015" = 12, "2016" = 9)

Then, I am using the magrittr pipe as I split the data up into single rows, then looping through them with lapply to expand each row into the appropriate number of rows (9 or 12) with the savings and costs divided by the number of months. Finally, dplyr's bind_rows stitches the rows back together.

myData %>%
  split(1:nrow(.)) %>%
  lapply(function(x){
    temp <- data.frame(
      Init = x$Init
      , Name = x$Name
      , Date = as.Date(paste(x$Date
                           , formatC(1:toDivide[as.character(x$Date)]
                                     , width = 2, flag = "0")
                           , "01"
                           , sep = "-"))
      , Savings = x$Savings / toDivide[as.character(x$Date)]
      , Cost = x$Cost / toDivide[as.character(x$Date)]
    )
  }) %>%
  bind_rows()

The head of this looks like:

    Init Name       Date  Savings      Cost
1      A John 2015-01-01 100.0000  200.0000
2      A John 2015-02-01 100.0000  200.0000
3      A John 2015-03-01 100.0000  200.0000
4      A John 2015-04-01 100.0000  200.0000
5      A John 2015-05-01 100.0000  200.0000
6      A John 2015-06-01 100.0000  200.0000

with similar entries for each expanded row.