Break summed row into individual rows in R

2019-08-02 12:21发布

问题:

I have a set of data where I have annual totals for specific values stored in one row (observation). I'd like to tidy the data in R so that this total row is broken out for each month using a simple equation (total/12) that stores the annual total divided by 12 in each of 12 rows as a monthly total. I'm trying to do this in R but am very beginner and not quite sure where to start. Example is below:

Date | Total
2015 | 12,000 

Some R function to convert to:

   Date    | Total
01-01-2015 | 1,000
02-01-2015 | 1,000
03-01-2015 | 1,000

回答1:

Here is an example dataset with multiple years, which I presume is your desired use case:

myData <-
  data.frame(
    Date = 2011:2015
    , Total = (1:5)*1200
  )

Then, I split the data.frame up by the Date column (this assumes the years are unique), looped through with lapply to modify to the monthly values you wanted, then stitched them back together with bind_rows from dplyr (also need to load dplyr or magrittr for the pipes (%>%) to work).

myData %>%
  split(.$Date) %>%
  lapply(function(x){
    temp <- data.frame(
      Date = as.Date(paste(x$Date
                           , formatC(1:12, width = 2, flag = "0")
                           , "01"
                           , sep = "-"))
      , Total = x$Total / 12
    )
  }) %>%
  bind_rows()

Gives (note, using ... to indicate skipped rows)

         Date Total
1  2011-01-01   100
2  2011-02-01   100
3  2011-03-01   100
...
16 2012-04-01   200
17 2012-05-01   200
18 2012-06-01   200
...
29 2013-05-01   300
30 2013-06-01   300
31 2013-07-01   300
...
43 2014-07-01   400
44 2014-08-01   400
45 2014-09-01   400
...
58 2015-10-01   500
59 2015-11-01   500
60 2015-12-01   500

If, as suggested in the comments, you need to divide by a different number of months each year, I would create a vector of the divisors named for the years. Here, I am using dplyr for n_distinct and the magrittr pipe to ease naming of the common case. Then, add/change the entry for the different year:

toDivide <-
  rep(12, n_distinct(myData$Date)) %>%
  setNames(sort(unique(myData$Date)))

toDivide["2016"] <- 9

Then, instead of 12 as above, you can use this vector. So, replace this:

Total = x$Total / 12

within the lapply loop, with this:

Total = x$Total / toDivide[as.character(x$Date)]

and it should divide by the entry for that year, instead of always dividing by the same thing.



标签: mysql r tidyr