I have a set of data where I have annual totals for specific values stored in one row (observation). I'd like to tidy the data in R so that this total row is broken out for each month using a simple equation (total/12) that stores the annual total divided by 12 in each of 12 rows as a monthly total. I'm trying to do this in R but am very beginner and not quite sure where to start. Example is below:
Date | Total
2015 | 12,000
Some R function to convert to:
Date | Total
01-01-2015 | 1,000
02-01-2015 | 1,000
03-01-2015 | 1,000
Here is an example dataset with multiple years, which I presume is your desired use case:
myData <-
data.frame(
Date = 2011:2015
, Total = (1:5)*1200
)
Then, I split the data.frame up by the Date
column (this assumes the years are unique), looped through with lapply
to modify to the monthly values you wanted, then stitched them back together with bind_rows
from dplyr
(also need to load dplyr
or magrittr
for the pipes (%>%
) to work).
myData %>%
split(.$Date) %>%
lapply(function(x){
temp <- data.frame(
Date = as.Date(paste(x$Date
, formatC(1:12, width = 2, flag = "0")
, "01"
, sep = "-"))
, Total = x$Total / 12
)
}) %>%
bind_rows()
Gives (note, using ...
to indicate skipped rows)
Date Total
1 2011-01-01 100
2 2011-02-01 100
3 2011-03-01 100
...
16 2012-04-01 200
17 2012-05-01 200
18 2012-06-01 200
...
29 2013-05-01 300
30 2013-06-01 300
31 2013-07-01 300
...
43 2014-07-01 400
44 2014-08-01 400
45 2014-09-01 400
...
58 2015-10-01 500
59 2015-11-01 500
60 2015-12-01 500
If, as suggested in the comments, you need to divide by a different number of months each year, I would create a vector of the divisors named for the years. Here, I am using dplyr
for n_distinct
and the magrittr
pipe to ease naming of the common case. Then, add/change the entry for the different year:
toDivide <-
rep(12, n_distinct(myData$Date)) %>%
setNames(sort(unique(myData$Date)))
toDivide["2016"] <- 9
Then, instead of 12
as above, you can use this vector. So, replace this:
Total = x$Total / 12
within the lapply
loop, with this:
Total = x$Total / toDivide[as.character(x$Date)]
and it should divide by the entry for that year, instead of always dividing by the same thing.