I would like to add all missing dates between the min and max date in a data.frame
and linearly interpolate all missing values, for example:
df <- data.frame(date = as.Date(c("2015-10-05","2015-10-08","2015-10-09",
"2015-10-12","2015-10-14")),
value = c(8,3,9,NA,5))
date value
2015-10-05 8
2015-10-08 3
2015-10-09 9
2015-10-12 NA
2015-10-14 5
Desired result:
date value approx
2015-10-05 8 8
2015-10-06 NA 6.33
2015-10-07 NA 4.67
2015-10-08 3 3
2015-10-09 9 9
2015-10-10 NA 8.20
2015-10-11 NA 7.40
2015-10-12 NA 6.60
2015-10-13 NA 5.80
2015-10-14 5 5
Is there a clear solution with dplyr and approx?
(I do not like my 10-line for loop code.)
Here are a few solutions.
1) zoo. Convert the data frame to a zoo series and use na.approx with an xout= of sequential dates to get the final series.
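A minimal sketch of these steps, assuming the zoo package is installed and df is the data frame from the question. The names z, tt and zz are illustrative only:

```r
library(zoo)

# Data frame from the question
df <- data.frame(
  date  = as.Date(c("2015-10-05", "2015-10-08", "2015-10-09",
                    "2015-10-12", "2015-10-14")),
  value = c(8, 3, 9, NA, 5)
)

# Convert to a zoo series; the first column (date) becomes the index
z <- read.zoo(df)

# Every calendar day between the first and last observation
tt <- seq(start(z), end(z), by = "day")

# Linearly interpolate at each date in tt; NA observations are ignored
zz <- na.approx(z, xout = tt)
print(zz)
```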
It may be more convenient to leave it in zoo form so that you can use all the facilities of zoo, but if you need it in data frame form, just use fortify.zoo on the result.
1a) zoo/magrittr. The above could alternatively be expressed as a magrittr pipeline:
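One way such a pipeline might be written, assuming zoo and magrittr are installed. The final setNames() step is an addition for readability, since fortify.zoo would otherwise pick default column names:

```r
library(zoo)
library(magrittr)

# Data frame from the question
df <- data.frame(
  date  = as.Date(c("2015-10-05", "2015-10-08", "2015-10-09",
                    "2015-10-12", "2015-10-14")),
  value = c(8, 3, 9, NA, 5)
)

res <- df %>%
  read.zoo() %>%                                            # data frame -> zoo series
  na.approx(xout = seq(start(.), end(.), by = "day")) %>%   # interpolate on daily grid
  fortify.zoo() %>%                                         # zoo series -> data frame
  setNames(c("date", "value"))                              # illustrative column names

print(res)
```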
(or omit the fortify.zoo part if you want zoo output).

2) base R. We can essentially do the same thing without packages like this:
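A sketch of a package-free version using seq() and approx() from base R. The names tt, known and res are illustrative, and the dates are converted to numbers explicitly so it is clear what approx() interpolates over:

```r
# Data frame from the question
df <- data.frame(
  date  = as.Date(c("2015-10-05", "2015-10-08", "2015-10-09",
                    "2015-10-12", "2015-10-14")),
  value = c(8, 3, 9, NA, 5)
)

# Every calendar day between the first and last date
tt <- seq(min(df$date), max(df$date), by = "day")

# Keep only rows with an observed value as interpolation knots
known <- na.omit(df)

# Linear interpolation at each day; Dates are days since 1970-01-01 internally
res <- data.frame(
  date  = tt,
  value = approx(as.numeric(known$date), known$value,
                 xout = as.numeric(tt))$y
)
print(res)
```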
I think your code would look much clearer and simpler if you used the forecast package.
Here is one way. I created a data frame with a sequence of dates running from the first to the last date. Using full_join() in the dplyr package, I merged that data frame with mydf. I then used na.approx() in the zoo package to handle the interpolation in the mutate() part.

Another nice and short solution (using imputeTS):
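A rough sketch of how this could look, assuming dplyr and a recent imputeTS are installed (na_interpolation() is the current function name; older releases called it na.interpolation()). The missing dates are added with the same full_join() idea as in the previous answer, and the names grid and res are illustrative:

```r
library(dplyr)
library(imputeTS)

# Data frame from the question
df <- data.frame(
  date  = as.Date(c("2015-10-05", "2015-10-08", "2015-10-09",
                    "2015-10-12", "2015-10-14")),
  value = c(8, 3, 9, NA, 5)
)

# One row per calendar day between the first and last date
grid <- data.frame(date = seq(min(df$date), max(df$date), by = "day"))

res <- grid %>%
  full_join(df, by = "date") %>%   # added dates get value = NA
  arrange(date) %>%
  mutate(approx = na_interpolation(value, option = "linear"))

print(res)
```

Because the rows are one day apart after the join, interpolating over row position gives the same numbers as interpolating over the dates.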