Linearly interpolate missing values in a time series

I would like to add all missing dates between the minimum and maximum date in a data.frame and linearly interpolate all missing values. Below are an example data frame, its printed form, and the result I would like to get:

df <- data.frame(date = as.Date(c("2015-10-05","2015-10-08","2015-10-09",
                                  "2015-10-12","2015-10-14")),       
                 value = c(8,3,9,NA,5))

      date value
2015-10-05     8
2015-10-08     3
2015-10-09     9
2015-10-12    NA
2015-10-14     5

      date value approx
2015-10-05     8      8
2015-10-06    NA   6.33
2015-10-07    NA   4.67
2015-10-08     3      3
2015-10-09     9      9
2015-10-10    NA   8.20
2015-10-11    NA   7.40
2015-10-12    NA   6.60
2015-10-13    NA   5.80
2015-10-14     5      5

Is there a clean solution with dplyr and approx? (I do not like my 10-line for-loop code.)

Tags: r time-series dplyr linear-interpolation

4 Answers

Explosion°爆炸

#2 · 2020-02-28 18:46

Here are a few solutions.

1) zoo. Convert the data frame to a zoo series and use na.approx with an xout= of sequential dates to get the final series:

library(zoo)
z <- read.zoo(df)  # df is the data frame from the question
zz <- na.approx(z, xout = seq(start(z), end(z), "day"))

giving:

> zz
2015-10-05 2015-10-06 2015-10-07 2015-10-08 2015-10-09 2015-10-10 2015-10-11 
  8.000000   6.333333   4.666667   3.000000   9.000000   8.200000   7.400000 
2015-10-12 2015-10-13 2015-10-14 
  6.600000   5.800000   5.000000

It may be more convenient to leave it in zoo form so you can use all the facilities of zoo, but if you need it as a data frame, just use:

DF <- fortify.zoo(zz)

1a) zoo/magrittr. The above could alternatively be expressed as a magrittr pipeline:

library(magrittr)
df %>% read.zoo %>% na.approx(xout = seq(start(.), end(.), "day")) %>% fortify.zoo

(or omit the fortify.zoo part if you want zoo output).

2) base R. We can do essentially the same thing without packages, like this:

n <- nrow(df)  # df is the data frame from the question, sorted by date
with(df, data.frame(approx(date, value, xout = seq(date[1], date[n], "day"))))
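If the result should keep the original value column next to the interpolated one, as in the table in the question, the same approx call can be combined with merge. This is only a sketch on the question's data; the names all_dates and full are made up for illustration.

```r
# Example data from the question
df <- data.frame(date = as.Date(c("2015-10-05", "2015-10-08", "2015-10-09",
                                  "2015-10-12", "2015-10-14")),
                 value = c(8, 3, 9, NA, 5))

# Every calendar day between the first and last observed date
# (all_dates is an illustrative name)
all_dates <- data.frame(date = seq(min(df$date), max(df$date), by = "day"))

# Left join: days that are absent from df get value = NA
full <- merge(all_dates, df, by = "date", all.x = TRUE)

# approx() ignores the NA pairs and interpolates linearly at each date
full$approx <- approx(df$date, df$value, xout = full$date)$y

print(full)
```

Using min and max instead of date[1] and date[n] means the input does not have to be sorted by date first.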


傲

#3 · 2020-02-28 18:49

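I think your code would look much clearer and simpler if you used the forecast package.

library(forecast)
library(zoo)  # provides zoo(); attach it explicitly in case forecast does not

x <- zoo(df$value, df$date)  # daily series indexed by date
x <- as.ts(x)                # regular ts; the gaps between dates become NA
x <- na.interp(x)            # linear interpolation for a non-seasonal series
print(x)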


Deceive 欺骗

#4 · 2020-02-28 18:51

Here is one way. I created a data frame with a sequence of dates running from the first to the last date. Using full_join() from the dplyr package, I merged that data frame with mydf. I then used na.approx() from the zoo package inside mutate() to do the interpolation.

mydf <- data.frame(date = as.Date(c("2015-10-05","2015-10-08","2015-10-09",
                                    "2015-10-12","2015-10-14")),       
                   value = c(8,3,9,NA,5))

library(dplyr)
library(zoo)

data.frame(date = seq(mydf$date[1], mydf$date[nrow(mydf)], by = 1)) %>%
  full_join(mydf, by = "date") %>%
  mutate(approx = na.approx(value))

#         date value   approx
#1  2015-10-05     8 8.000000
#2  2015-10-06    NA 6.333333
#3  2015-10-07    NA 4.666667
#4  2015-10-08     3 3.000000
#5  2015-10-09     9 9.000000
#6  2015-10-10    NA 8.200000
#7  2015-10-11    NA 7.400000
#8  2015-10-12    NA 6.600000
#9  2015-10-13    NA 5.800000
#10 2015-10-14     5 5.000000
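Since the question asks for dplyr together with approx, the same join can also be written without zoo. A minimal sketch, assuming the mydf defined above; the name result is only illustrative. Unlike na.approx with its default settings, approx returns one value per xout entry, so a leading or trailing NA in value would stay NA rather than change the length of the column.

```r
library(dplyr)

# Example data from the question
mydf <- data.frame(date = as.Date(c("2015-10-05", "2015-10-08", "2015-10-09",
                                    "2015-10-12", "2015-10-14")),
                   value = c(8, 3, 9, NA, 5))

# Build the full daily calendar, attach the observed values,
# then interpolate linearly over the dates with stats::approx()
result <- data.frame(date = seq(min(mydf$date), max(mydf$date), by = "day")) %>%
  left_join(mydf, by = "date") %>%
  mutate(approx = approx(date, value, xout = date)$y)

print(result)
```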


戒情不戒烟

#5 · 2020-02-28 18:58

Another nice and short solution (using imputeTS):

library(imputeTS)
library(zoo)  # provides zoo()

x <- zoo(df$value, df$date)
# This function was named na.interpolation() before imputeTS 3.0
x <- na_interpolation(x, option = "linear")
print(x)

