I have below mentioned data frame:
Date Val1 Val2
2018-04-01 125 0.05
2018-04-03 458 2.99
2018-04-05 354 1.25
I want to add only missing dates considering Sys.Date()
(Here for example Sys.Date()
is 2018-04-06) in dataframe with corresponding val1 and val2 as 0.
I have tried: t2<-merge(data.frame(Date= seq(min(ymd(t1$Date)), max(ymd(date)), by = "days")), t1, by = "Date", all = TRUE)
Required Dataframe:
Date Val1 Val2
2018-04-01 125 0.05
2018-04-02 0 0
2018-04-03 458 2.99
2018-04-04 0 0
2018-04-05 354 1.25
2018-04-06 0 0
Here's a correction of your approach, in base R.
Replace max(t1$Date)
bySys.Date()
in your real application:
t2<-merge(data.frame(Date= as.Date(min(t1$Date):max(t1$Date),"1970-1-1")),
t1, by = "Date", all = TRUE)
t2[is.na(t2)] <- 0
# Date Val1 Val2
# 1 2018-04-01 125 0.05
# 2 2018-04-02 0 0.00
# 3 2018-04-03 458 2.99
# 4 2018-04-04 0 0.00
# 5 2018-04-05 354 1.25
data
t1 <- read.table(text="Date Val1 Val2
'2018-04-01' 125 0.05
'2018-04-03' 458 2.99
'2018-04-05' 354 1.25",h=T,strin=F)
t1$Date <- as.Date(df$Date)
This could be done with complete
library(tidyverse)
df1 %>%
complete(Date = seq(Date[1], Sys.Date(), by = "1 day"),
fill = list(Val1 = 0, Val2 = 0))
If we need to pass multiple variables for the fill
, create the list of columns that we need to fill
nm1 <- setdiff(names(df1), "Date") #in this example excluding the Date
nm2 <- setNames(as.list(rep(0, length(nm1))), nm1)
and then pass that as argument for the fill
df1 %>%
complete(Date = seq(Date[1], Sys.Date(), by = "1 day"), fill = nm2)
# A tibble: 35 x 3
# Date Val1 Val2
# <date> <dbl> <dbl>
# 1 2018-04-01 125 0.05
# 2 2018-04-02 0 0
# 3 2018-04-03 458 2.99
# 4 2018-04-04 0 0
# 5 2018-04-05 354 1.25
# 6 2018-04-06 0 0
# 7 2018-04-07 0 0
# 8 2018-04-08 0 0
# 9 2018-04-09 0 0
#10 2018-04-10 0 0
# ... with 25 more rows
You could use padr
. padr
is made for filling in missing date values.
First you add the missing dates based on the interval, and if you do not want NA's you fill them with a value (or function of most occuring value)
edit: added end_val to include the run until sys.Date()
library(padr)
# Specify end_val to go all the way to sys.Date and add 1 to include sys.Date
padded_df <- pad(df, interval = "day", end_val = Sys.Date()+1)
padded_df <- fill_by_value(padded_df, value = 0)
padded_df
Date Val1 Val2
1 2018-04-01 125 0.05
2 2018-04-02 0 0.00
3 2018-04-03 458 2.99
4 2018-04-04 0 0.00
5 2018-04-05 354 1.25
.....
31 2018-05-01 0 0
32 2018-05-02 0 0
33 2018-05-03 0 0
34 2018-05-04 0 0
35 2018-05-05 0 0
36 2018-05-06 0 0