How to add only the missing dates to a data frame in R

Posted 2020-04-06 07:16

I have the following data frame:

Date        Val1     Val2
2018-04-01  125      0.05
2018-04-03  458      2.99
2018-04-05  354      1.25

I want to add only the missing dates up to Sys.Date() (here, for example, Sys.Date() is 2018-04-06), with the corresponding Val1 and Val2 set to 0.

I have tried:

t2 <- merge(data.frame(Date = seq(min(ymd(t1$Date)), max(ymd(date)), by = "days")), t1, by = "Date", all = TRUE)

Required data frame:

Date        Val1     Val2
2018-04-01  125      0.05
2018-04-02  0        0
2018-04-03  458      2.99
2018-04-04  0        0
2018-04-05  354      1.25
2018-04-06  0        0

3 answers
放我归山
Reply 2 · 2020-04-06 07:50

Here's a correction of your approach, in base R.

In your real application, replace max(t1$Date) with Sys.Date() so the range extends up to today:

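# One row per day from the first to the last date (use Sys.Date() as the upper bound in practice)
t2 <- merge(data.frame(Date = as.Date(min(t1$Date):max(t1$Date), origin = "1970-01-01")),
            t1, by = "Date", all = TRUE)
# Days missing from t1 get NA in Val1/Val2 after the merge; set them to 0
t2[is.na(t2)] <- 0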

#         Date Val1 Val2
# 1 2018-04-01  125 0.05
# 2 2018-04-02    0 0.00
# 3 2018-04-03  458 2.99
# 4 2018-04-04    0 0.00
# 5 2018-04-05  354 1.25

Data:

t1 <- read.table(text = "Date        Val1     Val2
'2018-04-01'  125 0.05
'2018-04-03'  458 2.99
'2018-04-05'  354 1.25", header = TRUE, stringsAsFactors = FALSE)
t1$Date <- as.Date(t1$Date)
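Because Sys.Date() changes every day, the result of the approach above depends on when it is run. A minimal reproducible sketch of the same merge approach, assuming a fixed end date of 2018-04-06 as a stand-in for Sys.Date() (as in the question), and building the day sequence with seq() on Dates rather than converting an integer range back with as.Date():

```r
# Sample data from the question
t1 <- data.frame(
  Date = as.Date(c("2018-04-01", "2018-04-03", "2018-04-05")),
  Val1 = c(125, 458, 354),
  Val2 = c(0.05, 2.99, 1.25)
)

# Assumption: a fixed end date stands in for Sys.Date() so the result is reproducible
end_date <- as.Date("2018-04-06")

# Every day from the first observed date up to and including end_date
all_days <- data.frame(Date = seq(min(t1$Date), end_date, by = "day"))

# Left join keeps every day; days absent from t1 get NA in Val1/Val2
t2 <- merge(all_days, t1, by = "Date", all.x = TRUE)

# Replace the NAs introduced by the join with 0
t2[is.na(t2)] <- 0
t2
```

This yields the six rows of the required data frame. all.x = TRUE is enough here because every date in t1 already lies inside the generated sequence.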
够拽才男人
Reply 3 · 2020-04-06 07:53

This can be done with complete() from tidyr:

library(tidyverse)
# Expand to every day from the earliest date to today; new rows get 0 instead of NA
df1 %>%
    complete(Date = seq(min(Date), Sys.Date(), by = "1 day"),
             fill = list(Val1 = 0, Val2 = 0))

If there are many value columns to fill, build the fill list programmatically from the column names instead of typing it out:

nm1 <- setdiff(names(df1), "Date")                    # every column except Date
nm2 <- setNames(as.list(rep(0, length(nm1))), nm1)    # named list: one 0 per column

and then pass it as the fill argument:

df1 %>%
    complete(Date = seq(min(Date), Sys.Date(), by = "1 day"), fill = nm2)
# A tibble: 35 x 3
#   Date        Val1  Val2
#   <date>     <dbl> <dbl>
# 1 2018-04-01   125  0.05
# 2 2018-04-02     0  0   
# 3 2018-04-03   458  2.99
# 4 2018-04-04     0  0   
# 5 2018-04-05   354  1.25
# 6 2018-04-06     0  0   
# 7 2018-04-07     0  0   
# 8 2018-04-08     0  0   
# 9 2018-04-09     0  0   
#10 2018-04-10     0  0   
# ... with 25 more rows
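To see what the programmatically built fill list contains, here is a base R check of the nm1/nm2 construction, using a small stand-in for df1 with the question's columns (the stand-in data frame is an assumption; the real df1 is not shown in the answer):

```r
# Stand-in for df1 with the question's columns
df1 <- data.frame(
  Date = as.Date(c("2018-04-01", "2018-04-03", "2018-04-05")),
  Val1 = c(125, 458, 354),
  Val2 = c(0.05, 2.99, 1.25)
)

# All column names except the date column
nm1 <- setdiff(names(df1), "Date")

# Named list with one 0 per value column, i.e. list(Val1 = 0, Val2 = 0)
nm2 <- setNames(as.list(rep(0, length(nm1))), nm1)
str(nm2)
```

Because nm1 is derived from names(df1), adding more value columns to the data frame extends the fill list automatically.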
叛逆
Reply 4 · 2020-04-06 08:05

You could use padr, which is made for filling in missing date values. First, pad() adds the missing dates based on the interval; then, if you do not want NAs, you fill them with a value (fill_by_value), with a function (fill_by_function), or with the most frequently occurring value (fill_by_prevalent).

Edit: added end_val so that the padding runs until Sys.Date().

library(padr)
# Specify end_val to pad all the way to Sys.Date(); 1 is added to include Sys.Date()
padded_df <- pad(df, interval = "day", end_val = Sys.Date()+1)
padded_df <- fill_by_value(padded_df, value = 0)
padded_df

        Date Val1 Val2
1 2018-04-01  125 0.05
2 2018-04-02    0 0.00
3 2018-04-03  458 2.99
4 2018-04-04    0 0.00
5 2018-04-05  354 1.25
.....

31 2018-05-01    0    0
32 2018-05-02    0    0
33 2018-05-03    0    0
34 2018-05-04    0    0
35 2018-05-05    0    0
36 2018-05-06    0    0
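If padr is not available, the same pad-then-fill idea can be approximated in base R. The sketch below uses a hypothetical helper, pad_days() (not part of padr or base R), which pads to one row per day from the earliest date through end_val inclusive via a merge, then replaces NAs in every other column with a fill value. The column name "Date" and the fixed end date 2018-04-06 are assumptions taken from the question:

```r
# Hypothetical helper (not part of padr): one row per day from the earliest
# date through end_val (inclusive). Every NA in the non-date columns, including
# any NA already present in df, is replaced by `value`.
pad_days <- function(df, end_val, value = 0, date_col = "Date") {
  days <- seq(min(df[[date_col]]), end_val, by = "day")
  full <- setNames(data.frame(days), date_col)
  # Left join: padded days get NA in the value columns
  out <- merge(full, df, by = date_col, all.x = TRUE)
  for (col in setdiff(names(out), date_col)) {
    out[[col]][is.na(out[[col]])] <- value
  }
  out
}

# Usage with the question's data and a fixed stand-in for Sys.Date()
df <- data.frame(
  Date = as.Date(c("2018-04-01", "2018-04-03", "2018-04-05")),
  Val1 = c(125, 458, 354),
  Val2 = c(0.05, 2.99, 1.25)
)
padded <- pad_days(df, end_val = as.Date("2018-04-06"))
padded
```

Because seq() includes its end point, this helper needs no +1 to include the end date.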