Fill missing date values in column by adding deliv

2019-02-21 05:40发布

Data:

DB1 <- data.frame(orderItemID  = 1:10,     
orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),  
deliveryDate = c("2013-01-23", "2013-03-01", "NA", "2013-06-04", "2014-01-03", "NA", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))

Expected Outcome:

   DB1 <- data.frame(orderItemID  = 1:10,     
 orderDate= c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),  
deliveryDate = c("2013-01-23", "2013-03-01", "2013-04-14", "2013-06-04", "2014-01-03", "2014-02-21", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))

Hey guys, it´s me again ;) and unfortunately (I think) I have a pretty difficult question... As you can see above I have some missing values in the delivery dates and I want to replace them by another date. That date should be the order date of the specific item + the average delivery time, in (full) days. (In the example it's 1.75 days so that rounds to 2 days) The average delivery time is the time calculated from the average value of all samples that do not contain Missing values = (2days+1day+3days+2days+1day+2days+1day+2days):8=1,75

so in a first step the average delivery time needs to be calculated an in the second step the order date + the average delivery time (in full days) needs to be entered instead of the NA´s

I tried already a little with [is.na(DB1$deliveryDate)] but unfortunately I have no good idea how to solve the problem...

Hope somebody got an idea

1条回答
家丑人穷心不美
2楼-- · 2019-02-21 06:04

You want to do date-arithmetic, and fill NAs in deliveryDate column by adding a date-interval of two days to orderDate column. lubridate supplies convenience functions for time intervals like days(), weeks(), months(), years(), hours(), minutes(), seconds() for exactly that purpose. And first, you have to parse your (European-format) datestrings into R date objects.

Something like the following, using lubridate for date-arithmetic and dplyr for dataframe manipulation:

require(dplyr)

DB1$orderDate    = as.POSIXct(DB1$orderDate, format="%d.%m.%y", tz='UTC')
DB1$deliveryDate = as.POSIXct(DB1$deliveryDate, format="%d.%m.%y", tz='UTC')

DB1 %>% group_by(orderDate) %>%
        summarize(delivery_time = (deliveryDate - orderDate)) %>%
        ungroup() %>% summarize(median(delivery_time, na.rm=T))

# median(delivery_time, na.rm = T)
#                         1.5 days
# so you round up to 2 days
delivery_days = 2.0

require(lubridate)
DB1 <- DB1 %>% filter(is.na(deliveryDate)) %>%
                mutate(deliveryDate = orderDate + days(2))

# orderItemID  orderDate deliveryDate
#           3 2013-04-12   2013-04-14
#           6 2014-02-19   2014-02-21
查看更多
登录 后发表回答