R - Duplicating rows based on a sequence of start

I have a data frame "DF" like this:

Flight.Start   Flight.End   Device      Partner   Creative   Days.in.Flight 
2015-08-31     2015-10-04   Standard    MSN       Video      35

What I need to do is "blow it up" like so:

Flight.Start   Flight.End   Date         Device      Partner   Creative   Days.in.Flight 
2015-08-31     2015-10-04   2015-08-31   Standard    MSN       Video      35
2015-08-31     2015-10-04   2015-09-01   Standard    MSN       Video      35
2015-08-31     2015-10-04   2015-09-02   Standard    MSN       Video      35
2015-08-31     2015-10-04   2015-09-03   Standard    MSN       Video      35
2015-08-31     2015-10-04   2015-09-04   Standard    MSN       Video      35
2015-08-31     2015-10-04   2015-09-05   Standard    MSN       Video      35
2015-08-31     2015-10-04   2015-09-06   Standard    MSN       Video      35
2015-08-31     2015-10-04   2015-09-07   Standard    MSN       Video      35

...and so on until Date reaches 2015-10-04, then the same is done for the next row.

Essentially, every row is duplicated (days in flight - 1) times, since the existing row already accounts for one day of the interval, and a new column "Date" is filled with the dates within that flight. So if a row has a start date of 9/1 and an end date of 9/5, 4 duplicate rows would be appended to the existing one, a new column (Date) would be created, and the sequence of dates from the row's flight start to its flight end would fill that column.
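To make the arithmetic concrete, here is a minimal base R sketch (the variable names start, end and n_days are only illustrative and not part of DF):

```r
# Illustrative dates matching the 9/1 to 9/5 example above
start <- as.Date("2015-09-01")
end   <- as.Date("2015-09-05")

# Inclusive number of days in the flight
n_days <- as.integer(end - start) + 1L   # 5

# One date per day in the flight; the original row covers the first one,
# so n_days - 1 = 4 duplicates are needed
dates <- seq(start, end, by = "day")
length(dates) - 1L                       # 4

# For a 35-day flight starting 2015-08-31, the last date is start + 34 days
as.Date("2015-08-31") + 34               # "2015-10-04"
```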

All date columns are of class Date, Days.in.Flight is numeric, and the rest are factors.

EDIT

In response to the duplicate question flagging:

To clarify, this is NOT like the question it was flagged as a duplicate of: my question is not about how to duplicate rows based on days in flight (I already know how to do that!), but about how to then add a column to the resulting data frame and fill it with the sequential dates of the corresponding flight period. Thanks for the heads up...

Tags: r

3 Answers

贪生不怕死

#2 · 2020-05-21 06:16

Here's a way to do it with base R:

mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Device = "Standard",
                   Creative = "Video",
                   Days.in.Flight = c(3, 6),
                   stringsAsFactors = FALSE)

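# Repeat each row once per day in its flight
expanded <- mydf[rep(row.names(mydf), mydf$Days.in.Flight), ]
# sequence() counts 1..n within each row's block; subtract 1 to get day offsets
data.frame(expanded, Date = expanded$Flight.Start + (sequence(mydf$Days.in.Flight) - 1))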

> data.frame(expanded,Date=expanded$Flight.Start+(sequence(mydf$Days.in.Flight)-1))
    Flight.Start Flight.End   Device Creative Days.in.Flight       Date
1     2015-09-01 2015-09-03 Standard    Video              3 2015-09-01
1.1   2015-09-01 2015-09-03 Standard    Video              3 2015-09-02
1.2   2015-09-01 2015-09-03 Standard    Video              3 2015-09-03
2     2015-09-10 2015-09-15 Standard    Video              6 2015-09-10
2.1   2015-09-10 2015-09-15 Standard    Video              6 2015-09-11
2.2   2015-09-10 2015-09-15 Standard    Video              6 2015-09-12
2.3   2015-09-10 2015-09-15 Standard    Video              6 2015-09-13
2.4   2015-09-10 2015-09-15 Standard    Video              6 2015-09-14
2.5   2015-09-10 2015-09-15 Standard    Video              6 2015-09-15
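
If Days.in.Flight might be out of sync with the dates, the repeat count can also be derived from Flight.Start and Flight.End directly. A minimal sketch of that variant, assuming flights include both endpoints (n_days and idx are names introduced here for illustration):

```r
mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Flight.End   = as.Date(c("2015-09-03", "2015-09-15")),
                   Device = "Standard",
                   Creative = "Video",
                   stringsAsFactors = FALSE)

# Inclusive day count per row, computed from the dates (not from a stored column)
n_days <- as.integer(mydf$Flight.End - mydf$Flight.Start) + 1L

# Repeat each row index n_days times, then add per-row offsets 0, 1, 2, ...
idx <- rep(seq_len(nrow(mydf)), times = n_days)
expanded <- mydf[idx, ]
expanded$Date <- expanded$Flight.Start + (sequence(n_days) - 1L)
rownames(expanded) <- NULL
expanded
```

Resetting the row names is optional; it only replaces labels such as 1.1 and 1.2 with a plain 1..n index.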


不美不萌又怎样

#3 · 2020-05-21 06:23

Or using data.table: we convert the 'data.frame' to a 'data.table' (setDT(mydf)), replicate the sequence of rows by 'Days.in.Flight' and subset the dataset with that index (.SD[rep(...)]), then, grouped by 'Flight.Start' and 'Flight.End', create the 'Date' column.

library(data.table)
setDT(mydf)[, .SD[rep(1:.N, Days.in.Flight)]][,
     Date := seq(Flight.Start, Flight.End, by = '1 day'),
     by = .(Flight.Start, Flight.End)][]
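
The index built by rep(1:.N, Days.in.Flight) can be inspected on its own in base R. A small sketch, using the Days.in.Flight values 3 and 6 from the sample data (there .N would be 2, the number of rows; days and idx are illustrative names):

```r
days <- c(3, 6)                  # Days.in.Flight of the two sample rows

# Row 1 is repeated 3 times, row 2 six times
idx <- rep(seq_along(days), days)
idx
# [1] 1 1 1 2 2 2 2 2 2
```

Note that the grouped seq() call assumes each (Flight.Start, Flight.End) pair occurs in only one original row; otherwise a group would hold more rows than the sequence has dates.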


狗以群分

#4 · 2020-05-21 06:27

Here is one way with splitstackshape and dplyr. Using expandRows() from the splitstackshape package, you can expand your data frame as you described. Then you can add a sequence of dates using mutate(). I grouped the data by the combination of Flight.Start and Flight.End and used seq() to create a sequence of dates for each group; first() takes the first element of Flight.Start and Flight.End within the group. In this way, you can create the sequence you want. I hope this helps.

DATA and CODE

mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Device = "Standard",
                   Creative = "Video",
                   Days.in.Flight = c(3, 6),
                   stringsAsFactors = FALSE)

#  Flight.Start Flight.End   Device Creative Days.in.Flight
#1   2015-09-01 2015-09-03 Standard    Video              3
#2   2015-09-10 2015-09-15 Standard    Video              6

library(splitstackshape)
library(dplyr)

expandRows(mydf, "Days.in.Flight", drop = FALSE) %>%
  group_by(Flight.Start, Flight.End) %>%
  mutate(Date = seq(first(Flight.Start),
                    first(Flight.End),
                    by = 1))

#  Flight.Start Flight.End   Device Creative Days.in.Flight       Date
#        (date)     (date)    (chr)    (chr)          (dbl)     (date)
#1   2015-09-01 2015-09-03 Standard    Video              3 2015-09-01
#2   2015-09-01 2015-09-03 Standard    Video              3 2015-09-02
#3   2015-09-01 2015-09-03 Standard    Video              3 2015-09-03
#4   2015-09-10 2015-09-15 Standard    Video              6 2015-09-10
#5   2015-09-10 2015-09-15 Standard    Video              6 2015-09-11
#6   2015-09-10 2015-09-15 Standard    Video              6 2015-09-12
#7   2015-09-10 2015-09-15 Standard    Video              6 2015-09-13
#8   2015-09-10 2015-09-15 Standard    Video              6 2015-09-14
#9   2015-09-10 2015-09-15 Standard    Video              6 2015-09-15
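
The same per-flight seq() idea can be sketched without extra packages. This is only an illustrative base R variant (date_list and out are names made up here), building one date sequence per original row and repeating each row to match:

```r
mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Device = "Standard",
                   Creative = "Video",
                   Days.in.Flight = c(3, 6),
                   stringsAsFactors = FALSE)

# One Date sequence per original row, from Flight.Start to Flight.End
date_list <- lapply(seq_len(nrow(mydf)), function(i) {
  seq(mydf$Flight.Start[i], mydf$Flight.End[i], by = "day")
})

# Repeat each row as many times as its sequence is long, then attach the dates
out <- mydf[rep(seq_len(nrow(mydf)), lengths(date_list)), ]
out$Date <- do.call(c, date_list)   # c() on Date objects keeps the Date class
rownames(out) <- NULL
out
```

Because the sequences are built row by row, this sketch does not depend on the (Flight.Start, Flight.End) pairs being unique.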
