Transforming a min() and a max() value into a rang

2019-09-05 23:58发布

问题:

I would like stretch some of my records that were flattened

I have a table like this

Store     Min(Date)     Max (Date)     Status

NYC1       1/1/2013      2/1/2013      Open
NYC1       2/2/2013      2/3/2013      Closed for Inspection
Boston1    1/1/2013      2/5/2013      Open

and I would like to stretch it into the form

Store       Date        Status

 NYC1       1/1/2013     Open
 NYC1       1/2/2013     Open
 .....
 NYC1       2/2/2013     Closed for Inspection
 NYC1       2/3/2013     Closed for Inspection
 ....
 Boston1    1/1/2013     Open

I know I can always write loops for this, but before attempting so, I would like to ask if there's any quick and dirty way of doing it?

回答1:

Here's one approach:

Read in your data and convert your dates to actual date variables:

mydf <- read.table(header = TRUE, stringsAsFactors=FALSE, 
text = "Store     Min(Date)     Max(Date)     Status
NYC1       1/1/2013      2/1/2013      Open
NYC1       2/2/2013      2/3/2013      'Closed for Inspection'
Boston1    1/1/2013      2/5/2013      Open")

names(mydf) <- c("store", "min.date", "max.date", "status")
mydf$min.date <- as.Date(mydf$min.date, format = "%m/%d/%Y")
mydf$max.date <- as.Date(mydf$max.date, format = "%m/%d/%Y")
mydf
#     store   min.date   max.date                status
# 1    NYC1 2013-01-01 2013-02-01                  Open
# 2    NYC1 2013-02-02 2013-02-03 Closed for Inspection
# 3 Boston1 2013-01-01 2013-02-05                  Open

Calculate the difference in days between "min.date" and "max.date"

Use that information to "expand" your data.frame and to generate the date sequence between "min.date" and "max.date". Also, subset the data.frame to return just the "store", "date" (our new variable), and "status" variables.

SEQ <- mydf$max.date - mydf$min.date + 1
mydf2 <- mydf[rep(row.names(mydf), SEQ), ]
mydf2$date <- mydf2$min.date + sequence(SEQ)-1

mydf2 <- mydf2[c("store", "date", "status")]

Here's a sample of the output.

head(mydf2)
#     store       date status
# 1    NYC1 2013-01-01   Open
# 1.1  NYC1 2013-01-02   Open
# 1.2  NYC1 2013-01-03   Open
# 1.3  NYC1 2013-01-04   Open
# 1.4  NYC1 2013-01-05   Open
# 1.5  NYC1 2013-01-06   Open
tail(mydf2)
#        store       date status
# 3.30 Boston1 2013-01-31   Open
# 3.31 Boston1 2013-02-01   Open
# 3.32 Boston1 2013-02-02   Open
# 3.33 Boston1 2013-02-03   Open
# 3.34 Boston1 2013-02-04   Open
# 3.35 Boston1 2013-02-05   Open

You can use by to verify we did everything correctly:

> with(mydf2, by(date, list(store, status), FUN = range))
: Boston1
: Closed for Inspection
NULL
----------------------------------------------------------------- 
: NYC1
: Closed for Inspection
[1] "2013-02-02" "2013-02-03"
----------------------------------------------------------------- 
: Boston1
: Open
[1] "2013-01-01" "2013-02-05"
----------------------------------------------------------------- 
: NYC1
: Open
[1] "2013-01-01" "2013-02-01"


回答2:

Using data.table for syntax elegance (and assuming the preprocessing by @Ananda

mydf <- read.table(header = TRUE, stringsAsFactors=FALSE, 
text = "Store     Min(Date)     Max(Date)     Status
NYC1       1/1/2013      2/1/2013      Open
NYC1       2/2/2013      2/3/2013      'Closed for Inspection'
Boston1    1/1/2013      2/5/2013      Open")

names(mydf) <- c("store", "min.date", "max.date", "status")
mydf$min.date <- as.Date(mydf$min.date, format = "%m/%d/%Y")
mydf$max.date <- as.Date(mydf$max.date, format = "%m/%d/%Y")

library(data.table)
DT <- data.table(mydf)
DT[, list(dates = seq(min.date,max.date, by = 1)) , by = list(store,status)]


回答3:

Green Demon

Given your question has the reshape package tag, the easiest thing I can think of is to simply use the melt function. Let's call your data.frame 'foo'. The code below should give you what you want.

library(reshape)
foo.melt<-melt(foo, id.vars=c('Store','Status'))

Note that this will create an additional column 'variable' with min.date and max.date.

Cheers,

Danny