I have a data frame that looks like this:
w<-read.table(header=TRUE,text="
start.date end.date
2006-06-26 2006-07-24
2006-07-19 2006-08-16
2007-06-09 2007-07-07
2007-06-24 2007-07-22
2007-07-03 2007-07-31
2007-08-04 2007-09-01
2007-08-07 2007-09-04
2007-09-05 2007-10-03
2007-09-14 2007-10-12
2007-10-19 2007-11-16
2007-11-17 2007-12-15
2008-06-18 2008-07-16
2008-06-28 2008-07-26
2008-07-11 2008-08-08
2008-07-23 2008-08-20")
I'm trying to get an output that will combine overlapping start and end dates into one date range. So for the example set, I'd like to get:
w<-read.table(header=TRUE,text="
start.date end.date
2006-06-26 2006-08-16
2007-06-09 2007-07-31
2007-08-04 2007-09-04
2007-09-05 2007-10-12
2007-10-19 2007-11-16
2007-11-17 2007-12-15
2008-06-18 2008-08-20")
The question is similar to Date roll-up in R, but I don't need to do any sort of group by on mine, so the answer there is confusing.
Also, the code that was suggested in response to my question below does not work for certain parts of my data frame such as:
x<-read.table(header=TRUE,text="start.date end.date
2006-01-19 2006-01-20
2006-01-25 2006-01-29
2006-02-24 2006-02-25
2006-03-15 2006-03-22
2006-04-29 2006-04-30
2006-05-24 2006-05-25
2006-06-26 2006-08-16
2006-07-05 2006-07-10
2006-07-12 2006-07-21
2006-08-13 2006-08-15
2006-08-18 2006-08-19
2006-08-28 2006-09-02")
I don't understand why it fails on this data.
The IRanges package on Bioconductor includes the function reduce, which can be used to combine overlapping start and end dates into one date range. IRanges works on integer ranges, so you have to convert the data from class Date to integer and back. This can be wrapped up in a function.

Explanation
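A sketch of what such a wrapper might look like, consistent with the notes in this explanation. It is not the answer's original code: the name collapse_date_ranges, the requireNamespace guard, and the base R back-conversion (rather than data.table) are assumptions. Only IRanges, reduce and min.gapwidth are named in the text, and the sketch assumes that IRanges::start and IRanges::end are available for reading the merged endpoints.

```r
# Hypothetical wrapper; assumes the Bioconductor package IRanges is installed.
collapse_date_ranges <- function(w, min.gapwidth = 1L) {
  if (!requireNamespace("IRanges", quietly = TRUE)) {
    stop("the IRanges package (Bioconductor) is required")
  }
  # IRanges works on integers, so convert Date to days since 1970-01-01
  ranges <- IRanges::IRanges(
    start = as.integer(as.Date(w$start.date)),
    end   = as.integer(as.Date(w$end.date))
  )
  # min.gapwidth = 0L keeps adjacent ranges apart;
  # the default of 1L also merges ranges that merely touch
  merged <- IRanges::reduce(ranges, min.gapwidth = min.gapwidth)
  # convert the integer endpoints back to class Date
  data.frame(
    start.date = as.Date(IRanges::start(merged), origin = "1970-01-01"),
    end.date   = as.Date(IRanges::end(merged), origin = "1970-01-01")
  )
}
```

Under these assumptions, collapse_date_ranges(w, min.gapwidth = 0L) would keep adjacent ranges separate, as in the OP's sample result.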
The :: operator is used to access single functions from the IRanges package, in preference to library(IRanges), which loads the whole package.
The dates are converted to integer (the call to as.Date is just to ensure the proper class) and used to create an IRanges object.
reduce does all the hard work. The parameter min.gapwidth is required here because reduce collapses adjacent ranges by default (see below).
The result is converted back to class Date. (dplyr could be used instead of data.table as well.)
This applies to both sample data sets, w and x. x includes a special case where one date range fully contains other date ranges.

Appendix: Collapsing adjacent date ranges
The sample result given by the OP shows that adjacent date ranges should not be collapsed, e.g., the range 2007-10-19 to 2007-11-16 is kept separate from the range 2007-11-17 to 2007-12-15, although the second range starts only one day after the first one has ended.

In case adjacent date ranges are to be collapsed, this can be achieved by using the default value of the min.gapwidth parameter.

Try this:
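One way to do this in base R, sketched under the assumption that ranges sharing at least one day count as overlapping and that ranges which merely touch stay separate. This is not necessarily the approach this answer used, and the function name merge_overlaps is hypothetical.

```r
# Hypothetical helper: merge overlapping date ranges in a data frame
# with columns start.date and end.date.
merge_overlaps <- function(df) {
  s <- as.numeric(as.Date(df$start.date))
  e <- as.numeric(as.Date(df$end.date))
  o <- order(s, e)                      # sort by start date, then end date
  s <- s[o]
  e <- e[o]
  # latest end date seen before each row (-Inf for the first row)
  prev_end <- c(-Inf, cummax(e)[-length(e)])
  # a row opens a new group when it starts after everything before it has ended
  g <- cumsum(s > prev_end)
  data.frame(
    start.date = as.Date(as.vector(tapply(s, g, min)), origin = "1970-01-01"),
    end.date   = as.Date(as.vector(tapply(e, g, max)), origin = "1970-01-01")
  )
}

# The tail of the OP's second data set, including the embedded ranges
x <- read.table(header = TRUE, text = "start.date end.date
2006-06-26 2006-08-16
2006-07-05 2006-07-10
2006-07-12 2006-07-21
2006-08-13 2006-08-15
2006-08-18 2006-08-19
2006-08-28 2006-09-02")
merge_overlaps(x)
```

On these six rows the sketch returns three ranges: 2006-06-26 to 2006-08-16, 2006-08-18 to 2006-08-19, and 2006-08-28 to 2006-09-02.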
It conceptually merges overlapping intervals into the same group as shown below:
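To illustrate the grouping idea, the sketch below assigns a group id to each row in the tail of x. The column name g and the helper variable latest_end are hypothetical. The id increases whenever a range starts after the latest end date seen so far. Comparing against only the previous row's end date would split the cluster that begins 2006-06-26, because the ranges embedded in it end early.

```r
x <- read.table(header = TRUE, text = "start.date end.date
2006-06-26 2006-08-16
2006-07-05 2006-07-10
2006-07-12 2006-07-21
2006-08-13 2006-08-15
2006-08-18 2006-08-19
2006-08-28 2006-09-02")

s <- as.numeric(as.Date(x$start.date))
e <- as.numeric(as.Date(x$end.date))

# rows are assumed to be sorted by start date already
latest_end <- c(-Inf, cummax(e)[-length(e)])

# rows that start on or before latest_end join the current group
x$g <- cumsum(s > latest_end)
print(x)
```

Here the first four rows share g = 1, and the last two rows get g = 2 and g = 3.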
with output: