SQL Query to show gaps between multiple date range

2019-01-13 09:10发布

站内文章 / 后端开发

19 0

再贱就再见

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Im working on a SSRS / SQL project and trying to write a query to get the gaps between dates and I am completely lost with how to write this.Basically we have a number of devices which can be scheduled for use and I need a report to show when they are not in use.

I have a table with Device ID, EventStart and EventEnd times, I need to run a query to get the times between these events for each device but I am not really sure how to do this.

For example:

Device 1 Event A runs from `01/01/2012 08:00 - 01/01/2012 10:00`
Device 1 Event B runs from `01/01/2012 18:00 - 01/01/2012 20:00`    
Device 1 Event C runs from `02/01/2012 18:00 - 02/01/2012 20:00`    
Device 2 Event A runs from `01/01/2012 08:00 - 01/01/2012 10:00`
Device 2 Event B runs from `01/01/2012 18:00 - 01/01/2012 20:00`

My query should have as its result

`Device 1 01/01/2012 10:00 - 01/01/2012 18:00`
`Device 1 01/01/2012 20:00 - 02/01/2012 18:00`
`Device 2 01/01/2012 10:00 - 01/01/2012 18:00`

There will be around 4 - 5 devices on average in this table, and maybe 200 - 300 + events.

Updates:

Ok I'll update this to try give a bit more info since I dont seem to have explained this too well (sorry!)

What I am dealing with is a table which has details for Events, Each event is a booking of a flight simulator, We have a number of flight sims( refered to as devices in the table) and we are trying to generate a SSRS report which we can give to a customer to show the days / times each sim is available.

So I am going to pass in a start / end date parameter and select all availabilities between those dates. The results should then display as something like:

Device   Available_From       Available_To
 1       01/01/2012 10:00    01/01/2012 18:00`
 1       01/01/2012 20:00    02/01/2012 18:00`
 2       01/01/2012 10:00    01/01/2012 18:00`

Also Events can sometimes overlap though this is very rare and due to bad data, it doesnt matter about an event on one device overlapping an event on a different device as I need to know availability for each device seperately.

回答1:

The Query:

Assuming the fields containing the interval are named Start and Finish, and the table is named YOUR_TABLE, the query...

SELECT Finish, Start
FROM
    (
        SELECT DISTINCT Start, ROW_NUMBER() OVER (ORDER BY Start) RN
        FROM YOUR_TABLE T1
        WHERE
            NOT EXISTS (
                SELECT *
                FROM YOUR_TABLE T2
                WHERE T1.Start > T2.Start AND T1.Start < T2.Finish
            )
        ) T1
    JOIN (
        SELECT DISTINCT Finish, ROW_NUMBER() OVER (ORDER BY Finish) RN
        FROM YOUR_TABLE T1
        WHERE
            NOT EXISTS (
                SELECT *
                FROM YOUR_TABLE T2
                WHERE T1.Finish > T2.Start AND T1.Finish < T2.Finish
            )
    ) T2
    ON T1.RN - 1 = T2.RN
WHERE
    Finish < Start

...gives the following result on your test data:

Finish                      Start
2012-01-01 10:00:00.000     2012-01-01 18:00:00.000

The important property of this query is that it would work on overlapping intervals as well.

The Algorithm:

1. Merge Overlapping Intervals

The subquery T1 accepts only those interval starts that are outside other intervals. The subquery T2 does the same for interval ends. This is what removes overlaps.

The DISTINCT is important in case there are two identical interval starts (or ends) that are both outside other intervals. The WHERE Finish < Start simply eliminates any empty intervals (i.e. duration 0).

We also attach a row number relative to temporal ordering, which will be needed in the next step.

The T1 yields:

Start                       RN
2012-01-01 08:00:00.000     1
2012-01-01 18:00:00.000     2

The T2 yields:

Finish                      RN
2012-01-01 10:00:00.000     1
2012-01-01 20:00:00.000     2

2. Reconstruct the Result

We can now reconstruct either the "active" or the "inactive" intervals.

The inactive intervals are reconstructed by putting together end of the previous interval with the beginning of the next one, hence - 1 in the ON clause. Effectively, we put...

Finish                      RN
2012-01-01 10:00:00.000     1

...and...

Start                       RN
2012-01-01 18:00:00.000     2

...together, resulting in:

Finish                      Start
2012-01-01 10:00:00.000     2012-01-01 18:00:00.000

(The active intervals could be reconstructed by putting rows from T1 alongside rows from T2, by using JOIN ... ON T1.RN = T2.RN and reverting WHERE.)

The Example:

Here is a slightly more realistic example. The following test data:

Device      Event      Start                      Finish
Device 1    Event A    2012-01-01 08:00:00.000    2012-01-01 10:00:00.000
Device 2    Event B    2012-01-01 18:00:00.000    2012-01-01 20:00:00.000
Device 3    Event C    2012-01-02 11:00:00.000    2012-01-02 15:00:00.000
Device 4    Event D    2012-01-02 10:00:00.000    2012-01-02 12:00:00.000
Device 5    Event E    2012-01-02 10:00:00.000    2012-01-02 15:00:00.000
Device 6    Event F    2012-01-03 09:00:00.000    2012-01-03 10:00:00.000

Gives the following result:

Finish                      Start
2012-01-01 10:00:00.000     2012-01-01 18:00:00.000
2012-01-01 20:00:00.000     2012-01-02 10:00:00.000
2012-01-02 15:00:00.000     2012-01-03 09:00:00.000

回答2:

First Answer -- but see below for final one with additional constraints added by OP.

-- If you want to get the next startTime after the most recent endTime and avoid overlaps, you want something like:

select
    distinct
    e1.deviceId,
    e1.EventEnd,
    e3.EventStart
from Events e1 
join Events e3 on e1.eventEnd < e3.eventStart     /* Finds the next start Time */
and e3.eventStart = (select min(eventStart) from Events e5
                     where e5.eventStart > e1.eventEnd)
and not exists (select *                          /* Eliminates an e1 rows if it is overlapped */
                from Events e5 
                where e5.eventStart < e1.eventEnd
                    and e5.eventEnd > e1.eventEnd)

For the case of your three rows:

INSERT INTO Events VALUES (1, '01/01/2012 08:00', '01/01/2012 10:00')
INSERT INTO Events VALUES (2, '01/01/2012 18:00', '01/01/2012 20:00')
insert into Events values (2, '01/01/2012 09:00', '01/01/2012 11:00')

This gives 1 result:

January, 01 2012 11:00:00-0800  January, 01 2012 18:00:00-0800

However, I assume you probably want to match on DeviceId also. In which case, on the joins, you'd add e1.DeviceId = e3.DeviceId and e1.deviceId = e5.deviceId

SQL Fiddle here: http://sqlfiddle.com/#!3/3899c/8

OK, final edit. Here's a query adding in deviceIds and adding in a distinct to account for simultenously ending events:

SELECT distinct
    e1.DeviceID,
    e1.EventEnd as LastEndTime,
    e3.EventStart as NextStartTime
FROM Events e1 
join Events e3 on e1.eventEnd < e3.eventStart
     and e3.deviceId = e1.deviceId
     and e3.eventStart = (select min(eventStart) from Events e5
                     where e5.eventStart > e1.eventEnd
                    and e5.deviceId = e3.deviceId)
where not exists (select * from Events e7 
                    where e7.eventStart < e1.eventEnd
                      and e7.eventEnd > e1.eventEnd
                      and e7.deviceId = e1.deviceId)
order by e1.deviceId, e1.eventEnd

The join to the e3 finds the next start. The join to e5 guarantees that this is the earliest starttime after the current endtime. The join to e7 eliminates a row if the end-time of the considered row is overlapped by a different row.

For this data:

INSERT INTO Events VALUES (1, '01/01/2012 08:00', '01/01/2012 10:00')
INSERT INTO Events VALUES (2, '01/01/2012 18:00', '01/01/2012 20:00')
insert into Events values (2, '01/01/2012 09:00', '01/01/2012 11:00')
insert into Events values (2, '01/02/2012 11:00', '01/02/2012 15:00')
insert into Events values (1, '01/02/2012 10:00', '01/02/2012 12:00')
insert into Events values (2, '01/02/2012 10:00', '01/02/2012 15:00')
insert into Events values (2, '01/03/2012 09:00', '01/03/2012 10:00')

You get this result:

1   January, 01 2012 10:00:00-0800  January, 02 2012 10:00:00-0800
2   January, 01 2012 11:00:00-0800  January, 01 2012 18:00:00-0800
2   January, 01 2012 20:00:00-0800  January, 02 2012 10:00:00-0800
2   January, 02 2012 15:00:00-0800  January, 03 2012 09:00:00-0800

SQL Fiddle here: http://sqlfiddle.com/#!3/db0fa/3

回答3:

I'm going to assume that it's not really this simple... but here's a query based on my current understanding of your scenario:

DECLARE @Events TABLE (
    DeviceID INT,
    EventStart DATETIME,
    EventEnd DATETIME
)

INSERT INTO @Events VALUES (1, '01/01/2012 08:00', '01/01/2012 10:00')
INSERT INTO @Events VALUES (2, '01/01/2012 18:00', '01/01/2012 20:00')

SELECT
    e1.DeviceID,
    e1.EventEnd,
    e2.EventStart
FROM 
    @Events e1 
    JOIN @Events e2 
        ON e2.EventStart = (
            SELECT MIN(EventStart)
            FROM @Events
            WHERE EventStart > e1.EventEnd
        )

回答4:

Does this solve your issue:

http://www.simple-talk.com/sql/t-sql-programming/find-missing-date-ranges-in-sql/
http://www.simple-talk.com/sql/t-sql-programming/missing-date-ranges--the-sequel/

The second one seems more relevant

'There is a table, where two of the columns are DateFrom and DateTo. Both columns contain date and time values. How does one find the missing date ranges or, in other words, all the date ranges that are not covered by any of the entries in the table'.

回答5:

I don't really understand your question:

"I have a table with Device ID, EventStart and EventEnd times"

followed by

"I need to run a query to get the times between these events"

"For example:

Device 1 Event A runs from01/01/2012 08:00 - 01/01/2012 10:00"

In order to get this you only need to do

select Device_ID,EventStart,EventEnd from Device

If you want to get the time difference between start and end dates for each device you can use DATEDIFF:

select Device_ID,DATEDIFF(EventEnd - EventStart) from Device

回答6:

Here is a Postgres solution that I just did, that does not involve stored procedures:

SELECT minute, sum(case when dp.id is null then 0 else 1 end) as s 
FROM generate_series( 
   '2017-12-28'::timestamp,
   '2017-12-30'::timestamp,
   '1 minute'::interval
) minute 
left outer join device_periods as dp
on minute >= dp.start_date and minute < dp.end_date 
group by minute order by minute

The generate_series function generates a table that has one row for each minute in the date range. You can change the interval to 1 second, to be more precise. It is a postgres specific function, but probably something similar exists in other engines.

This query will give you all the minutes that are filled, and all that are blank. You can wrap this query in an outer query, that can group by hours, days or do some window function operations to get the exact output as you need it. For my purposes, I only needed to count if there are blanks or not.