Find date range overlaps within the same table, fo

2019-01-08 00:33发布

问题:

I am by no means an MySQL expert, so I am looking for any help on this matter.

I need to perform a simple test (in principle), I have this (simplified) table:

tableid | userid  | car      | From        | To
--------------------------------------------------------
1       | 1       |  Fiesta  |  2015-01-01 | 2015-01-31
2       | 1       |  MX5     |  2015-02-01 | 2015-02-28
3       | 1       |  Navara  |  2015-03-01 | 2015-03-31
4       | 1       |  GTR     |  2015-03-28 | 2015-04-30
5       | 2       |  Focus   |  2015-01-01 | 2015-01-31
6       | 2       |  i5      |  2015-02-01 | 2015-02-28
7       | 2       |  Aygo    |  2015-03-01 | 2015-03-31
8       | 2       |  206     |  2015-03-29 | 2015-04-30
9       | 1       |  Skyline |  2015-04-29 | 2015-05-31
10      | 2       |  Skyline |  2015-04-29 | 2015-05-31

I need to find two things here:

  1. If any user has date overlaps in his car assignments of more than one day (end of the assignment can be on the same day as the new assignment start).
  2. Did any two users tried to get the same car assigned on the same date, or the date ranges overlap for them on the same car.

So the query (or queries) I am looking for should return those rows:

tableid | userid  | car      | From        | To
--------------------------------------------------------
3       | 1       |  Navara  |  2015-03-01 | 2015-03-31
4       | 1       |  GTR     |  2015-03-28 | 2015-04-30
7       | 2       |  Aygo    |  2015-03-01 | 2015-03-31
8       | 2       |  206     |  2015-03-29 | 2015-04-30
9       | 1       |  Skyline |  2015-04-29 | 2015-05-31
10      | 2       |  Skyline |  2015-04-29 | 2015-05-31 

I feel like I am bashing my head against the wall here, I would be happy with being able to do these comparisons in separate queries. I need to display them in one table but I could always then join the results.

I've done research and few hours of testing but I cant get nowhere near the result I want.

SQLFiddle with the above test data

I've tried these posts btw (they were not exactly what I needed but were close enough, or so I thought):

Comparing two date ranges within the same table

How to compare values of text columns from the same table

This was the closest solution I could find but when I tried it on a single table (joining table to itself) I was getting crazy results: Checking a table for time overlap?

EDIT

As a temporary solution I have adapted a different approach, similar to the posts I have found during my research (above). I will now check if the new car rental / assignment date overlaps with any date range within the table. If so I will save the id(s) of the rows that the date overlaps with. This way at least I will be able to flag overlaps and allow a user to look at the flagged rows and to resolve any overlaps manually.

Thanks to everyone who offered their help with this, I will flag philipxy answer as the chosen one (in next 24h) unless someone has better way of achieving this. I have no doubt that following his answer I will be able to eventually reach the results I need. At the moment though I need to adopt any solution that works as I need to finish my project in next few days, hence the change of approach.

Edit #2

The both answers are brilliant and to anyone who finds this post having the same issue as I did, read them both and look at the fiddles! :) A lot of amazing brain-work went into them! Temporarily I had to go with the solution I mention in #1 Edit of mine but I will be adapting my queries to go with @Ryan Vincent approach + @philipxy edits/comments about ignoring the initial one day overlap.

回答1:

Here is the first part: Overlapping cars per user...

SQLFiddle - correlated Query and Join Query

Second part - more than one user in one car at the same time: SQLFiddle - correlated Query and Join Query. Query below...

I use the correlated queries:

You will likely need indexes on userid and 'car'. However - please check the 'explain plan' to see how it mysql is accessing the data. And just try it :)

Overlapping cars per user

The query:

SELECT `allCars`.`userid`  AS `allCars_userid`, 
       `allCars`.`car`     AS `allCars_car`, 
       `allCars`.`From`    AS `allCars_From`, 
       `allCars`.`To`      AS `allCars_To`,
       `allCars`.`tableid` AS `allCars_id`
 FROM  
       `cars` AS `allCars`
 WHERE 
     EXISTS  
         (SELECT 1       
          FROM `cars` AS `overlapCar`            
          WHERE 
               `allCars`.`userid` = `overlapCar`.`userid` 
           AND `allCars`.`tableid` <> `overlapCar`.`tableid`          
           AND NOT (   `allCars`.`From`  >= `overlapCar`.`To`      /* starts after outer ends  */  
                    OR `allCars`.`To`    <= `overlapCar`.`From`))  /* ends before outer starts */
 ORDER BY
        `allCars`.`userid`, 
        `allCars`.`From`, 
        `allCars`.`car`;      

The results:

allCars_userid  allCars_car  allCars_From  allCars_To  allCars_id  
--------------  -----------  ------------  ----------  ------------
             1  Navara       2015-03-01    2015-03-31             3
             1  GTR          2015-03-28    2015-04-30             4
             1  Skyline      2015-04-29    2015-05-31             9
             2  Aygo         2015-03-01    2015-03-31             7
             2  206          2015-03-29    2015-04-30             8
             2  Skyline      2015-04-29    2015-05-31            10

Why it works? or How I think about it:

I use the correlated query so I don't have duplicates to deal with and it is probably the easiest to understand for me. There are other ways of expressing the query. Each has advantages and drawbacks. I want something I can easily understand.

Requirement: For each user ensure that they don't have two or more cars at the same time.

So, for each user record (AllCars) check the complete table (overlapCar) to see if you can find a different record that overlaps for the time of the current record. If we find one then select the current record we are checking (in allCars).

Therefore the overlap check is:

  • the allCars userid and the overLap userid must be the same
  • the allCars car record and the overlap car record must be different
  • the allCars time range and the overLap time range must overlap.

    The time range check:

    Instead of checking for overlapping times use positive tests. The easiest approach, is to check it doesn't overlap, and apply a NOT to it.

One car with More than One User at the same time...

The query:

SELECT  `allCars`.`car`     AS `allCars_car`,
        `allCars`.`userid`  AS `allCars_userid`,  
        `allCars`.`From`    AS `allCars_From`, 
        `allCars`.`To`      AS `allCars_To`, 
        `allCars`.`tableid` AS `allCars_id`

 FROM  
       `cars` AS `allCars`
 WHERE 
     EXISTS  
        (SELECT 1       
         FROM `cars` AS `overlapUser`            
         WHERE 
              `allCars`.`car` = `overlapUser`.`car` 
          AND `allCars`.`tableid` <> `overlapUser`.`tableid`          
          AND NOT (    `allCars`.`From`  >= `overlapUser`.`To`       /* starts after outer ends  */  
                   OR  `allCars`.`To`    <= `overlapUser`.`From`))  /* ends before outer starts */
 ORDER BY
        `allCars`.`car`,      
        `allCars`.`userid`, 
        `allCars`.`From`;

The results:

allCars_car  allCars_userid  allCars_From  allCars_To    allCars_id  
-----------  --------------  ------------  ----------  ------------
Skyline                   1  2015-04-29    2015-05-31             9
Skyline                   2  2015-04-29    2015-05-31            10

Edit:

In view of the comments, by @philipxy , about time ranges needing 'greater than or equal to' checks I have updated the code here. I havn't changed the SQLFiddles.



回答2:

For each input and output table find its meaning. Ie a statement template parameterized by column names, aka predicate, that a row makes into a true or false statement, aka proposition. A table holds the rows that make its predicate into a true proposition. Ie rows that make a true proposition go in a table and rows that make a false proposition stay out. Eg for your input table:

rental [tableid] was user [userid] renting car [car] from [from] to [to]

Then phrase the output table predicate in terms of the input table predicate. Don't use descriptions like your 1 & 2:

  1. If any user has date overlaps in his car assignments of more than one day (end of the assignment can be on the same day as the new assignment start).

Instead find the predicate that an arbitrary row states when in the table:

rental [tableid] was user [user] renting car [car] from [from] to [to]
    in self-conflict with some other rental

For the DBMS to calculate the rows making this true we must express this in terms of our given predicate(s) plus literals & conditions:

-- query result holds the rows where
FOR SOME t2.tableid, t2.userid, ...:
    rental [t1.tableid] was user [t1.userid] renting car [t1.car] from [t1.from] to [t1.to]
AND rental [t2.tableid] was user [t2.userid] renting car [t2.car] from [t2.from] to [t2.to]
AND [t1.userid] = [t2.userid] -- userids id the same users
AND [t1.to] > [t2.from] AND ...  -- tos/froms id intervals with overlap more than one day
...

(Inside an SQL SELECT statement the cross product of JOINed tables has column names of the form alias.column. Think of . as another character allowed in column names. Finally the SELECT clause drops the alias.s.)

We convert a query predicate to an SQL query that calculates the rows that make it true:

  • A table's predicate gets replaced by the table alias.
  • To use the same predicate/table multiple times make aliases.
  • Changing column old to new in a predicate adds ANDold=new.
  • AND of predicates gets replaced by JOIN.
  • OR of predicates gets replaced by UNION.
  • AND NOT of predicates gets replaced by EXCEPT, MINUS or appropriate LEFT JOIN.
  • ANDcondition gets replaced by WHERE or ON condition.
  • For a predicate true FOR SOMEcolumns to drop or when THERE EXISTScolumns to drop, SELECT DISTINCTcolumns to keep.
  • Etc. (See this.)

Hence (completing the ellipses):

SELECT DISTINCT t1.*
FROM t t1 JOIN t t2
ON t1.userid = t1.userid -- userids id the same users
WHERE t1.to > t2.from AND t2.to > t1.from -- tos/froms id intervals with overlap more than one day
AND t1.tableid <> t2.tableid -- tableids id different rentals
  1. Did any two users tried to get the same car assigned on the same date, or the date ranges overlap for them on the same car.

Finding the predicate that an arbitrary row states when in the table:

rental [tableid] was user [user] renting car [car] from [from] to [to]
    in conflict with some other user's rental

In terms of our given predicate(s) plus literals & conditions:

-- query result holds the rows where
FOR SOME t2.*
    rental [t1.tableid] was user [t1.userid] renting car [t1.car] from [t1.from] to [t1.to]
AND rental [t2.tableid] was user [t2.userid] renting car [t2.car] from [t2.from] to [t2.to]
AND [t1.userid] <> [t2.userid] -- userids id different users
AND [t1.car] = [t2.car] -- .cars id the same car
AND [t1.to] >= [t2.from] AND [t2.to] >= [t1.from] -- tos/froms id intervals with any overlap
AND [t1.tableid] <> [t2.tableid] -- tableids id different rentals

The UNION of queries for predicates 1 & 2 returns the rows for which predicate 1ORpredicate 2.

Try to learn to express predicates--what rows state when in tables--if only as the goal for intuitive (sub)querying.

PS It is good to always have data checking edge & non-edge cases for a condition being true & being false. Eg try query 1 with GTR starting on the 31st, an overlap of only one day, which should not be a self-conflict.

PPS Querying involving duplicate rows, as with NULLs, has quite complex query meanings. It's hard to say when a tuple goes in or stays out of a table and how many times. For queries to have the simple intuitive meanings per my correspondences they can't have duplicates. Here SQL unfortunately differs from the relational model. In practice people rely on idioms when allowing non-distinct rows & they rely on rows being distinct because of constraints. Eg joining on UNIQUE columns per UNIQUEs, PKs & FKs. Eg: A final DISTINCT step is only doing work at a different time than a version that doesn't need it; time might or might not be be an important implementation issue affecting the phrasing chosen for a given predicate/result.