Using Inner Join to determine the interval between

2019-03-06 12:40发布

I have a simple table that is used to record events against specific visits:

Describe Histories;
    +------------------+
    | Field            |
    +------------------+
    | HistoryId        |
    | VisitId          |
    | Location         |
    | Event            |
    | EventTime        |
    +------------------+

Individuals are associated with Visits (VisitId). For each Visit, an individual may have multiple History records. Events can be Admission, Transfer or Discharge.

I am trying to write a Query to calculate the duration in each Location for each individual. Note, that they may visit a Location multiple times for each Visit. An individual enters a Location with an Admission or Transfer Event and leaves with a Discharge or Transfer.

If an individual enters Location 'A', their Admission or Transfer record will list Location 'A', however if they transfer out their transfer out (or discharge) will list another Location, say 'B'.

I therefore have to find the interval between the transfer into Location 'A' and the subsequent (in time) transfer to Location 'B'. Intra location transfers are not evaluated.

I understand that the solution will probably be based on an INNER JOIN, however I am at a loss to understand how to select for the transfer "out" record that corresponds to the most recent transfer "in".

I guess that this is reasonably complex - I hope I have been clear enough in my explanation.

Any guidance greatly appreciated.

2条回答
甜甜的少女心
2楼-- · 2019-03-06 12:54

Assuming a transfer or discharge is a unique event you could write like so

SELECT
   b.EventTime - a.EventTime        
FROM
   Histories a
   INNER JOIN Histories b
   ON a.VisitID = b.VisitID
WHERE
   a.event = 'Admission'
   and
   b.event in ('Transfer', 'Discharge')

If you were interested in the last transfer or discharge you would write

SELECT
   b.EventTime - a.EventTime        
FROM
   Histories a
   INNER JOIN  Histories b
    ON a.VisitID = b.VisitID

   INNER JOIN 
   (SELECT
         VisitId, 
         MAX(HistoryID) HistoryID
    FROM Histories 
    WHERE 
       b.event in ('Transfer', 'Discharge')
    GROUP BY 
       VisitId) maxHistory
   ON b.HistoryID = maxHistoryId.HistoryId

WHERE
   a.event = 'Admission'

However if a Visit can result in multiple visits as Andriy M mentions you have a Gaps And islands problem (specifically the islands)

In that case you want the following

SELECT  
       a.VisitId,
       a.Event a_Event, 
       a.Event b_Event, 
       a.EventTime a_EventTime,
       b.EventTime b_EventTime,
       b_EventTime - a_EventTime

FROM   histories a 
       INNER JOIN histories B 
         ON a.visitID = b.visitID 
            AND a.EventTime < b.eventTime 
       INNER JOIN (SELECT a.VisitId, 
                          a.EventTime      a_EventTime, 
                          Min(b.EventTime) b_EventTime 
                   FROM   histories a 
                          INNER JOIN histories B 
                            ON a.visitID = b.visitID 
                               AND a.EventTime < b.eventTime 
                   GROUP  BY a_EventTime, 
                             a.VisitId) MinTime 
         ON a.VisitID = MinTime.VisitID 
            AND a.EventTime = a_EventTime 
            AND b.EventTime = b_EventTime 

DEMO

Using the following sample data

CREATE TABLE Histories 
    (
     HistoryId int auto_increment primary key, 
     VisitId int,
     Location varchar(20),
     Event varchar(20), 
     EventTime datetime
    );

INSERT INTO Histories
(VisitId, Location, Event, EventTime)
VALUES
(1, 'A', 'Admission', '2012-01-01'),
(1, 'A', 'Discharge', '2012-01-03'),
(2, 'B', 'Admission', '2012-01-02'),
(2, 'C', 'Transfer', '2012-01-05'),
(2, 'C', 'Discharge', '2012-01-06'),
(3, 'D', 'Admission', '2012-01-06'),
(3, 'E', 'Transfer', '2012-01-07'),
(3, 'F', 'Transfer', '2012-01-08'),
(3, 'F', 'Discharge', '2012-01-10');

You get the following results

VISITID    A_EVENT   B_EVENT    A_EVENTTIME                     B_EVENTTIME                     B_EVENTTIME - A_EVENTTIME
1          Admission Discharge  January, 01 2012 00:00:00-0800  January, 03 2012 00:00:00-0800  2000000
2          Admission Transfer   January, 02 2012 00:00:00-0800  January, 05 2012 00:00:00-0800  3000000
2          Transfer  Discharge  January, 05 2012 00:00:00-0800  January, 06 2012 00:00:00-0800  1000000
3          Admission Transfer   January, 06 2012 00:00:00-0800  January, 07 2012 00:00:00-0800  1000000
3          Transfer  Transfer   January, 07 2012 00:00:00-0800  January, 08 2012 00:00:00-0800  1000000
3          Transfer  Discharge  January, 08 2012 00:00:00-0800  January, 10 2012 00:00:00-0800  2000000

Notes:

  • This assumes you don't care about Admissions/Transger that don't have a corresponding discharge/transfer yet.
  • If you know that eventTime doesn't change after the record is entered you could use historyID instead of eventime to determine the order of events.
  • You know how to get the Event Time difference in the format that you like
查看更多
唯我独甜
3楼-- · 2019-03-06 12:55

How does this work for you?

SELECT 
    h1.HistoryId, 
    h1.VisitId, 
    h1.Event AS InitialEvent, 
    h2.Event AS FinalEvent, 
    h1.Location AS StartLocation,
    h2.Location AS EndLocation,
    IF(h2.HistoryId, UNIX_TIMESTAMP(h2.EventTime) - UNIX_TIMESTAMP(h1.EventTime), NULL) AS transfer_duration_seconds
FROM Histories h1 
LEFT JOIN Histories h2 ON h1.VisitId = h2.VisitId AND h1.Location != h2.location AND h2. EventTime > h1. EventTime
GROUP BY h1.VisitId
查看更多
登录 后发表回答