Find the most recent date in a result set

2019-07-03 16:59发布

问题:

I'm working on a query where I need to look at patient vitals (specifically blood pressure) that were entered when a patient visited a clinic. I'm pulling results for the entire year of 2015, and of course there are certain patients who visited multiple times, and I need to only see the vitals that were entered at the most recent visit. Another slight twist is that systolic and diastolic pressures are entered separately, so I end up with results like:

Patient ID     Name           DOB          Test              Results       Date
---------------------------------------------------------------------------------
1000           John Smith     1/1/1955     BP - Diastolic    120           2/10/2015
1000           John Smith     1/1/1955     BP - Systolic     70            2/10/2015
1000           John Smith     1/1/1955     BP - Diastolic    128           7/12/2015
1000           John Smith     1/1/1955     BP - Systolic     75            7/12/2015
1000           John Smith     1/1/1955     BP - Diastolic    130           10/22/2015
1000           John Smith     1/1/1955     BP - Systolic     76            10/22/2015
9999           Jane Doe       5/4/1970     BP - Diastolic    130           4/2/2015
9999           Jane Doe       5/4/1970     BP - Systolic     60            4/2/2015
9999           Jane Doe       5/4/1970     BP - Diastolic    127           11/20/2015
9999           Jane Doe       5/4/1970     BP - Systolic     65            11/20/2015

There are 26,000+ results so obviously I don't want to go through every single patient and see when their most recent results were. I'd like my results to look like this:

Patient ID     Name           DOB          Test         Results       Date
---------------------------------------------------------------------------------
1000           John Smith     1/1/1955     BP - Diastolic    130           10/22/2015
1000           John Smith     1/1/1955     BP - Systolic     76            10/22/2015
9999           Jane Doe       5/4/1970     BP - Diastolic    127           11/20/2015
9999           Jane Doe       5/4/1970     BP - Systolic     65            11/20/2015

I know the name and date of birth and whatnot would get repeated, but I'm mainly focused on the results column.

Here's my query:

SELECT DISTINCT
    pd.PatientID as [Patient ID],
    pd.PatientName as Name,
    pd.DateOfBirth as DOB,
    v.Test as Test,
    v.Results as Results,
    v.TestDate as Date

FROM PatientDemographic pd JOIN Vitals v ON pd.PatientID = v.PatientID

WHERE v.TestDate BETWEEN '01/01/2015' AND '12/31/2015'
    AND v.Test LIKE 'BP%'

ORDER BY pd.PatientID, v.TestDate

After looking around for other answers, I tried doing a GROUP BY and the MAX() aggregate function for the v.TestDate column in the SELECT statement (I was specifically referencing this link, though it's for Oracle and I'm using SQL Server so I'm not entirely sure if the syntax would be the same). My query then looked like:

SELECT DISTINCT
    pd.PatientID as [Patient ID],
    pd.PatientName as Name,
    pd.DateOfBirth as DOB,
    v.Test as Test,
    v.Results as Results,
    MAX(v.TestDate) as Date

FROM PatientDemographic pd JOIN Vitals v ON pd.PatientID = v.PatientID

WHERE v.TestDate BETWEEN '01/01/2015' AND '12/31/2015'
    AND v.Test LIKE 'BP%'

GROUP BY pd.PatientID

Admittedly, I've always struggled a little with using GROUP BY. In this particular case, I get an error stating that I need to add the Patient Name column to the GROUP BY clause as well, so I do, and then it asks for DOB. Then the Test name. Basically, it wants me to add everything from my SELECT statement to the GROUP BY.

What's the best way to proceed and get my most recent patient visits?

回答1:

One simple method uses ROW_NUMBER() to find the more recent record for each test:

SELECT pd.PatientID as [Patient ID], pd.PatientName as Name, pd.DateOfBirth as DOB,
       v.Test as Test, v.Results as Results, v.TestDate as Date
FROM PatientDemographic pd JOIN
     (SELECT v.*,
             ROW_NUMBER() OVER (PARTITION BY PatientId, Test ORDER BY TestDate DESC) as seqnum
      FROM Vitals v
      WHERE v.TestDate BETWEEN '2015-01-01' AND '2015-12-31' AND
            v.Test LIKE 'BP%'
     ) v
     ON pd.PatientID = v.PatientID 
WHERE seqnum = 1
ORDER BY pd.PatientID, v.TestDate;


回答2:

I shy away from window functions as Gordon used. A technique using subqueries can get job done as well:

SELECT 
    ID
    ,Name
    ,DOB
    ,Test
    ,Results
    ,[Date]
FROM
    Vitals AS V
WHERE
    V.[Date] = (SELECT MAX([Date]) FROM Vitals W WHERE W.Name = V.Name AND W.Test = 'A')
    AND V.Test = 'A'

UNION

SELECT 
    ID
    ,Name
    ,DOB
    ,Test
    ,Results
    ,[Date]
FROM
    Vitals AS V
WHERE
    V.[Date] = (SELECT MAX([Date]) FROM Vitals W WHERE W.Name = V.Name AND W.Test = 'B')
    AND V.Test = 'B'


回答3:

This is for MS SQL 2005+

SELECT * FROM (
SELECT row_number() over(partition by pd.PatientID, v.Test order by v.TestDate desc) as rn,
    pd.PatientID as [Patient ID],
    pd.PatientName as Name,
    pd.DateOfBirth as DOB,
    v.Test as Test,
    v.Results as Results,
    v.TestDate as Date
FROM PatientDemographic pd 
JOIN Vitals v ON pd.PatientID = v.PatientID
WHERE v.TestDate BETWEEN '01/01/2015' AND '12/31/2015'
    AND v.Test LIKE 'BP%') t
WHERE rn = 1


回答4:

The windowing function will not be as efficient as the NOT EXISTS clause. I would like to propose a faster solution that does not use a windowing function:

SELECT 
    pd.PatientID as [Patient ID],
    pd.PatientName as Name,
    pd.DateOfBirth as DOB,
    v.Test as Test,
    v.Results as Results,
    v.TestDate as Date
FROM PatientDemographic pd JOIN Vitals v ON pd.PatientID = v.PatientID
WHERE 
    v.TestDate BETWEEN '01/01/2015' AND '12/31/2015'
    AND v.Test LIKE 'BP%'
    AND NOT EXISTS (
       SELECT 1 FROM Vitals as v2 where v2.PatientID = v.PatientID
       AND V2.TestDate BETWEEN '01/01/2015' AND '12/31/2015' 
       AND v2.Test LIKE 'BP%' 
       AND v2.TestDate > v.TestDate)


回答5:

You can also use Common Table Expression to achieve this.

        IF OBJECT_ID('tempdb..#RecentPatientVitals') IS NOT NULL
        DROP TABLE #RecentPatientVitals;
    GO

    CREATE TABLE #RecentPatientVitals
        (
          Patient_ID INT
        , Name VARCHAR(100)
        , DOB DATE
        , Test VARCHAR(150)
        , Results INT
        , [Date] DATE
        );

    INSERT  INTO #RecentPatientVitals
            ( Patient_ID, Name, DOB, Test, Results, [Date] )
    VALUES  ( 1000, 'John Smith', '1/1/1955', 'BP - Diastolic', 120, '2/10/2015' )
    ,       ( 1000, 'John Smith', '1/1/1955', 'BP - Systolic', 70, '2/10/2015' )
    ,       ( 1000, 'John Smith', '1/1/1955', 'BP - Diastolic', 128, '7/12/2015' )
    ,       ( 1000, 'John Smith', '1/1/1955', 'BP - Systolic', 75, '7/12/2015' )
    ,       ( 1000, 'John Smith', '1/1/1955', 'BP - Diastolic', 130, '10/22/2015' )
    ,       ( 1000, 'John Smith', '1/1/1955', 'BP - Systolic', 76, '10/22/2015' )
    ,       ( 9999, 'Jane Doe', '5/4/1970', 'BP - Diastolic', 130, '4/2/2015' )
    ,       ( 9999, 'Jane Doe', '5/4/1970', 'BP - Systolic', 60, '4/2/2015' )
    ,       ( 9999, 'Jane Doe', '5/4/1970', 'BP - Diastolic', 127, '11/20/2015' )
    ,       ( 9999, 'Jane Doe', '5/4/1970', 'BP - Systolic', 65, '11/20/2015' );

    SELECT  *
    FROM    #RecentPatientVitals;

    WITH    PatVitals1
              AS ( SELECT   Patient_ID
                          , Name
                          , DOB
                          , Test
                          , MAX(Date) AS Date
                   FROM     #RecentPatientVitals
                   GROUP BY Patient_ID
                          , Name
                          , DOB
                          , Test
                 ) ,
            PatVitals2
              AS ( SELECT   Patient_ID
                          , Test
                          , Results
                          , Date
                   FROM     #RecentPatientVitals
                 )
        SELECT  P1.Patient_ID
              , P1.Name
              , P1.DOB
              , P1.Test
              , P2.Results
              , P1.Date
        FROM    PatVitals1 P1
                INNER JOIN PatVitals2 P2
                ON P2.Patient_ID = P1.Patient_ID
                   AND P2.Date = P1.Date
                   AND P2.Test = P1.Test
        GROUP BY P1.Patient_ID
              , P1.Name
              , P1.DOB
              , P1.Test
              , P2.Results
              , P1.Date;