SQL Get Last Occurrence of Field Against Each Row

2019-08-03 19:59发布

问题:

How can I add the LastDocumentID column like so:

+------------+-----------+----------------+
| DocumentID | Reference | LastDocumentID |
+------------+-----------+----------------+
|          1 | A         | NULL           |
|          2 | A         | 1              |
|          3 | A         | 2              |
|          4 | B         | NULL           |
|          5 | B         | 4              |
|          6 | C         | NULL           |
|          7 | C         | 6              |
|          8 | C         | 7              |
|          9 | C         | 8              |
+------------+-----------+----------------+

The table could be in a random order, but in the Last Document ID I essentially want it to get the Max Document ID that is less than that row's Document ID for that row's Reference.

回答1:

You can get any value from the "last document" this way:

SELECT
   D.DocumentID,
   D.Reference,
   LastDocumentID = R.DocumentID
FROM
   dbo.Documents D
   OUTER APPLY (
      SELECT TOP 1 *
      FROM dbo.Documents R
      WHERE
         D.Reference = R.Reference
         AND R.DocumentID < D.DocumentID
      ORDER BY R.DocumentID DESC
   ) R
;

See this working in a SQL Fiddle

Though having identical logic to similar methods that compute just the column value in a subquery in the WHERE clause, this allows you to pull multiple columns from the previous document, and demonstrates OUTER APPLY. Change to CROSS APPLY if you want the equivalent INNER join (excluding rows that have no previous).

For reference, here's the single-value way to do it. You basically put the query contained in the OUTER APPLY into parentheses, and only select one column:

SELECT
   D.DocumentID,
   D.Reference,
   LastDocumentID = (
      SELECT TOP 1 R.DocumentID
      FROM dbo.Documents R
      WHERE
         D.Reference = R.Reference
         AND R.DocumentID < D.DocumentID
      ORDER BY R.DocumentID DESC
   )
FROM
   dbo.Documents D
;

Alternately, you can just use Max:

SELECT
   D.DocumentID,
   D.Reference,
   LastDocumentID = (
      SELECT Max(R.DocumentID)
      FROM dbo.Documents R
      WHERE
         D.Reference = R.Reference
         AND R.DocumentID < D.DocumentID
   )
FROM
   dbo.Documents D
;

If you were using SQL Server 2012 and up, you could do it this way using its more advanced syntax available for windowing functions:

SELECT
    D.DocumentID,
    D.Reference,
    LastDocumentID = Max(D.DocumentID) OVER (
       PARTITION BY D.Reference
       ORDER BY D.DocumentID ASC
       ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    )
FROM
    dbo.Documents D
;


回答2:

In SQL Server 2012+ you can use lag(). In SQL Server 2008, you can use a correlated subquery or outer apply. Here is one method:

select documentid, reference,
       (select top 1 documentid
        from table t2
        where t2.reference = t.reference and
              t2.documentid < t.documentid
        order by documentid desc
       ) as LastDocumentId
from table t;


回答3:

LAG might be more efficient if that's an option. But there's no particular reason to use TOP when MAX works great.

select
    documentid, reference,
    (
        select max(documentid) from table as t2
        where t2.reference = t1.reference and t2.documentid < t1.documentid
    )
from table as t1;


回答4:

I was asked to undelete this with the following caveat... for SQL Server versions greater than 2008:

You could accomplish this with Window functions

SELECT
    DocumentID,
    Reference,
    MAX(DocumentID) OVER (PARTITION BY Reference ORDER BY DocumentID ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS LastDocumentID
FROM
    <table>

That longer line is the window function will fetch the max(documentid) from the group of rows that are partitioned by your reference field and are limited by all the previous documentids for that reference not including the current document.