SQL: Return audit for records for each column

2019-09-01 06:06发布

问题:

I have audit for records in a table. There multiple columns and each records states for a change of 1 or more columns.
I need to return an audit result where the return pattern will be: column (id, alias or name), previous value, new value, etc.
The problem is in that there can be multiple columns with changed data per each new record. At the same time the quantity of auditable columns is 5 so it's possible to "hardcode" there names and changes verifications.

So is it possible to compose such a query in a shortened manner not just using UNIONS and make a SELECT query for each column and check for change?

Let's say there is the table with the columns:

id, datetime value, int value, varchar value.

And if I have 2 records with such a data change as like:

id1, value1, value1, value1
id1, value2, value1, value2

Then I expect such audit results:

id1, value1 as oldvalue, value2 as newvalue, column2name as columnname
id1, value1 as oldvalue, value2 as newvalue, column4name as columnname

回答1:

If I haven't missed anything:

WITH ranked AS (
  SELECT
    ChangeDate,
    ColPK,
    Col1,
    Col2,
    Col3,
    Col4,
    Col5,
    OverallRank = ROW_NUMBER() OVER (PARTITION BY ColPK       ORDER BY ChangeDate),
    Col1Rank    = ROW_NUMBER() OVER (PARTITION BY ColPK, Col1 ORDER BY ChangeDate),
    Col2Rank    = ROW_NUMBER() OVER (PARTITION BY ColPK, Col2 ORDER BY ChangeDate),
    Col3Rank    = ROW_NUMBER() OVER (PARTITION BY ColPK, Col3 ORDER BY ChangeDate),
    Col4Rank    = ROW_NUMBER() OVER (PARTITION BY ColPK, Col4 ORDER BY ChangeDate),
    Col5Rank    = ROW_NUMBER() OVER (PARTITION BY ColPK, Col5 ORDER BY ChangeDate)
  FROM AuditTable
)
, ranked2 AS (
  SELECT
    ChangeDate,
    ColPK,
    Col1,
    Col2,
    Col3,
    Col4,
    Col5,
    Col1Group = RANK() OVER (PARTITION BY ColPK, Col1 ORDER BY OverallRank - Col1Rank),
    Col2Group = RANK() OVER (PARTITION BY ColPK, Col2 ORDER BY OverallRank - Col2Rank),
    Col3Group = RANK() OVER (PARTITION BY ColPK, Col3 ORDER BY OverallRank - Col3Rank),
    Col4Group = RANK() OVER (PARTITION BY ColPK, Col4 ORDER BY OverallRank - Col4Rank),
    Col5Group = RANK() OVER (PARTITION BY ColPK, Col5 ORDER BY OverallRank - Col5Rank),
    Col1Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col1, OverallRank - Col1Rank ORDER BY ChangeDate),
    Col2Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col2, OverallRank - Col2Rank ORDER BY ChangeDate),
    Col3Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col3, OverallRank - Col3Rank ORDER BY ChangeDate),
    Col4Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col4, OverallRank - Col4Rank ORDER BY ChangeDate),
    Col5Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col5, OverallRank - Col5Rank ORDER BY ChangeDate)
  FROM ranked
),
unpivoted AS (
  SELECT
    r.ChangeTime,
    r.ColPK,
    x.ColName,
    ColRank = CASE x.Colname
      WHEN 'Col1' THEN Col1Group
      WHEN 'Col2' THEN Col2Group
      WHEN 'Col3' THEN Col3Group
      WHEN 'Col4' THEN Col4Group
      WHEN 'Col5' THEN Col5Group
    END,
    Value = CASE x.Colname
      WHEN 'Col1' THEN CONVERT(nvarchar(100), r.Col1)
      WHEN 'Col2' THEN CONVERT(nvarchar(100), r.Col2)
      WHEN 'Col3' THEN CONVERT(nvarchar(100), r.Col3)
      WHEN 'Col4' THEN CONVERT(nvarchar(100), r.Col4)
      WHEN 'Col5' THEN CONVERT(nvarchar(100), r.Col5)
    END
  FROM ranked2 r
    INNER JOIN (VALUES ('Col1'), ('Col2'), ('Col3'), ('Col4'), ('Col5')) x (ColName)
      ON x.ColName = 'Col1' AND Col1Rank = 1
      OR x.ColName = 'Col2' AND Col2Rank = 1
      OR x.ColName = 'Col3' AND Col3Rank = 1
      OR x.ColName = 'Col4' AND Col4Rank = 1
      OR x.ColName = 'Col5' AND Col5Rank = 1
)
SELECT
  new.ChangeTime,
  new.ColPK,
  new.ColName,
  old.Value AS OldValue,
  new.Value AS NewValue
FROM unpivoted new
  LEFT JOIN unpivoted old
    ON new.ColPK   = old.ColPK
   AND new.ColName = old.ColName
   AND new.ColRank = old.ColRank + 1

Basically, the idea is to rank contiguous groups of identical values and pick first occurrences of every value. That is done for every column whose values are being audited, and the columns are unpivoted in the process. Afterwards, the unpivoted row set is joined to itself, i.e. for every PK and column name, every row is matched to its predecessor (based on the ranking) to obtain the old value in the same row for the final result set.



回答2:

Here is a simpler query that produces the same desired results, and is much easier to modify to accommodate a different number of columns or changing column names, since the only differences are the PK column(s) + a single line per non-PK-column in the CROSS APPLY. I had to add a ChangeDate column--without it, there is no way to know the order of the rows inserted to the audit table.

WITH ColValues AS (
   SELECT
      Grp = Row_Number() OVER (
         PARTITION BY H.OrderID, U.ColName ORDER BY H.ChangeDate ASC, X.Which
      ) / 2,
      H.OrderID,
      H.ChangeDate,
      U.*,
      X.Which
   FROM
      dbo.OrderHistory H
      CROSS APPLY (VALUES
         ('DeliveryDate', Convert(varchar(1000), DeliveryDate, 121)),
         ('Quantity', Convert(varchar(1000), Quantity)),
         ('SpecialNotes', Convert(varchar(1000), SpecialNotes))
      ) U (ColName, Value)
      CROSS JOIN (VALUES (1), (2)) X (Which)
)
SELECT
   V.OrderID,
   V.ColName,
   DateChanged = Max(V.ChangeDate),
   OldValue = Max(F.Value),
   NewValue = Max(T.Value)
FROM
   ColValues V
   OUTER APPLY (SELECT V.ColName, V.Value WHERE V.Which = 2) F
   OUTER APPLY (SELECT V.ColName, V.Value WHERE V.Which = 1) T
GROUP BY
   V.OrderID,
   V.ColName,
   V.Grp
HAVING
   Count(*) = 2
   AND EXISTS (
      SELECT Max(F.Value)
      EXCEPT SELECT Max(T.Value)
   )
;

See a live demo of this query at SQL Fiddle.

In SQL 2012, this would be solved better with a LEAD or LAG analytical function. The CROSS JOIN and Row_Number in my query simulates this by duplicating each row and assigning these duplicated rows in pairs into their own groups (where each group has two rows representing adjacent audit history rows). Then by strategic use of aggregates we can deal with the grouped pairs to select and compare their values.

Also, I originally wrote the query with UNPIVOT, but alas, it does not preserve NULLs--a severe oversight by Microsoft, in my opinion. It would have been easy for developers to add a condition removing NULLs if desired, but the way it is UNPIVOT can't be used at all when desiring to preserve NULLs. Ironically, the resulting code is more compact, and 2 lines shorter, using CROSS APPLY to UNPIVOT--now the conversion and unpivoting happen in one step instead of 2.

My sample data is:

ChangeDate              OrderID DeliveryDate            Quantity SpecialNotes
----------------------- ------- ----------------------- -------- ----------------------------------------------------
2013-03-01 11:28:00.000 1       2013-04-01 00:00:00.000 25       NULL
2013-03-01 11:56:00.000 1       2013-04-01 00:00:00.000 30       NULL
2013-03-05 10:18:00.000 1       2013-04-02 00:00:00.000 30       Customer called to ask for delivery date adjustment.
2013-03-01 11:37:00.000 2       2013-03-05 00:00:00.000 17       NULL

The resultant rowset:

OrderID ColName      DateChanged             OldValue                NewValue
------- ------------ ----------------------- ----------------------- ---------------------------------------------------
1       DeliveryDate 2013-03-05 10:18:00.000 2013-04-01 00:00:00.000 2013-04-02 00:00:00.000
1       Quantity     2013-03-01 11:56:00.000 25                      30
1       SpecialNotes 2013-03-05 10:18:00.000 NULL                    Customer called to ask for delivery date adjustment.

Note: since my query has only a single ranking function and no JOINs, this will perform extremely well even in very large tables--orders of magnitude better, perhaps, than solutions that use a JOIN where there is no supporting index. It would be best for the audit table to have a clustered index on PK, ChangeDate.