I have audit for records in a table. There multiple columns and each records states for a change of 1 or more columns.
I need to return an audit result where the return pattern will be: column
(id, alias or name), previous value, new value, etc.
The problem is in that there can be multiple columns with changed data per each new record. At the same time the quantity of auditable columns is 5 so it's possible to "hardcode" there names and changes verifications.
So is it possible to compose such a query in a shortened manner not just using UNIONS and make a SELECT query for each column and check for change?
Let's say there is the table with the columns:
id, datetime value, int value, varchar value.
And if I have 2 records with such a data change as like:
id1, value1, value1, value1
id1, value2, value1, value2
Then I expect such audit results:
id1, value1 as oldvalue, value2 as newvalue, column2name as columnname
id1, value1 as oldvalue, value2 as newvalue, column4name as columnname
If I haven't missed anything:
WITH ranked AS (
SELECT
ChangeDate,
ColPK,
Col1,
Col2,
Col3,
Col4,
Col5,
OverallRank = ROW_NUMBER() OVER (PARTITION BY ColPK ORDER BY ChangeDate),
Col1Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col1 ORDER BY ChangeDate),
Col2Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col2 ORDER BY ChangeDate),
Col3Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col3 ORDER BY ChangeDate),
Col4Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col4 ORDER BY ChangeDate),
Col5Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col5 ORDER BY ChangeDate)
FROM AuditTable
)
, ranked2 AS (
SELECT
ChangeDate,
ColPK,
Col1,
Col2,
Col3,
Col4,
Col5,
Col1Group = RANK() OVER (PARTITION BY ColPK, Col1 ORDER BY OverallRank - Col1Rank),
Col2Group = RANK() OVER (PARTITION BY ColPK, Col2 ORDER BY OverallRank - Col2Rank),
Col3Group = RANK() OVER (PARTITION BY ColPK, Col3 ORDER BY OverallRank - Col3Rank),
Col4Group = RANK() OVER (PARTITION BY ColPK, Col4 ORDER BY OverallRank - Col4Rank),
Col5Group = RANK() OVER (PARTITION BY ColPK, Col5 ORDER BY OverallRank - Col5Rank),
Col1Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col1, OverallRank - Col1Rank ORDER BY ChangeDate),
Col2Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col2, OverallRank - Col2Rank ORDER BY ChangeDate),
Col3Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col3, OverallRank - Col3Rank ORDER BY ChangeDate),
Col4Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col4, OverallRank - Col4Rank ORDER BY ChangeDate),
Col5Rank = ROW_NUMBER() OVER (PARTITION BY ColPK, Col5, OverallRank - Col5Rank ORDER BY ChangeDate)
FROM ranked
),
unpivoted AS (
SELECT
r.ChangeTime,
r.ColPK,
x.ColName,
ColRank = CASE x.Colname
WHEN 'Col1' THEN Col1Group
WHEN 'Col2' THEN Col2Group
WHEN 'Col3' THEN Col3Group
WHEN 'Col4' THEN Col4Group
WHEN 'Col5' THEN Col5Group
END,
Value = CASE x.Colname
WHEN 'Col1' THEN CONVERT(nvarchar(100), r.Col1)
WHEN 'Col2' THEN CONVERT(nvarchar(100), r.Col2)
WHEN 'Col3' THEN CONVERT(nvarchar(100), r.Col3)
WHEN 'Col4' THEN CONVERT(nvarchar(100), r.Col4)
WHEN 'Col5' THEN CONVERT(nvarchar(100), r.Col5)
END
FROM ranked2 r
INNER JOIN (VALUES ('Col1'), ('Col2'), ('Col3'), ('Col4'), ('Col5')) x (ColName)
ON x.ColName = 'Col1' AND Col1Rank = 1
OR x.ColName = 'Col2' AND Col2Rank = 1
OR x.ColName = 'Col3' AND Col3Rank = 1
OR x.ColName = 'Col4' AND Col4Rank = 1
OR x.ColName = 'Col5' AND Col5Rank = 1
)
SELECT
new.ChangeTime,
new.ColPK,
new.ColName,
old.Value AS OldValue,
new.Value AS NewValue
FROM unpivoted new
LEFT JOIN unpivoted old
ON new.ColPK = old.ColPK
AND new.ColName = old.ColName
AND new.ColRank = old.ColRank + 1
Basically, the idea is to rank contiguous groups of identical values and pick first occurrences of every value. That is done for every column whose values are being audited, and the columns are unpivoted in the process. Afterwards, the unpivoted row set is joined to itself, i.e. for every PK and column name, every row is matched to its predecessor (based on the ranking) to obtain the old value in the same row for the final result set.
Here is a simpler query that produces the same desired results, and is much easier to modify to accommodate a different number of columns or changing column names, since the only differences are the PK column(s) + a single line per non-PK-column in the CROSS APPLY
. I had to add a ChangeDate
column--without it, there is no way to know the order of the rows inserted to the audit table.
WITH ColValues AS (
SELECT
Grp = Row_Number() OVER (
PARTITION BY H.OrderID, U.ColName ORDER BY H.ChangeDate ASC, X.Which
) / 2,
H.OrderID,
H.ChangeDate,
U.*,
X.Which
FROM
dbo.OrderHistory H
CROSS APPLY (VALUES
('DeliveryDate', Convert(varchar(1000), DeliveryDate, 121)),
('Quantity', Convert(varchar(1000), Quantity)),
('SpecialNotes', Convert(varchar(1000), SpecialNotes))
) U (ColName, Value)
CROSS JOIN (VALUES (1), (2)) X (Which)
)
SELECT
V.OrderID,
V.ColName,
DateChanged = Max(V.ChangeDate),
OldValue = Max(F.Value),
NewValue = Max(T.Value)
FROM
ColValues V
OUTER APPLY (SELECT V.ColName, V.Value WHERE V.Which = 2) F
OUTER APPLY (SELECT V.ColName, V.Value WHERE V.Which = 1) T
GROUP BY
V.OrderID,
V.ColName,
V.Grp
HAVING
Count(*) = 2
AND EXISTS (
SELECT Max(F.Value)
EXCEPT SELECT Max(T.Value)
)
;
See a live demo of this query at SQL Fiddle.
In SQL 2012, this would be solved better with a LEAD
or LAG
analytical function. The CROSS JOIN
and Row_Number
in my query simulates this by duplicating each row and assigning these duplicated rows in pairs into their own groups (where each group has two rows representing adjacent audit history rows). Then by strategic use of aggregates we can deal with the grouped pairs to select and compare their values.
Also, I originally wrote the query with UNPIVOT
, but alas, it does not preserve NULLs--a severe oversight by Microsoft, in my opinion. It would have been easy for developers to add a condition removing NULLs if desired, but the way it is UNPIVOT
can't be used at all when desiring to preserve NULLs. Ironically, the resulting code is more compact, and 2 lines shorter, using CROSS APPLY
to UNPIVOT--now the conversion and unpivoting happen in one step instead of 2.
My sample data is:
ChangeDate OrderID DeliveryDate Quantity SpecialNotes
----------------------- ------- ----------------------- -------- ----------------------------------------------------
2013-03-01 11:28:00.000 1 2013-04-01 00:00:00.000 25 NULL
2013-03-01 11:56:00.000 1 2013-04-01 00:00:00.000 30 NULL
2013-03-05 10:18:00.000 1 2013-04-02 00:00:00.000 30 Customer called to ask for delivery date adjustment.
2013-03-01 11:37:00.000 2 2013-03-05 00:00:00.000 17 NULL
The resultant rowset:
OrderID ColName DateChanged OldValue NewValue
------- ------------ ----------------------- ----------------------- ---------------------------------------------------
1 DeliveryDate 2013-03-05 10:18:00.000 2013-04-01 00:00:00.000 2013-04-02 00:00:00.000
1 Quantity 2013-03-01 11:56:00.000 25 30
1 SpecialNotes 2013-03-05 10:18:00.000 NULL Customer called to ask for delivery date adjustment.
Note: since my query has only a single ranking function and no JOIN
s, this will perform extremely well even in very large tables--orders of magnitude better, perhaps, than solutions that use a JOIN
where there is no supporting index. It would be best for the audit table to have a clustered index on PK, ChangeDate
.