Assuming I have two tables, one with the following columns called lease_period:
tenant_trading_name, suite_id, lease_id, building_id
and another, called lease_period_audit with the following:
audit_date, audit_type, tenant_trading_name, suite_id, lease_id, building_id
Each time a record is updated on lease_period and entry is made on lease_period_audit with a status of 'Updated'. I'm trying to find all updates made only to the tenant_trading_name field but haven't had any success. So far I have the following:
select lpa.*
from property.lease_period_audit lpa
inner join property.lease_period lp on lpa.suite_id = lp.suite_id and lpa.lease_id = lp.lease_id and lpa.building_id = lp.building_id
where audit_type = 'Updated'
and lp.tenant_trading_name <> lpa.tenant_trading_name
order by 1 desc
Where's the flaw in my thought process here? How can this be done / how should I be thinking about this?
Assuming the audit table also logs the lease_period
primary key column, referenced here lp_id
for simplicity, you could try the following approach:
Find all the rows where audit_type
is 'Updated'
.
Rank all the rows by audit_date
and partitioning them by lp_id
.
Rank the rows by audit_date
partitioning by lp_id, suite_id, lease_id, building_id
.
Get the difference between the two rankings.
Rank the rows again by audit_date
, partitioning them now by lp_id, suite_id, lease_id, building_id, (ranking_difference)
.
Output all the rows where the last ranking value is 2 or greater.
The first four steps result in a row set where each group of consecutive (in ascending order of audit_date
) rows with identical values of suite_id, lease_id, building_id
for the same lp_id
will be uniquely distinguished by a value calculated as the difference between the rankings #2 & #3.
Within the group, every row, starting from the second one, will differ from the previous one only in the value of tenant_trading_name
, which is just what we need. So, we rank the rows once again, taking into account the 'group ID' we've just obtained, then return every row with the ranking of 2 or higher.
Here's an approximate implementation:
WITH marked AS (
SELECT
*,
grp = ROW_NUMBER() OVER (PARTITION BY lp_id
ORDER BY audit_date)
- ROW_NUMBER() OVER (PARTITION BY lp_id, suite_id, lease_id, building_id
ORDER BY audit_date)
FROM lease_period_audit
WHERE audit_type = 'Updated'
),
ranked AS (
SELECT
*,
rnk = ROW_NUMBER() OVER (PARTITION BY lp_id, suite_id, lease_id, building_id, grp
ORDER BY audit_date)
FROM marked
)
SELECT
audit_date,
lp_id,
tenant_trading_name,
suite_id,
lease_id,
building_id
FROM ranked
WHERE rnk = 2
Note. This assumes that the audit table logs only real changes, i.e. there can't be two consecutive rows with the same primary key where all four columns have identical values.
You should think something like this (pseudo-code):
edit: I haven't realized before that the lpa table actually has all the data, there's no need to join with LP
select lpa.*
from lpa
join lpa_before on
lpa_before.id = lpa.id and
lpa_before.date =
(select max(date) from lpa3 where lpa3.date < lpa.date and lpa.id = lpa3.id)
where auditytype = 'update' and lpa.name <> lpa_before.name
I hope I could explain.. (it's not a trivial solution, but thats what comes to my mind)
In English:
Select the LPA registers.
Join with another LPA that will be the LPA just before the original one. Call this LPA_BEFORE.
To join the LPA_BEFORE you have to compare all ids from LPA and LPA_BEFORE, and the LPA_BEFORE date must be the maximum from the LPAs which have a date less than the original LPA.
Compare the names of the LPA with LPA_BEFORE