Query help when using audit table

Posted 2020-06-25 16:27

Question:

Assume I have two tables: one called lease_period, with the following columns:

tenant_trading_name, suite_id, lease_id, building_id

and another called lease_period_audit, with the following:

audit_date, audit_type, tenant_trading_name, suite_id, lease_id, building_id

Each time a record is updated on lease_period, an entry is made on lease_period_audit with an audit_type of 'Updated'. I'm trying to find all updates that changed only the tenant_trading_name field, but haven't had any success. So far I have the following:

select              lpa.*
from                property.lease_period_audit lpa
inner join          property.lease_period lp on lpa.suite_id = lp.suite_id and lpa.lease_id = lp.lease_id and lpa.building_id = lp.building_id
where               audit_type = 'Updated'
                    and lp.tenant_trading_name <> lpa.tenant_trading_name
order by            1 desc  

Where's the flaw in my thought process here? How can this be done / how should I be thinking about this?

Answer 1:

Assuming the audit table also logs the lease_period primary key column, referred to here as lp_id for simplicity, you could try the following approach:

  1. Find all the rows where audit_type is 'Updated'.

  2. Rank all the rows by audit_date, partitioning them by lp_id.

  3. Rank the rows by audit_date, partitioning them by lp_id, suite_id, lease_id, building_id.

  4. Get the difference between the two rankings.

  5. Rank the rows again by audit_date, partitioning them now by lp_id, suite_id, lease_id, building_id, (ranking_difference).

  6. Output all the rows where the last ranking value is 2 or greater.

The first four steps result in a row set where each group of consecutive rows (in ascending order of audit_date) with identical values of suite_id, lease_id, building_id for the same lp_id is uniquely distinguished by a value calculated as the difference between the rankings from steps 2 and 3.

Within the group, every row, starting from the second one, will differ from the previous one only in the value of tenant_trading_name, which is just what we need. So, we rank the rows once again, taking into account the 'group ID' we've just obtained, then return every row with the ranking of 2 or higher.
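To make the ranking difference concrete, here is a minimal Python sketch of steps 2 to 4 for a single lp_id. The sample rows and the function name `ranking_differences` are made up for illustration; they are not from the question.

```python
from collections import defaultdict

# Hypothetical audit rows for a single lp_id, already sorted by audit_date:
# (audit_date, suite_id, lease_id, building_id)
ROWS = [
    ("2020-01-01", 10, 100, 1),
    ("2020-01-02", 10, 100, 1),
    ("2020-01-03", 11, 100, 1),  # suite changed
    ("2020-01-04", 11, 100, 1),
    ("2020-01-05", 10, 100, 1),  # suite changed back
]

def ranking_differences(rows):
    """Per row: (rank over all rows) - (rank among rows with the same ids)."""
    seen = defaultdict(int)  # rows seen so far per (suite, lease, building)
    diffs = []
    for overall_rank, (_date, suite_id, lease_id, building_id) in enumerate(rows, start=1):
        key = (suite_id, lease_id, building_id)
        seen[key] += 1
        diffs.append(overall_rank - seen[key])
    return diffs

print(ranking_differences(ROWS))  # prints [0, 0, 2, 2, 2]
```

Note that the rows for suite 11 and the last row for suite 10 share the difference 2 but have different ids, which is why step 5 partitions by the id columns and the difference together rather than by the difference alone.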

Here's an approximate implementation:

WITH marked AS (
  SELECT
    *,
    grp = ROW_NUMBER() OVER (PARTITION BY lp_id
                                 ORDER BY audit_date)
        - ROW_NUMBER() OVER (PARTITION BY lp_id, suite_id, lease_id, building_id
                                 ORDER BY audit_date)
  FROM lease_period_audit
  WHERE audit_type = 'Updated'
),
ranked AS (
  SELECT
    *,
    rnk = ROW_NUMBER() OVER (PARTITION BY lp_id, suite_id, lease_id, building_id, grp
                                 ORDER BY audit_date)
  FROM marked
)
SELECT
  audit_date,
  lp_id,
  tenant_trading_name,
  suite_id,
  lease_id,
  building_id
FROM ranked
WHERE rnk >= 2  -- step 6: every row after the first one in its group

Note. This assumes that the audit table logs only real changes, i.e. there can't be two consecutive rows with the same primary key where all four columns have identical values.
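If you want to try the approach without a SQL Server instance, the following is a rough sketch that runs the same three-ranking query against an in-memory SQLite database from Python. The sample rows, the lp_id column and the function name `name_only_updates` are assumptions for illustration, the aliases are written as `... AS grp` because SQLite does not accept the `grp = ...` form, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample data:
# (audit_date, audit_type, lp_id, tenant_trading_name, suite_id, lease_id, building_id)
ROWS = [
    ("2020-01-01", "Updated", 1, "Acme",      10, 100, 1),
    ("2020-01-02", "Updated", 1, "Acme Ltd",  10, 100, 1),  # name only
    ("2020-01-03", "Updated", 1, "Acme Corp", 10, 100, 1),  # name only
    ("2020-01-04", "Updated", 1, "Acme Corp", 11, 100, 1),  # suite changed
    ("2020-01-05", "Updated", 1, "Acme Inc",  11, 100, 1),  # name only
    ("2020-01-01", "Updated", 2, "Bolt",      20, 200, 1),
    ("2020-01-02", "Updated", 2, "Bolt",      20, 201, 1),  # lease changed
]

QUERY = """
WITH marked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY lp_id ORDER BY audit_date)
       - ROW_NUMBER() OVER (PARTITION BY lp_id, suite_id, lease_id, building_id
                            ORDER BY audit_date) AS grp
  FROM lease_period_audit
  WHERE audit_type = 'Updated'
),
ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY lp_id, suite_id, lease_id, building_id, grp
                            ORDER BY audit_date) AS rnk
  FROM marked
)
SELECT audit_date, lp_id, tenant_trading_name
FROM ranked
WHERE rnk >= 2
ORDER BY lp_id, audit_date
"""

def name_only_updates(rows):
    """Load rows into an in-memory table and return the name-only updates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lease_period_audit ("
        "audit_date TEXT, audit_type TEXT, lp_id INTEGER, "
        "tenant_trading_name TEXT, suite_id INTEGER, "
        "lease_id INTEGER, building_id INTEGER)"
    )
    conn.executemany("INSERT INTO lease_period_audit VALUES (?,?,?,?,?,?,?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in name_only_updates(ROWS):
    print(row)
```

With this data the query returns the rows dated 2020-01-02, 2020-01-03 and 2020-01-05 for lp_id 1. The 2020-01-03 row has a ranking of 3 within its group, so a filter of exactly 2 would miss it.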



Answer 2:

You should think along these lines (pseudo-code):

Edit: I hadn't realized before that the lpa table actually has all the data, so there's no need to join with LP.

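select lpa.*
from   property.lease_period_audit lpa
join   property.lease_period_audit lpa_before on   -- the audit row just before lpa
           lpa_before.suite_id    = lpa.suite_id and
           lpa_before.lease_id    = lpa.lease_id and
           lpa_before.building_id = lpa.building_id and
           lpa_before.audit_date =
                  (select max(lpa3.audit_date)
                   from   property.lease_period_audit lpa3
                   where  lpa3.audit_date < lpa.audit_date and
                          lpa3.suite_id = lpa.suite_id and lpa3.lease_id = lpa.lease_id and
                          lpa3.building_id = lpa.building_id)
where  lpa.audit_type = 'Updated' and
       lpa.tenant_trading_name <> lpa_before.tenant_trading_name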

I hope that explains it (it's not a trivial solution, but that's what comes to mind).

In English:

Select the LPA records. Join each one to another LPA row, the one immediately before it; call this LPA_BEFORE. To find LPA_BEFORE, match all the ids between LPA and LPA_BEFORE, and require that the LPA_BEFORE date be the latest among the LPA rows dated earlier than the original LPA row. Then compare the name in LPA with the name in LPA_BEFORE.
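As a rough way to experiment with this previous-row comparison, here is a sketch that runs it against an in-memory SQLite database from Python. It assumes that "all ids" means suite_id, lease_id and building_id from the question; the sample rows and the function name `name_changes` are made up for illustration.

```python
import sqlite3

# Hypothetical sample data:
# (audit_date, audit_type, tenant_trading_name, suite_id, lease_id, building_id)
ROWS = [
    ("2020-01-01", "Updated", "Acme",     10, 100, 1),
    ("2020-01-02", "Updated", "Acme Ltd", 10, 100, 1),  # name changed
    ("2020-01-01", "Updated", "Bolt",     20, 200, 1),
    ("2020-01-02", "Updated", "Bolt",     20, 201, 1),  # lease_id changed
    ("2020-01-03", "Updated", "Bolt Co",  20, 201, 1),  # name changed
]

QUERY = """
SELECT lpa.audit_date, lpa.tenant_trading_name
FROM   lease_period_audit AS lpa
JOIN   lease_period_audit AS lpa_before
       ON  lpa_before.suite_id    = lpa.suite_id
       AND lpa_before.lease_id    = lpa.lease_id
       AND lpa_before.building_id = lpa.building_id
       -- latest earlier audit row with the same ids
       AND lpa_before.audit_date = (
             SELECT MAX(lpa3.audit_date)
             FROM   lease_period_audit AS lpa3
             WHERE  lpa3.audit_date  < lpa.audit_date
               AND  lpa3.suite_id    = lpa.suite_id
               AND  lpa3.lease_id    = lpa.lease_id
               AND  lpa3.building_id = lpa.building_id)
WHERE  lpa.audit_type = 'Updated'
  AND  lpa.tenant_trading_name <> lpa_before.tenant_trading_name
ORDER BY lpa.audit_date, lpa.tenant_trading_name
"""

def name_changes(rows):
    """Return (audit_date, new name) for rows whose name differs from the previous row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lease_period_audit ("
        "audit_date TEXT, audit_type TEXT, tenant_trading_name TEXT, "
        "suite_id INTEGER, lease_id INTEGER, building_id INTEGER)"
    )
    conn.executemany("INSERT INTO lease_period_audit VALUES (?,?,?,?,?,?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in name_changes(ROWS):
    print(row)
```

The row where only lease_id changed is skipped because it has no earlier row with the same ids. One caveat of matching on the ids: if an id changes and later changes back, the "previous" row found this way is the last one with the same ids, which is not necessarily the audit row immediately before it.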