SQL Query for selecting entities from a history ch

2019-08-23 11:25发布

I have to create a query that I am not really sure how to handle since I can't get all the cases covered.

I have one table called company with the following (relative) columns

Table Company

Columns:

Id | Name | Status | Status_Effective_Date

Let's say that Status can take values from 1 to 12. The logic is that the company is in status 2 for example since 01/01/2018 (status_effective_date)

Table Company Status History

Columns

Id | Company_Id | Status | Status_Effective_Date

This table holds the history of the changes that have happened to the status. If for example I have two entries for the company with Id = 10 like

Row_1 : 1 | 10 | 1 | 02/03/2011
Row_2 : 2 | 10 | 5 | 06/08/2013

Then the company with Id = 10 was is status 1 from 02/03/2011 until 06/08/2013. After that it was is status 5.

What I want to do is to create a report that will show me all the companies that for a selected date range have been at some point in the selected status.

Now for example let's say I want to query with status = 1 and date range between 01/01/2017 - 31/12/2017

The cases that I have to cover that I can understand are: (YES the cases that I want and NO the cases that I don't want)

  1. One company was always in status 1 and never changed (YES)

    1.1 The company_table entry has status = 1 and effective date before the start date

    1.2 The company_status_history table doesn't have any rows since no change in the status has been applied

  2. One company was in status 1 and changed to different status before the date range (NO)

    2.1 The company_table entry has status <> 1 and effective date before the start date

    2.2 The company_status_history_table has one entry of the company in the status 1 with effective date the initial effective date (initial state) and one entry of the company in the new status (<>1) with effective date the date of the change (before the date range)

  3. One company was in status 1 and changed to different status inside the date range (YES)

    3.1 The company_table entry has status <> 1 and effective date after the start date and before the end date

    3.2 The company_status_history_table has one entry of the company in the status 1 with effective date the initial effective date (initial state) and one entry of the company in the new status (<>1) with effective date the date of the change (inside the date range)

  4. One company was in status 1 and changed to different status after the date range (YES)

    4.1 The company_table entry has status <> 1 and effective date after the after the end date

    4.2 The company_status_history_table has one entry of the company in the status 1 with effective date the initial effective date (initial state) and one entry of the company in the new status (<>1) with effective date the date of the change (after the date range)

  5. One company was in status <>1 and changed to status = 1 before the date range (YES)

    5.1 The company_table entry has status 1 or <> 1 (since it might have changed again) and the effective date could be before the date range if it is still in status = 1 or some later date if it has changed again.

    5.2 The company_status_history_table has one entry of the company in the previous status with effective date the initial effective date (initial state) and at least one entry of the company in the new status (=1) with effective date the date of the change (before the date range)

  6. One company was in status <>1 and changed to status 1 inside the date range (YES)

    6.1 The company_table entry has status 1 or <>1 (if it has changed again) and effective date the date inside the date_range or some later one if it has changed again

    6.2 The company_status_history_table has one entry of the company in the initial status with effective date the initial effective date (initial state) and at least one entry of the company in the status 1 with effective date the date of the change (inside the date range)

  7. One company was in status <>1 and changed to status 1 after the date range (NO)

    7.1 The company_table entry has status 1 or <>1 (if there has been another change) and effective date after the after the end date

    7.2 The company_status_history_table has one entry of the company in the initial status <>1 with effective date the initial effective date (initial state) and at least one entry of the company in the new status (1) with effective date the date of the change (after the date range)

What I have tried so far is the below:

-- Case 6
select *
from company com, company_status_history csh 
where csh.company_status_id = 1
    and com.company_id = csh.company_id 
    and csh.company_status_eff_date > '20170101'
    and csh.company_status_eff_date < '20171231'
union
-- Case 1
select *
from company com
where com.company_status_id = 1
    and com.company_status_eff_date < '20181231'
    and com.company_id NOT IN (select company_id 
                        from company_status_history csh)

I am guessing there might be a more effective way from using the union.

What I am missing is cases 3,4,5 and it is the part that I should understand from the previous company_status_history entry (effective date) of the following query if the change should include the company into my final list.

select * from company com, company_status_history csh 
where com.company_id = csh.company_id 

Every help would be greatly appreciated.

3条回答
我命由我不由天
2楼-- · 2019-08-23 11:56

I will take shot!

Seems like you want the list of any company that has been in the given status and date range provided. This seems a bit less complicated when we realize that you want to know ANY company that has had ANY status in BETWEEN the provided date range.

Case 3,4 and 5 are your most important: One company was in status 1 and changed to different status inside the date range (YES) We can find this out no problem. One company was in status 1 and changed to different status after the date range (YES) As long as it was in the status at the beginning of the date range, then it should appear in the results.

One company was in status <>1 and changed to status = 1 before the date range (YES) - so it arrived into the status in between the beginning and end of the date range.

Here's a working sample I made up. Hope it helps.

   --TEST DATA
        create table #company
        (
        id int identity not null,
        name varchar(222) not null,
        status int not null,
        status_effective_date datetime not null
        )
        create table #companystatus
        (
        id int identity not null,
        company_id int not null,
        status int not null,
        status_effective_date datetime not null
        )

        insert into #company (name,status,status_effective_date) values('foo',1,'12/31/2017')
        insert into #company (name,status,status_effective_date) values('biz',2,'11/30/2017')
        insert into #company (name,status,status_effective_date) values('baz',1,'12/31/2017')
        insert into #company (name,status,status_effective_date) values('bloh',3,'11/30/2017')
        insert into #company (name,status,status_effective_date) values('blee',4,'12/31/2017')

        declare @fooid int
        set @fooid = (select id from #company where name = 'foo')
        declare @bizid int
        set @bizid =  (select id from #company where name = 'biz')
        declare @bazid int
        set @bazid = (select id from #company where name = 'baz')
        declare @blohid int
        set @blohid =   (select id from #company where name = 'bloh')

        insert into #companystatus (company_id,status,status_effective_date) values (@fooid,2,'1/1/2018')
        insert into #companystatus (company_id,status,status_effective_date) values (@bizid,5,'3/1/2018')
        insert into #companystatus (company_id,status,status_effective_date) values (@bazid,4,'2/1/2018')
        insert into #companystatus (company_id,status,status_effective_date) values (@blohid,1,'2/1/2018')
        insert into #companystatus (company_id,status,status_effective_date) values (@fooid,2,'4/1/2018')
        insert into #companystatus (company_id,status,status_effective_date) values (@fooid,4,'9/1/2018')
        insert into #companystatus (company_id,status,status_effective_date) values (@bizid,2,'5/1/2018')
        insert into #companystatus (company_id,status,status_effective_date) values (@bazid,1,'10/1/2018')
        insert into #companystatus (company_id,status,status_effective_date) values (@blohid,3,'7/1/2018')
        insert into #companystatus (company_id,status,status_effective_date) values (@fooid,1,'6/1/2018')

            --get only one distinct company id and name where the status date is between my two dates and the status is the value I have chosen (1)
        select distinct(cn.id), cn.name from #companystatus c
        join #company cn on
        cn.id = c.company_id
        where c.status_effective_date between '1/1/2017' and '5/1/2018'
        and c.status in (1) 
--replace the dates and status with what you want
        --clean up
        drop table #company
        drop table #companystatus
查看更多
Juvenile、少年°
3楼-- · 2019-08-23 12:02

From what I read I assume that you want all the companies that had status 1 inside your date range. If this is what you want, that’s pretty easy.

The following statement should do the job :

SELECT C.*
  FROM COMPANY C
  LEFT JOIN ( SELECT H.STATUS, H.COMPANY_ID
                FROM COMPANYSTATUS H
               WHERE H.STATUS_EFFECTIVE_DATE = (SELECT MAX(H1.STATUS_EFFECTIVE_DATE)
                                                  FROM COMPANYSTATUS H1
                                                 WHERE H1.COMPANY_ID = H.COMPANY_ID
                                                   AND H1.STATUS_EFFECTIVE_DATE <= '20171231'
             ) CH ON CH.COMPANY_ID = C.ID   
 WHERE ( C.STATUS = 1 AND CH.COMPANY_ID IS NULL ) -- CASE #1 : STATUS = 1 AND NEVER CHANGED (NO HISTORY)
    OR H.STATUS = 1

What I did was to create a nested view with the last status until the end of your date range, so if the last change of a company status is 1 then this company should be included in your result. We don’t care about changes after your date range, so I put the restriction inside the nested view.

The beginning of your range is insignificant for this request. You probably needed for other purposes, to join other tables.

I am Oracle guy, so I think Ι could make this statement much better using Oracle analytics, but I think it will be a valid statement for SQL server.

查看更多
\"骚年 ilove
4楼-- · 2019-08-23 12:03

I finally got the working query thanks to this answer to my simplified problem. I am posting it in case someone else needs it.

-- View that gets the entity_history_status with start date and end date
-- At the moment if it is only one entry in the table (it went to current) the end_date is the same as the start date
;WITH changed_companies_with_ranges AS
(
    -- Get the final table for the selected status
    SELECT company_id,
           MIN(company_status_eff_date) AS start_date, 
           CASE 
              WHEN COUNT(*) > 1 THEN DATEADD(DAY, -1, MAX(company_status_eff_date))
              ELSE min(company_status_eff_date) -- Check why it changes the result when I set the today as end_date
           END AS end_date,
           grp

    FROM (
              -- Something with counting again in order to get the start and end date according to this grp
              SELECT company_status_hist_id, company_id, company_status_id, company_status_eff_date,
                     ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY company_status_eff_date) - 
                     cnt AS grp
              FROM (
                        -- Get the entries from the history table that had at any point the selected status and add a new column saying how many entries do they have
                        SELECT esh2.company_status_hist_id, esh2.company_id, esh2.ecompany_status_id, esh2.company_status_eff_date, x.cnt
                        FROM company_status_history AS esh2
                        OUTER APPLY 
                        (
                           SELECT COUNT(*) AS cnt
                           FROM ecompany_status_history AS c
                           WHERE c.company_status_id = 1 -- The selected status
                                 AND c.company_id  = esh2.company_id 
                                 AND c.company_status_eff_date < esh2.company_status_eff_date
                        ) AS x
                   ) as CTE
            ) as CTE2 
    GROUP BY company_id, grp
    HAVING COUNT(CASE WHEN company_status_id = 1 THEN 1 END) > 0 -- The selected status
)
SELECT * FROM (
    SELECT 
     en.company_id
    ,en.company_name
    ,en.company_reg_num
    FROM company en
    where en.company_id in(
                            select company_id 
                            from changed_entitities_with_ranges 
                            where start_date = end_date
                        )
                        and en.company_status_eff_date > '2017-01-01 00:00:00.000' -- Start Date
    union
    select 
     en.company_id
    ,en.company_name
    ,en.company_reg_num
    FROM company en
    where en.company_id in(
                            select company_id 
                            from changed_entitities_with_ranges 
                            where (start_date between '2017-01-01 00:00:00.000' and '2017-12-31 00:00:00.000') -- Range
                               or (start_date < '2017-01-01 00:00:00.000' and end_date > '2017-01-01 00:00:00.000') -- Start date -- End date
                         )
    union
    -- 1. Without any history changes + 3. changed to 1 before start date + 5 Changes to the normal status from other statuses before the end date + 7 Changes to the normal status from other statuses in between the period
    -- Gets the entities that haven't changed at all and have been in status 1 before the end date
    SELECT 
     en.company_id
    ,en.company_name
    ,en.company_reg_num
    FROM company en
    WHERE en.company_status_id = 1
                       AND en.company_status_eff_date < '2017-12-31 00:00:00.000'
    UNION
    -- 2. Changes to the other statuses from the status of normal after the start date + 4. Changes to the other statuses from the status of normal before the end date  + 6. Changes to the other statuses from the status of normal in between the period
    -- Gets the entities that have been changed to any status but were created or altered ato some point inside the range
    SELECT 
     en.company_id
    ,en.company_name
    ,en.company_reg_num
    from company en
    where en.company_id IN (select company_id from company_status_history es 
                       where es.company_status_eff_date BETWEEN '2017-01-01 00:00:00.000' AND '2017-12-31 00:00:00.000' AND es.company_status_id = 1)
) as result ORDER BY company_id
查看更多
登录 后发表回答